Archives Wales Catalogues Online: Working with the Archives Hub

Stacy Capner reflects on her first six months as Project Officer for the Archives Wales Catalogues Online project, a collaboration between the Archives and Records Council Wales and the Archives Hub to increase the discoverability of Welsh archives.

For a few years now there has been a strategic goal to get Wales’ archive collections more prominently ‘out there’ using the Archives Wales website. Collection level descriptions have been made available previously through the ‘Archives Network Wales’ project, but the aim now is to create a single portal to search and access multi-level descriptions from across services. The Archives Hub has an established, standards-based way of doing this, so instead of re-inventing the wheel, Archives and Records Council Wales (ARCW) saw an opportunity to work with them to achieve these aims.

The work to take data from Welsh Archives into the Archives Hub started some time ago, but it became clear that getting exports from different systems and working with different cataloguing practices required more dedicated 1-2-1 liaison. I am the project officer on a defined project which began in April to provide dedicated support to archive services across Wales and to establish requirements for uploading their catalogue data to the Archives Hub (and subsequently to Archives Wales).

This project is supported by the Welsh Government through its Museums Archives and Libraries Division, with a grant to Swansea University, a member of ARCW and a long-standing contributor to the Hub. I’m on secondment from the University to the project, which means I’ve found myself back in my northern neck of the woods working alongside the Archives Hub team. This project has come at a time when the Archives Hub have been putting a lot of thought into their processes for uploading data straight from systems, which means that the requirements for Welsh services have started to define an approach which could be applied to archive services across Scotland, England and Northern Ireland.

Here are my reflections on the project so far:

  1. Wales has fantastic collections, holding internationally significant material. They deserve to be promoted, accessible and searchable to as wide an audience as possible. Some examples-

National Library of Wales, The Survey of the Manors of Crickhowell & Tretower (inscribed in the UNESCO Memory of the World Register, 2016) https://www.llgc.org.uk/blog/?p=11715

Swansea University, South Wales Coalfield Collection http://www.swansea.ac.uk/library/archive-and-research-collections/richard-burton-archives/ourcollections/southwalescoalfieldcollection/

West Glamorgan, Neath Abbey Ironworks collection (inscribed in the UNESCO Memory of the World register, 2014) http://www.southwales-eveningpost.co.uk/treasured-neath-port-talbot-history-recognised/story-26073633-detail/story.html

Bangor University, Penrhyn Estate papers (including material relating to the sugar plantations in Jamaica) https://www.bangor.ac.uk/archives/sugar_slate.php.en#project

Photograph of Ammanford colliers and workmen standing in front of anthracite truck, c 1900. From the South Wales Coalfield Collection. Source: Richard Burton Archives, Swansea University (Ref: SWCC/PHO/COL/11)
  1. Don’t be scared of EAD! I was. My knowledge of EAD (Encoded Archival Description) hadn’t been refreshed in 10 years, since Jane Stevenson got us to create brownie recipes using EAD tags on the archives course. So, whilst I started the task with confidence in cataloguing and cataloguing systems, my first month or so was spent learning about the Archives Hub EAD requirements. For contributors, one of the benefits of the Archives Hub is that they’ve created guidance, tools and processes so that archivists don’t have to become experts at creating or understanding EAD (though it is useful and interesting, if you get the chance!).
  1. The Archives Hub team are great! Their contributor numbers are growing (over 300 now) and their new website and editor are only going to make it easier for archive services to contribute and for researchers to search. What has struck me is that the team are all hot on data, standards and consistency, but it’s combined with a willingness to find solutions/processes which won’t put too much extra pressure on archive services wishing to contribute. It’s a balance that seems to work well and will be crucial for this project.
  1. The information gathering stage was interesting. And tiring. I visited every ARCW member archive service in Wales to introduce them to the project, find out what cataloguing systems they were using, and to review existing electronic catalogues. Most services in Wales are using Calm, though other systems currently being used include internally created databases, AtoM, Archivists Toolkit and Modes. It was really helpful to see how fields were being used, how services had adapted systems to suit them, and how all of this fitted in to Archives Hub requirements for interoperability.

    Photo of icecream
    Perks of working visits to beautiful parts of Wales.
  1. The support stage is set to be more interesting. And probably more tiring! The next 6 months will be spent providing practical support to services to help enable their catalogues to meet Archives Hub requirements. I’ll be able to address most of the smaller, service specific, tasks on site visits. The Hub team and I have identified a number of trickier ‘issues’ which we’ll hash out with further meetings and feedback from services. I can foresee further blog posts on these so briefly they are:
  • Multilingualism- most services catalogue Welsh items/collections in Welsh, English items/collections in English and multi-language item/collections bilingually. However, the method of doing this across services (and within services) isn’t consistent. We’re going to look at what can be done to ensure that descriptions in multiple languages are both human and machine readable.
  • Ref no/Alt ref- due to legacy issues with non-hierarchical catalogues, or just services’ personal preference, there are variations in the use of these fields. Some services use the ref no as the reference, others use the alt ref no as the reference. This isn’t a problem (as long as it’s consistent). Some services use ref no as the reference but not at series level, others use the alt ref no as the reference but not at series level. This will prove a little trickier for the Archives Hub to handle but hopefully workarounds for individual services will be found.
  • Extent fields missing- this is a mandatory field at collection level for the Archives Hub. It’s important to give researchers an idea of the size of the collection/series (it’s also an ISAD(G) required field). However, many services have hundreds of collection level descriptions which are missing extent. It’s not something I’ll practically be able to address on my support visits so the possibility of further work/funding will be looked into.
  • Indexing- this is understandably very important to the Archives Hub (they explain why here). For several archive services in Wales it seems to have been a step too far in the cataloguing process, mainly due to a lack of resource/time/training. Most have used imported terms from an old database or nothing at all. Although this will not prevent services from contributing catalogues to the Archives Hub, it does open up opportunities to think about partnership projects which might address this in the future (including looking at Welsh language index terms).
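
To give a feel for how the missing extent issue could be spotted in bulk, here is a minimal sketch in Python. It assumes an export in un-namespaced EAD 2002, where extent sits at archdesc/did/physdesc/extent; the function name and the sample descriptions are made up for illustration, and this is not the check the Archives Hub itself runs.

```python
import xml.etree.ElementTree as ET

def missing_extent(ead_xml):
    """Return True if the collection-level description has no usable <extent>.

    Assumes un-namespaced EAD 2002 (an assumption, not a Hub requirement).
    """
    root = ET.fromstring(ead_xml)
    archdesc = root if root.tag == "archdesc" else root.find("archdesc")
    if archdesc is None:
        return True
    extent = archdesc.find("did/physdesc/extent")
    # An empty or whitespace-only <extent> counts as missing.
    return extent is None or not (extent.text or "").strip()

# Hypothetical sample descriptions, not real catalogue data.
with_extent = (
    "<ead><archdesc level='fonds'><did><unittitle>Example</unittitle>"
    "<physdesc><extent>3 boxes</extent></physdesc></did></archdesc></ead>"
)
without_extent = (
    "<ead><archdesc level='fonds'><did><unittitle>Example</unittitle>"
    "</did></archdesc></ead>"
)

print(missing_extent(with_extent))
print(missing_extent(without_extent))
```

Run over a folder of exports, something like this would at least produce a list of the descriptions that need attention.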

The project has made me think about how I’ve catalogued in the past. It’s made me much more aware that cataloguing shouldn’t just be an inward-facing, local task, or one based only on intellectual control; we should be constantly aware of making our descriptions more discoverable to researchers. And it’s shown me the importance of standards and consistency in achieving this (I feel like I’ve referenced consistency a lot in this one blog post; consistency is important!). I hope that the project is also prompting Welsh archive services to reflect on the accessibility of their own cataloguing- something which might not have been looked at in many years.

There’s a lot of work to be done, both in this foundation work and in further funding/projects which might come off the back of it. But hopefully in the next few years you’ll be discovering much more of Wales’ archive collections online.

Stacy Capner
Project Officer
Archives Wales Catalogues Online

Related:

Archives Hub EAD Editor – http://archiveshub.ac.uk/eadeditor/

Archives Hub contributors – list and map

 

Excel template

Update May 2015: Please note that we need to make some changes to the Excel template and we are not currently working with Excel data. We hope to be able to offer this service in the future.

As part of Project Headway we wanted to create an Excel template which archives could use to catalogue and create EAD. We know that some archives – especially smaller and under-resourced archives – are using spreadsheets or word processing software to catalogue, and often lack the time or resources to switch to using an archival management system. While users can catalogue directly on to the EAD Editor, this isn’t a perfect solution –  it won’t work in some older browsers, or offline.

While we would have liked to offer a script that allowed users to convert their own Excel catalogues to EAD, it soon became apparent that this wasn’t an option. We would have needed to produce a script for each institution, and relied on the institution using Excel in a very consistent, systematic way – and a way that was ISAD(G) compliant, and could easily be mapped to EAD. So we decided to start off with a simple template, which we can adapt to individual user needs if required.

I’d never worked with XML in Excel before, and a lot of the process was simply trial-and-error, googling error messages, and sending forlorn messages to my programmer husband asking ‘what on earth is denormalised data and how do I stop it?’. I found the office.microsoft.com and msdn.microsoft.com sites useful for figuring out the basics of getting XML in and out of Excel – though I often turned to support elsewhere, too (eg Microsoft support will only tell you that denormalised data is not supported – not what it is or how to fix it).

To get started with using XML in Excel, you need to have the XML add-in installed (it says 2003, but will work with other versions) and then make sure you can see the ‘developer’ tab – if you can’t, it’s under options -> customize ribbon.

While it’s hard (in retrospect) to remember all of the stages I went through in the trial-and-error,  I know I started by trying to create an XSD (XML schema file) from in-Excel data entry. It failed. I tried importing the EAD.xsd – which just failed, silently (no error messages- no messages at all).

I was also concerned that the official EAD.xsd was too complicated for my (and our users’) needs – for instance, this project didn’t require lists of enumeration values. I needed something a bit simpler – and I’d already figured out that Excel couldn’t handle multi-level descriptions – so I needed to start with something collection-level only, too.

I created a basic EAD collection-level description in the Archives Hub EAD Editor, saved it as XML, removed the DTD declaration (not allowed in Excel), and imported it (using developer -> xml -> import).  Clicking on ‘source’ in the developer XML tab then shows you the XML fields.
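
If you have more than a handful of files, removing the DTD declaration by hand gets old quickly. The following is a minimal Python sketch of one way to strip it; the regular expression, the function name and the sample file content are assumptions for illustration, and I did this step manually.

```python
import re

# Matches a DOCTYPE declaration, with or without an internal subset in [...].
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*(\[[^\]]*\])?\s*>\s*", re.IGNORECASE)

def strip_doctype(xml_text):
    """Return xml_text with the first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", xml_text, count=1)

# Hypothetical sample; a real export will have a longer declaration.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE ead SYSTEM "ead.dtd">\n'
    '<ead><archdesc level="fonds"/></ead>'
)

cleaned = strip_doctype(sample)
print(cleaned)
```

A regular expression is good enough for a simple declaration like this one; anything more unusual is probably safer to check by eye.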

XML map in Excel

You can then export this map as an XSD, creating your XML schema.  Of course, it wasn’t that easy. This is where denormalised data cropped up – and stopped me from exporting. I have to admit, I’m still not entirely sure what exactly denormalised data is – and given definitions such as:

A denormalised data model is not the same as a data model that has not been normalised, and denormalisation should only take place after a satisfactory level of normalisation has taken place and that any required constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all the relations are in third normal form and any relations with join and multi-valued dependencies are handled appropriately.

(from the usually introductory-friendly Wikipedia)

I’m not sure I’ll ever find out (if you have a really good explanation, please do comment!). But what I did find out was what it meant for me in the context of this XML mapping: no repeated fields. EAD allows for repeated fields – for instance, multiple subjects would be encoded as:

<controlaccess> <subject>subject</subject><subject>subject 2</subject></controlaccess>

Try to import that into Excel, and you get, well, a mess. The whole description appears twice – once with subject, and once with subject 2. And if you try to export the schema, you get the error message that the map is not exportable because it contains denormalized data.

For this reason, Excel won’t support hierarchy. In EAD, the same fields are repeated at component level as at collection-level, just inside a different wrapper. If you thought it got messy when you add a single repeated field, just imagine having anything up to several thousand…
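
To show the effect, here is a small Python sketch of what happens when a description with a repeated field is forced into a flat table. It illustrates the duplication only, not how Excel works internally; the function name and sample title are invented.

```python
import xml.etree.ElementTree as ET

# A tiny, simplified description with one repeated field (subject).
ead_fragment = """
<archdesc>
  <did><unittitle>Example collection</unittitle></did>
  <controlaccess>
    <subject>subject</subject>
    <subject>subject 2</subject>
  </controlaccess>
</archdesc>
"""

def flatten(xml_text):
    """Flatten a description to one row per repeated subject."""
    root = ET.fromstring(xml_text)
    title = root.findtext("did/unittitle")
    subjects = [s.text for s in root.findall("controlaccess/subject")]
    # Every non-repeating value has to be copied onto each row.
    return [{"unittitle": title, "subject": s} for s in subjects]

rows = flatten(ead_fragment)
for row in rows:
    print(row)
```

Two subjects give two rows, each carrying its own copy of the title. Add a second repeated field, or a few thousand components, and the number of copies multiplies.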

So, strip everything down to a single instance (which means separating collection and component level into different spreadsheets), and you have an XSD which will export (follow instructions in step 4 of that link – if you get a VBA error, debug instructions are in step 2). Hurrah! But how to make it useable?

Well, you have to put it back into Excel, and map the XML fields to Excel cells. This was tedious, but achievably tedious rather than crawling-through-help-forums tedious. Open up a new Excel document, click on ‘source’, and choose your shiny new XSD. This will give you a list of all the fields, in the right-hand pane. Mapping them to cells is simply a case of drag-and-drop – once you’ve mapped a field to a cell, that cell will be outlined in blue (as long as the source pane is showing). There’s an option to have Excel auto-label your fields with the content of the XML tag, but I decided that wouldn’t give the user-friendly interface I wanted, so I labelled them myself. Then colour-coded them. The result?

Screenshot of collection-level template

I had to tweak the exported XSD a little to allow for a field in which users can enter the reference codes of any components. This was my first experience of hand-coding any of an XML schema, and it took a few tries to get right! But I managed to add and map the <dsc> and <c> elements:

<xsd:element minOccurs="0" nillable="true" name="dsc" form="unqualified">
  <xsd:complexType>
    <xsd:sequence minOccurs="0">
      <xsd:element minOccurs="0" nillable="true" type="xsd:string" name="c" form="unqualified"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

(If I wanted to play with the XSD a bit more, I guess I could make mandatory fields really mandatory, by fiddling with the minOccurs and/or nillable attributes, but I haven’t worked up the courage yet…)

This allows users to enter the reference codes of parent/child descriptions. Each component needs its own spreadsheet, and its own XML export. These are then run through a script by our programmer, which will use these parent/child references to create a single, hierarchical description. Theoretically, anyway – we haven’t been able to do much testing on it yet, and we’re not sure how well it will cope with components that are more than a level or two deep.
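
I can't share our programmer's script, but the general idea of stitching separate exports together by reference code can be sketched in Python. Everything here is an assumption for illustration: the record keys ('ref', 'parent', 'title'), the function name and the sample reference codes are hypothetical, and the real exports are XML files rather than dictionaries.

```python
import xml.etree.ElementTree as ET

def build_hierarchy(records):
    """Nest component records under their parents using parent references.

    `records` is a list of dicts with hypothetical keys 'ref', 'parent'
    (None for the collection level) and 'title'.
    """
    elements = {}
    root = None
    # First pass: create one element per record.
    for rec in records:
        tag = "archdesc" if rec["parent"] is None else "c"
        el = ET.Element(tag)
        did = ET.SubElement(el, "did")
        ET.SubElement(did, "unitid").text = rec["ref"]
        ET.SubElement(did, "unittitle").text = rec["title"]
        elements[rec["ref"]] = el
        if rec["parent"] is None:
            root = el
    # Second pass: attach each component to its parent.
    for rec in records:
        if rec["parent"] is None:
            continue
        parent = elements[rec["parent"]]
        if parent is root:
            # Components of the collection live inside <dsc>.
            container = parent.find("dsc")
            if container is None:
                container = ET.SubElement(parent, "dsc")
        else:
            container = parent
        container.append(elements[rec["ref"]])
    return root

# Hypothetical records standing in for three separate spreadsheet exports.
records = [
    {"ref": "ABC", "parent": None, "title": "Example collection"},
    {"ref": "ABC/1", "parent": "ABC", "title": "Example series"},
    {"ref": "ABC/1/1", "parent": "ABC/1", "title": "Example item"},
]

tree = build_hierarchy(records)
print(ET.tostring(tree, encoding="unicode"))
```

The two passes mean the records can arrive in any order, which matters when each component comes from its own spreadsheet. A record pointing at a parent reference that doesn't exist would raise a KeyError, which is probably what you want to know about.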

Remember denormalised data, and how you can’t have repeated fields? Obviously we can’t tell contributors that they can only have a single subject for each description! So in repeatable fields, multiple entries are pipe | delimited, so we can split them, eg:

<controlaccess><subject>subject1|subject2|subject3</subject></controlaccess>

to

<controlaccess><subject>subject1</subject><subject>subject2</subject><subject>subject3</subject></controlaccess>

If users enter their subject sources in the same order, they’ll be matched up as attributes to the correct subject. The script also removes any empty fields (valid XML, but they break the EAD Editor), and adds the special Archives Hub mark-up for access points (used to distinguish between eg surname and forename in a personal name, and handy for linked data).
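
The splitting and tidying steps could look something like the Python sketch below. It is an illustration rather than our actual script: the function names and the sample description are invented, and it leaves out the matching of subject sources to attributes and the Archives Hub access point mark-up.

```python
import xml.etree.ElementTree as ET

def split_repeated(parent, tag, delimiter="|"):
    """Replace pipe-delimited children named `tag` with one child per value."""
    for child in list(parent.findall(tag)):
        values = [v.strip() for v in (child.text or "").split(delimiter)]
        index = list(parent).index(child)
        parent.remove(child)
        for offset, value in enumerate(v for v in values if v):
            new = ET.Element(tag)
            new.text = value
            parent.insert(index + offset, new)

def remove_empty(element):
    """Recursively drop children with no text, attributes or children."""
    for child in list(element):
        remove_empty(child)
        if len(child) == 0 and not (child.text or "").strip() and not child.attrib:
            element.remove(child)

# Hypothetical export with a delimited subject and two empty fields.
source = (
    "<archdesc><controlaccess>"
    "<subject>subject1|subject2|subject3</subject><persname></persname>"
    "</controlaccess><note/></archdesc>"
)

root = ET.fromstring(source)
for access in list(root.iter("controlaccess")):
    split_repeated(access, "subject")
remove_empty(root)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Empty values between pipes (for example a stray trailing |) are dropped rather than turned into empty elements, since empty fields are the thing we are trying to get rid of.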

And there we are: a description, created in Excel, that’s valid EAD. We’re still in the process of testing the template, and making sure that it’s robust and meets users’ needs. If you’d like to be involved with testing, please get in touch.

 

Out and about or Hub contributor training

Every year we provide our contributors and potential contributors with free training on how to use our EAD editor software.

The days are great fun and we really enjoy the chance to meet archivists from around the UK and find out what they are working on.

The EAD editor has been developed so that archivists can create online descriptions of their collections without having to know EAD.  It’s intuitive and user friendly and allows contributors to easily add collection level and multi-level descriptions to the Hub.  Users can also enhance their descriptions by adding digital archival objects  – images, documents and sound files.

Contributor training day

Our training days are a mixture of presentation, demonstration and practical hands-on work. We (the training team consists of Jane, Beth and myself) tend to start by talking a little about Hub news and developments to set the scene for the day, and then we move on to why the Hub uses EAD and why using standards is important for interoperability and means that more ‘stuff’ can be done with the data. We go from here on to a hands-on session that demonstrates how to create a basic record. We also cover adding lower level components and images, and we show contributors how to add index terms to their descriptions. (Something that we heartily endorse! We LOVE standards and indexing!).

We always like to tailor our training to the users, and encourage users to bring along their own descriptions for the hands-on sessions. Some users manage to submit their first descriptions to the Hub by the end of the training session!

This year we have done training in Manchester and London, for the Lifeshare project team in Sheffield and for the Oxford colleges. We are also hoping (if we get enough take up) to run courses in Glasgow and Cardiff this year. (6th Sept at Glasgow Caledonian, Cardiff date TBC. Email archiveshub@mimas.ac.uk to book a place)

So far this year three new contributors have joined the Hub as a result of training:  Middle East Centre Archive, St Antony’s College, Oxford; Salford City Archive and the Taylor Institute, Oxford. We’ve also enabled four of our existing contributors to start updating their collections on the Hub: National Fairground Archive, the Co-operative Archive, St John’s College, Oxford and the V&A.

We have been given some great feedback this year and 100% of our attendees agreed/strongly agreed that they were satisfied with the content and teaching style of the course.

Some of our feedback:

A very good introductory session to working with the EAD editor for the Archives Hub. I have not used the Archives Hub for a long time so an excellent refresher course.

This was a fantastic workshop – excellently designed resources, Lisa and Jane were really helpful (and patient!). The hands-on aspect was really useful: I now feel quite confident about creating EAD records for the Hub, and even more confident that the Hub team are on hand with online help

The hands on experience and being able to ask questions of the course leaders as things happened was really useful. Being able to work on something relevant to me was also a bonus.

Excellent presentation and delivery. I came along with a theoretical but not a practical knowledge of the Archives Hub and its workings, and the training session was pitched perfectly and was completely relevant to my job. Many thanks.

The Hub team train archivists how to use the EAD editor, archive students about EAD and Social media and research students in how to use the Hub to search for primary source materials. You can find our list of training that we provide on our training pages: http://archiveshub.ac.uk/trainingmodules/ .  We’re always happy to hear from people who are interested in training – do let us know!

A few thoughts on context and content

I have been reading with interest the post and comments on Mark Matienzo’s blog: http://thesecretmirror.com. He asks ‘Must contextual description be bound to records description?’

I tend to agree with his point of view that this is not a good thing. The Archives Hub uses EAD, and our contributors happily add excellent biographical and administrative history information into their descriptions, via the <bioghist> tag, information that I am sure is very valuable for researchers. But should our descriptions leave out this sort of information and be just descriptions of the collection and no more? Wouldn’t it be so much more sensible to then link to contextual information that is stored separately?
Possibly, on the other side of the argument, if archivists created separate biographical/administrative history records, would they still want to contextualise them for specific collection descriptions anyway? It makes perfect sense to have the information separate to the collection description if it is going to be shared, but will archivists want to modify it to make it relevant to particular collections? Is it sensible to link to a comprehensive biographical record for someone when you are describing a very small collection that only refers to a year in their life?
Of course, we don’t have the issue with EAD at the moment, in so far as we can’t include an EAC-CPF record in an EAD record anyway, because it doesn’t allow stuff to be included from other XML schemas (no components from other namespaces can be used in EAD). But I can’t help thinking that an attractive model for something like the Archives Hub would be collection descriptions (including sub-fonds, series, items), that can link to whatever contextual information is appropriate, whether that information is stored by us or elsewhere. This brings me back to my current interest – Linked Data. If the Web is truly moving towards the Linked Data model, then maybe EAD should be revised in line with this? By breaking information down into logical components, it can be recombined in more imaginative ways – open and flexible data!

Archival Management Software

Archival Management Software: A Report for the Council on Library and Information Resources. Lisa Spiro, January 2009.

The Archives Hub is not in the business of archival management systems, but this report provides a useful perspective on what systems have to offer, and also the current state of cataloguing, albeit essentially in the US. Recommended reading. Here is a summary, highlighting some points of interest.

The report starts off on well-trodden ground about the number of hidden archives. As a partial remedy, it encourages providing access to materials through minimal steps (basic descriptions which may not be ‘perfect’), rather than providing detailed catalogues of a small percentage of holdings. At the same time it states that collection-level descriptions must be done well, otherwise they may not effectively represent the collection to users. The report refers to taking a stripped-down approach to cataloguing – quite a change from the norm for many archivists. This is an issue we have been thinking about at the Hub, and we have taken the decision to reduce the number of mandatory fields that we require of our contributors. A difficult decision, but we felt that we needed to fit in with the ethos that a minimal description is better than no description, and we should be conscious of the difficulties archives often have in providing comprehensive descriptions with only minimal resources.

As an interesting adjunct to the debate about the control archivists have over descriptions (the requirements for expertise), the report cites a project where students are paired with unprocessed collections in their area of interest and trained to catalogue them, resulting in access for users and research topics for the students. Presumably this work is overseen by archivists, but it is still a departure from the idea that cataloguing requires an ‘expert’.

The important point is to provide electronic access as ‘increasingly, materials that are electronically inaccessible are simply not used’ (quoting Jones, Hidden Collections, Scholarly Barriers, 2003). I was heartened to read that the Library of Congress Working Group on

We have ways of keeping control!


The Archives Hub has been putting itself about a bit over the past couple of years…by which I mean that we are becoming distributed. We have around 150 contributors, who provide us with their archive descriptions, and through the medium of EAD and our search and retrieval software, Cheshire, we make these available for cross-searching.

The role of the Archives Hub is to facilitate dissemination of information and therefore promote use of archives as widely as possible to enhance all kinds of research. But at the same time we have sought to be transparent in what we do and how we do it, and we have always emphasised that the data belongs to the contributors. What we don’t want them to feel is that once they pass their descriptions on to us that is pretty much that…it’s out of their hands. We like to think that we’ve avoided this by continuing to maintain personal contact with contributors, providing news and updates, being generally approachable…and sending out mugs and fun Christmas cards!

I find the whole issue of control very interesting. There are so many levels on which we can think about it now – the control of archive descriptions, the control of archives (getting into issues of preservation vs. access), the control that can come from understanding technology, and how far archivists have to understand technology in this day and age in order to have control, and also the issue of control with the advent of ‘Web 2.0’ and user-generated content.

What we want to do is facilitate contributors having responsibility for their data, and one way of doing this is to enable them to host their own data and administer it themselves. As well as providing them with the software to do this, we let them create their own web interface and give it a look and feel that they are happy with. This means that researchers (and archivists) still have the advantages of the Archives Hub as a central cross-searching facility as well as the means to search just the descriptions of one repository.

We will be moving to a new version of our software soon (Cheshire 3) and this will be particularly well suited to this distributed environment. However, that doesn’t mean that we will be pressing all of our contributors to set up their own server – we are still more than happy to host their data here at Manchester, and they have the added advantage of a data editor to check their descriptions and provide advice and support (which we are happy to do for the distributed contributors as well). But whether the data is here or held by the contributor, we want to continue to act as a facilitator rather than a controller.

I do wonder whether it is useful to talk about control of the data anyway – I think that we are moving towards a scenario where the movement of data will become more fluid, and we will want to provide access in more flexible ways. Maybe ‘control’ really means the ability to ensure that the archival descriptions are accurate and reliable – which generally relies upon the authority of the archivist – rather than implying that the channels of dissemination must be limited. What we want is one authoritative version of the description with any number of ways to actually get that information to the people out there.

Image: from Flickr courtesy of Telstar Logistics

EAD: your super flexible friend


I’ve just come across the Smithsonian Archives of American Art online collections. This is a wonderful source, with all the archives digitised and available to view. The navigation available on the site is great, with an image viewer allowing the user to scroll through images, enlarge them and navigate through the folders. It seems to me to be a very well designed site, with a clear information architecture enabling the user to drill down to different levels and get a good sense of exactly where they are. I like the way that they have used the mix of text, photographs and drawings. The site is not perfect though – it does fall down on the use of XHTML, which is not valid. I suspect that the Smithsonian have rather more resources available for this sort of project than many of us are lucky enough to get (although the project did receive external funding from the Terra Foundation for American Art).

I was particularly interested in this site because it uses EAD, so it is a great example of the way EAD descriptions can be re-purposed. Whilst for many of us, simple EAD descriptions are all that we have the time and resources to create at present, this shows how using EAD means that we retain the flexibility to create more ambitious sites in the future.

If you go to the ‘finding aid’ link you can see the more traditional EAD description.

Image from the Smithsonian Archives of American Art website, under fair use (for non-commercial purposes).

Training Day

Early Morning Exercises
The Archives Hub is holding a training day for contributors and potential contributors on Tuesday 25 September here at the University of Manchester.

The day is free and will run from 10.30 to 16.00 with a free lunch provided.

This is a great opportunity for anyone who would like to know more about EAD and about creating descriptions and indexing entries for the Hub. If you would like to attend please email us.

Illustration: Woodcraft Folk photo copyright © National Co-operative Archive.

Neuer Post

Umspannwerk Ost restaurant

Blogger knows I’m in Germany – the interface is all in German. Neat. And just a teeny bit creepy.

I’m here with Jane at the International Standards for Digital Archives conference. Lots of presentations about EAD, EAC, METS and related standards. I was talking about the Spokes software yesterday (the EAD day). The picture shows the inside of the Umspannwerk Ost restaurant where we had dinner last night. It used to be an electrical substation. The conference venue (the Umweltforum) used to be a church. They’re good at recycling here.

Daniel Pitti

Today was all about EAC and METS – Daniel Pitti was one of the speakers giving the background to EAC in the morning. Apparently there have been complaints about the complexity of the standard, so Daniel was asking for more details on this problem, as work is about to start on rebuilding it ‘from the ground up’. I enjoyed his closing comment which was along the lines of “it doesn’t matter what you do in the privacy of your own repository, but if you’re going outside, please dress up in a standard” (or a nice hat, of course).