Creating name authorities

We are currently evaluating ICA-AtoM, and it is throwing up some really useful ideas and some quite difficult issues because it supports the creation of name authorities, and it also automatically creates authority records for any creator name and any names within the access points when you upload an archive description.

As far as I am concerned, the idea of name authorities is to create a biographical entry for a person or organisation, something that gives the researcher useful information about the entity, different names they use, significant events in their life, people they know, etc (all set out within the ISAAR(CPF) standard (PDF)). You can then link to the archive collections that relate to them, thus giving the archives more context. You can distinguish between archives that they ‘created’ (were responsible for writing or accumulating), and archives that they are a subject of (where they are referenced – maybe as an access point in the index terms).

The ideal vision is for one name authority record for an entity, and that authority record intellectually brings together the archives relating to the entity, which is a great advantage for researchers. However, this is not practically achievable. Just the creation and effective use of name authorities to bring archives together at any level is quite challenging. For the Hub we have a number of issues….

Often we have more than one archive collection for the same entity (let’s say person, although it could be a corporate body or family). I’ll take Martha Beatrice Webb as an example, because we have 9 collections on the Archives Hub where she is a creator. She’s also topical, as the LSE have just launched a new Digital Library with the Beatrice Webb diaries!

[Image: Beatrice Webb collections]

There is nothing wrong with any of these entries. In fact, I pick Beatrice Webb as an example because we do have some really good, detailed biographical history entries for her – exactly what we want in a collection description, so that researchers can get as much useful information as possible. We have always recommended to our contributors that they do provide a biographical history (we know from researchers that this is often useful to them; although conversely it is true to say that some researchers see it as unnecessary).

We have nine collections for Beatrice Webb, and three repositories holding these collections.

If we were to try to create one authority record for these nine collections:

1. Which biographical history would we use? Would we use them all, which means a lot of information and quite a bit of duplication? Would we pick the longest one? What criteria could we use?

2. Two of these collections are attributed to Beatrice and Sidney Webb. Ideally, we would want to link the collections to two authority records – one for Beatrice and one for Sidney, but the biographical history is for both of them, so we would probably end up with an authority record for ‘Beatrice and Sidney Webb’ as an entity.

3. The name of creator is entered differently in the descriptions – ‘Beatrice Webb’ and ‘Webb, Martha Beatrice, 1858-1943, wife of 1st Baron Passfield, social reformer and historian’. This is very typical for the Hub. Sometimes the name of creator is entered simply as forename and surname, but sometimes the same elements as are used for the index term entry are used – sometimes in the same (inverted) order and sometimes not. This makes it harder to automatically create one authority record, because you have to find ways to match the names and confirm they are the same person.

4. Maybe it would be easier to try to create one authority record per repository, so for Beatrice Webb we would have three records, but this would go against the idea of using name authorities to help intellectually bring archives together and still leaves us with some of the same challenges.

5. How would we deal with new descriptions? Could we get contributors to link to the authority records that have been created? Would we ask them to stop creating biographical histories within descriptions?

6. The name as an access point is often even more variable than the name as creator, although it does tend to have more structure. How can we tell that ‘Martha Beatrice Webb’, ‘Beatrice Webb’ and ‘Beatrice Potter Webb’ are all the same person? For this we’ll have to carry out analysis of the data and pattern matching. For some names this isn’t so difficult, such as for Clough Williams-Ellis and Clough Williams Ellis. But what about ‘Edward Coles’ (author of commonplace book c. 1730-40) and ‘Edward Coles fl. 1741’? With archives, there is a good deal of ‘fl’ and ‘c’ due to the relative obscurity of many people within archive collections. If we don’t identify matches, then we will have an authority record for every variation in a name.

7. We will have issues where a biographical history is cross-referenced, as in one example here. It makes perfect sense within the current context, and indeed, it follows the principle of using one biographical history for the same person, but it requires a distinct solution when introducing authority records.
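To give a flavour of the pattern matching mentioned in points 3 and 6, here is a minimal sketch in Python. It is not how the Hub or ICA-AtoM matches names; the function names, the date pattern and the 'one set of name words contains the other' rule are all assumptions made for illustration, and anything it flags would still need checking by a person.

```python
import re

# Assumed date forms: "1858-1943", "fl. 1741", "c. 1730-40", "b 1829".
DATE_PATTERN = re.compile(
    r"\b(?:fl|c|b|d)?\.?\s*\d{4}(?:\s*-\s*\d{2,4})?", re.IGNORECASE
)


def name_tokens(raw):
    """Reduce a creator or index-term name to a set of lowercase name words."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) > 1:
        # Inverted form, e.g. "Webb, Martha Beatrice, 1858-1943, ...":
        # keep surname and forenames, drop dates and epithets.
        text = parts[1] + " " + parts[0]
    else:
        text = raw
    text = DATE_PATTERN.sub(" ", text)
    # Hyphens and other punctuation act as separators. This simple pattern
    # only handles unaccented letters, which is a limitation of the sketch.
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def possible_match(name_a, name_b):
    """Flag two names as candidates when one set of words contains the other."""
    tokens_a, tokens_b = name_tokens(name_a), name_tokens(name_b)
    if not tokens_a or not tokens_b:
        return False
    return tokens_a <= tokens_b or tokens_b <= tokens_a


if __name__ == "__main__":
    pairs = [
        ("Beatrice Webb",
         "Webb, Martha Beatrice, 1858-1943, wife of 1st Baron Passfield, "
         "social reformer and historian"),
        ("Clough Williams-Ellis", "Clough Williams Ellis"),
        ("Edward Coles", "Edward Coles fl. 1741"),
        ("Beatrice Potter Webb", "Martha Beatrice Webb"),
    ]
    for a, b in pairs:
        print(possible_match(a, b), "|", a, "|", b)
```

The first three pairs are flagged as candidates, but ‘Beatrice Potter Webb’ and ‘Martha Beatrice Webb’ are not, because neither name contains all the words of the other. Equally, the rule will happily pair two different people who share a forename and surname, which is why a sketch like this can only suggest matches, not confirm them.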

I would be really interested to know what archivists, especially Hub contributors, think about linking to authority files.  I am very excited about the potential, and I believe there is so much great information tied up in biographical and administrative history fields that would help to make really useful authority records, but the challenges are pretty substantial. Some questions to consider:

(1) Would you be happy to link to a ‘definitive’ authority record? What would ‘authority’ and ‘definitive’ mean for you?

(2) Would you like to be able to edit the ‘definitive’ record – maybe you have further information to add to it? Would this type of collective authorship work?

(3) Would you rather have one authority record for each repository?

(4) Would you rather have room within an archive description to create a biographical entry that reflected that particular archive?

(5) Would you like to see co-ordination of the creation and use of name authorities? Maybe it’s something The National Archives could lead, following their work in maintaining names through the National Register of Archives?

I’m sure you can think of other questions to ask, and maybe you have some questions to ask of us about our review? Suffice to say that we have no plans to change our software – this is currently just an assessment, but we are seriously thinking of how we can start to incorporate name authorities into the Archives Hub.

Finally, it’s worth mentioning that if you are interested in ways to bring biographical histories together, you might like to follow our ‘Linking Lives’ linked data project through the Linking Lives blog, as for this project we are looking at providing an interface that gives the end user something a little like a name authority record, only it will include more external links to other content.

Making ‘Headway’

One of our key aims at the Hub is to increase the range of archives who can contribute. Not just because we like having lots of contributors (we do!), but because we want to help open up hidden archive collections, and help archivists to make them discoverable online.

Used under a CC licence from http://www.flickr.com/photos/markkelley/157662318/

So, new for 2012 is Project Headway. Building on some work we’ve been doing over the past couple of years with Calm and Adlib to improve the EAD export, Project Headway aims to make it easier for archives to contribute to the Archives Hub. We’re especially thinking about archives with little or no online presence, who may not have archival management systems.

With this in mind, one of the things Headway is going to be looking at is producing an Excel template, to allow institutions which catalogue in Excel to convert their catalogues to EAD. This would mean that they could upload their descriptions to online catalogues (such as the Hub), as well as giving them a version of their data that’s in a robust, sustainable, platform-neutral format.
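As a rough illustration of the idea (not the Headway template itself, which is still to be produced), here is a minimal Python sketch that turns rows of a spreadsheet saved as CSV into very basic EAD fragments. The column headings (‘Reference’, ‘Title’, ‘Dates’, ‘Extent’, ‘Description’) and the sample row are hypothetical, and the output is only an archdesc fragment rather than a complete, valid EAD file with a header.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet column headings mapped to EAD <did> child elements.
DID_COLUMNS = {
    "Reference": "unitid",
    "Title": "unittitle",
    "Dates": "unitdate",
}


def row_to_archdesc(row):
    """Build a collection-level <archdesc> element from one spreadsheet row."""
    archdesc = ET.Element("archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    for column, tag in DID_COLUMNS.items():
        value = (row.get(column) or "").strip()
        if value:
            ET.SubElement(did, tag).text = value
    # In EAD the extent sits inside <physdesc>.
    extent = (row.get("Extent") or "").strip()
    if extent:
        physdesc = ET.SubElement(did, "physdesc")
        ET.SubElement(physdesc, "extent").text = extent
    description = (row.get("Description") or "").strip()
    if description:
        scope = ET.SubElement(archdesc, "scopecontent")
        ET.SubElement(scope, "p").text = description
    return archdesc


def csv_to_ead(csv_text):
    """Return one serialised <archdesc> fragment per spreadsheet row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [ET.tostring(row_to_archdesc(row), encoding="unicode") for row in reader]


if __name__ == "__main__":
    # Made-up sample data, standing in for an exported spreadsheet.
    sample = (
        "Reference,Title,Dates,Extent,Description\n"
        "GB 000 ABC,Papers of an example society,1900-1950,3 boxes,"
        "Minutes and correspondence.\n"
    )
    for fragment in csv_to_ead(sample):
        print(fragment)
```

Empty cells are simply skipped, so a repository with only references and titles would still get usable output. A real template would also need to handle multi-level descriptions, which this sketch does not attempt.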

Project Headway is scheduled to run until the end of June, and we’re also going to be looking at EAD exports from ICA-AtoM, the Archivists’ Toolkit, and Modes, as well as continuing our work with Calm and Adlib.

We hope to be able to expand on Headway work in the second half of the year, and look at other archival management systems, as well as export from Access databases.

If you’d like to know more about the project, want to volunteer to send us some descriptions, or have a system you’d like us to consider for phase 2, please get in touch!

HuBBub: January 2012

Some good developments

We’ve started off 2012 with two new developers for Mimas, both of whom will be doing some work for the Hub. Neeta Patel will be working on the UKAD website and some of the Hub interface developments and challenging global edits to help us improve the consistency and utility of the data. We also have Lee Baylis, who is working on our Linked Data project, Linking Lives. He will be helping to design the interface, and is currently beavering away on some exciting ideas for how researchers could customise their display for our biographical interface.

Punctuation for Index Terms

Something that may seem small, but is mighty complicated to execute: Currently we have a mixture of index terms with punctuation and no punctuation. This is because some descriptions came to us with, some without, and some through our EAD Editor – which adds punctuation (so these descriptions are all fine).

Just go to browse and search for ‘andrews’, for example, to see what I mean.  You can see:

Andrewes, Lancelot, 1555-1626, Bishop of Winchester
and
Andrews Barbara B 1829 Nee Campbell

The second is a little confusing without punctuation. But it is not easy to find a way to include punctuation for so many different names, with titles, dates, epithets, kings, queens, floruits, circas, etc. So, we are going to attempt to write scripts that will do this for us, and we’ll see how we go!
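To show why this is fiddly, here is a small Python sketch of the kind of rule such a script might start from. It is not our actual script: the function name, the list of date markers and the assumption that the first word is always the surname are all made up for illustration, and it makes no attempt at titles, kings or queens.

```python
import re

# Assumed markers that can precede a year: born, died, floruit, circa.
DATE_MARKERS = {"b", "d", "fl", "c"}
YEAR = re.compile(r"\d{4}(?:-\d{2,4})?")


def punctuate_index_term(term):
    """Insert commas between surname, forenames, dates and epithet."""
    if "," in term:
        # Already punctuated, so leave it alone.
        return term
    tokens = term.split()
    if len(tokens) < 2:
        return term
    # Assumption: the first word is the whole surname.
    surname, rest = tokens[0], tokens[1:]
    for i, token in enumerate(rest):
        is_year = YEAR.fullmatch(token) is not None
        is_marker = (
            token.lower().rstrip(".") in DATE_MARKERS
            and i + 1 < len(rest)
            and YEAR.fullmatch(rest[i + 1]) is not None
        )
        if is_year or is_marker:
            date_end = i + 2 if is_marker else i + 1
            # Forenames, then the date, then whatever follows as the epithet.
            groups = [rest[:i], rest[i:date_end], rest[date_end:]]
            break
    else:
        # No date found: treat everything after the surname as forenames.
        groups = [rest]
    parts = [surname] + [" ".join(group) for group in groups if group]
    return ", ".join(parts)


if __name__ == "__main__":
    print(punctuate_index_term("Andrews Barbara B 1829 Nee Campbell"))
    print(punctuate_index_term("Andrewes, Lancelot, 1555-1626, Bishop of Winchester"))
```

With this rule the second example above becomes ‘Andrews, Barbara, B 1829, Nee Campbell’, while the already punctuated entry is left untouched. It would go wrong for a two-word surname, or for a middle initial ‘C’ that happens to be followed by a year, which gives a sense of how many special cases a real script has to cover.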

Alternative and Former Reference

We’ve taken a while, but finally we are displaying ‘former reference’ with an appropriate field heading. It has been complicated partly because  descriptions with these references often come from the CALM software, and some contributors want the former reference to be the current reference, because they don’t use the CALM automatically generated reference, whilst most want it to be the former reference, and for some it is more of an alternative reference. Finding it impossible to attend to all these needs, we are displaying any reference that is labelled as ‘former reference’ in the markup with the name of ‘Alt. Ref. Number’. This is a compromise, and at least ensures that all references are displayed.

Assessment of ICA AtoM

The Archives Hub is undertaking a review of current software, and as part of this we are looking at ICA-AtoM (Access to Memory). We will be undertaking a fairly detailed assessment of the software, from installation through to upload, search, display and other aspects such as scalability, Google exposure and APIs. We feel that AtoM offers a number of advantages, as free open source software, conforming to archival standards and with the ability to incorporate name authority records and controlled vocabularies. We are also attracted by the lively international community that has built up around AtoM, and the ethos of sharing and working together to improve the functionality.

It will be interesting to see how it compares to our current software, Cheshire 3, which offers many advantages and sophisticated functionality, built up over 10 years to meet the needs of the Hub and archival researchers. Cheshire has served us very well, and provides stiff competition for any rivals, but it is important to assess where we are and what is best for us going forwards. Looking at other systems offers us the opportunity to think about functionality and assess exactly what we need.

Why Contribute?

We are constantly updating our pages, and adding new ones. Recently we’ve revamped the ‘Why Contribute?’ page as well as creating a new page, Becoming a Contributor. If you know of any archivists interested in the Hub, maybe you could point them to these pages as a means to provide some compelling reasons to be part of the Hub!

New Contributors

Our two latest contributors illustrate admirably the great diversity of Hub repositories. We have the Freshwater Biological Association with a collection about lakes and rivers in Cumbria and Scotland (if you ever wanted to know about bacteria counts, for example…), and also the National Meteorological Archive looking for a fair outlook by promoting their collections on the Hub.

Open Data

Some of you may have seen the announcement of the Open Data Strategy by the European Commission. This is very much in line with the increasing move towards open data: “The best way to get value from data is to give it away”.  The Archives Hub fully supports this ethos, and we will release all our data as open data unless any contributor wishes to opt out.

The Hub team wishes you all the best for 2012!

Online Survey Results (2011)

We would like to share some of the results of our annual online survey, which runs over a 3-4 week period. We aim for about 100 responses (though obviously more would be very welcome!), and for this survey we got 92 responses. We create a pop-up invitation to fill out the survey – something we do not like to do, but we do feel that it attracts more responses than a simple link.

Context

We have a number of questions that are replicated in surveys run for Zetoc and Copac, two bibliographic JISC-funded Mimas services, and this provides a means to help us (and our funders) look at all three services together and compare patterns of use and types of user.

This year we added four questions specifically designed to help us with understanding users of the Hub and to help us plan our priorities.

We aim to keep the number of questions down to about 12 at the most, and ensure that the survey will take no longer than 10 minutes to complete. But we also want to provide the opportunity for people to spend longer and give more feedback if they wish, so we combine tick lists and radio boxes with free text comments boxes.

We take the opportunity to ask whether participants would be willing to provide more feedback for us, and if they are potentially willing, they provide their email address. This gives us the opportunity to ask them to provide more feedback, maybe by being part of a focus group.

Results of the Survey

Profile

  • The vast majority of respondents (80%) are based in the UK for their study and/or work.
  • Most respondents are in the higher education sector (60%). A substantial number are in the Government sector and also the heritage/museum sector.
  • 20% of those using the Hub are students – maybe fewer than we would hope, but a significant number.
  • 10% are academics – again, fewer than we would hope, but it may be that academics are less willing to fill in a survey.
  • 50% are archivists or other information professionals. This is a high number, but it is important to note that it includes use of the Hub on behalf of researchers, to answer their enquiries, so it could be said to represent indirect use by researchers.
  • The majority of respondents use the service once or twice a month, although usage patterns were spread over all options, from daily to less than once a month, and it is difficult to draw conclusions from this, as just one visit to the Hub website may prove invaluable for research.

Use and Recommendation

[Graph showing value of the Hub]

  • A significant percentage – 26% – find the Hub ‘neither easy nor difficult’ to use, and 3% of the respondents found it difficult to use, indicating that we still need to work on improving usability (although note that a number of comments were positive about ease of use).
  • 73% agree their work would take longer without the Hub, which is a very positive result and shows how important it is to be able to cross-search archives in this way.
  • A huge majority – 93% – would recommend the Hub to others, which is very important for us. We aim to achieve 90% positive in this response, as we believe that recommendations are a very important means for the Hub to become more widely known.

Subject Areas

We spent a significant amount of time creating a list of subjects that would give us a good indication of disciplines in which people might use the Hub. The results were:

    • History 47
    • Library & Archive Studies 33
    • English Literature 17
    • Creative & Performing Arts 16
    • Education & Research Methods 10
    • Predominantly Interdisciplinary 9
    • Geography & Environment 5
    • Political Studies & International Affairs 5
    • Modern Languages and Linguistics 4
    • Physical Sciences 4
    • Special Collections 4
    • Architecture & Planning 3
    • Biological & Natural Sciences 3
    • Communication & Media Studies 3
    • Medicine 3
    • Theology & Philosophy 3
    • Archaeology 2
    • Engineering 2
    • Psychology & Sociology 2
    • Agriculture 1
    • Law 1
    • Mathematics 1
    • Business & Management Studies 0
  • History is, not surprisingly, the most common discipline, but literature, the arts, education and also interdisciplinary work all feature highly.
  • There is a reasonable amount of use from the subjects that might be deemed to have less call for archives, showing that we should continue to promote the Hub in these areas and that archives are used in disciplines where they do not have a high profile. It would be very valuable to explore this further.

[Graph showing use of archival websites]

  • The Hub is often used along with other archival websites, particularly The National Archives and individual record office websites, but a significant number do not use the websites listed, so we cannot assume prior knowledge of archives.
  • It would be interesting to know more about patterns of use. Do researchers try different websites, and in what order do they visit them? Do they have a sense of what the different sites offer?
  • There is still low use of the European aggregators, Europeana and APENet, although at present UK archives are not well represented on these services and arguably they do not have a high profile amongst researchers (the Hub is not yet represented on these aggregators).

Subsequent activities

  • It is interesting to note that 32% visit a record office as a result of using the Hub, but 68% do not. It would be useful to explore this further, to understand whether the use of the Hub is in itself enough for some researchers. We do know that for some people, the description holds valuable information in and of itself, but we don’t know whether the need to visit a record office, maybe some distance away, prevents use of the archives when they might be of value to the researcher.

What is of most value?

[Graph showing what is most valuable to researchers]

  • We asked about what is important to researchers, looking at key areas for us. The results show that comprehensive coverage still tops the polls, but detailed descriptions also continue to be very important to researchers, somewhat in opposition to the idea of the ‘quick and dirty’ approach. More sophisticated questioning might draw out how useful basic descriptions are compared with no description and what sort of level of detail is acceptable.
  • Links to digital content and information on related material are important, but not as important as adding more descriptions and providing a level of detail that enables researchers to effectively assess archives.
  • Searching across other cultural heritage resources at the same time is maybe surprisingly less of a priority than content and links. It is often assumed that researchers want as much diverse information as possible in a ‘one-stop shop’ approach, but maybe the issues with things like the usability of the search,  navigation, number of results and relevance ranking of results illustrate one of the main issues – creating a site that holds descriptions and links to very varied content and still ensuring it is very easily understandable and researchers know what they are getting.
  • The regional search was not a high priority but a significant medium priority, and it might be argued that not all researchers would be interested in this, but some would find it particularly useful, and many archivists would certainly find it helpful in their work.
  • We provided a free text box for participants to say what they most valued. The ability to search across descriptions, which is the most basic value proposition of the Hub, came out top, and breadth of coverage was also popular, and could be said to be part of the same selling point.
  • It was interesting to see that some respondents cited the EAD Editor as the main strength for them, showing how important it is to provide ways for archivists to create descriptions (it may be thought that other means are at their disposal, but often this is not the case).
  • Six people referred to the importance of the Hub for providing an online presence, indicating that for some record offices, the Hub is still the only way that collections are surfaced on the Web.

What would most improve the Hub?

  • We had a diversity of responses to the question about what would most improve the Hub, maybe indicating that there are no very obvious weaknesses, which is a good thing. But this does make it difficult for us to take anything constructive from the answers, because we cannot tell whether there is a real need for a change to be made. However, there were a few answers that focused on the interface design, and some of these issues should be addressed by our new ‘utility bar’ which is a means to more clearly separate the description from the other functions that users can then perform, and should be implemented in the next six months.

Conclusions

The survey did not throw up anything unexpected, so it has not materially affected our plans for development of the Hub. But it is essentially an endorsement of what we are doing, which is very positive for us. It emphasised the importance of comprehensive coverage, which is something we are prioritising, and the value of detailed descriptions, which we facilitate through the EAD Editor and our training opportunities and online documentation. Please contact us if you would like to know more.

More Product, Less Processing?

I’ve been reading a fascinating article by Mark A. Greene and Dennis Meissner, ‘More Product Less Process: Revamping Traditional Archival Processing’ (PDF). I wanted to offer a summary of the article.

The essence of this article is that archivists spend too long processing collections (appraising, cataloguing and carrying out minor preservation). This approach is not working; the cataloguing backlog continues to increase. We are too conservative, cautious and set in our ways, and we need to think about a new approach to cataloguing that is more pragmatic and user-focussed. The article was written by archivists in the USA, but would seem to apply to archives here in the UK, where we know that the backlog is a continuing problem.

I think the article makes the argument well and with a good deal of conviction. The bottom line is that we must rethink our approach unless we are to continue to accrue backlogs and deny researchers access to hugely valuable primary source material.

However, there are arguments in support of detailed cataloguing. For digital archives it is extremely useful to provide metadata at the item level, enabling such useful resources as http://archiveshub.ac.uk/data/gb1837des-dca?page=3#id634580. With this detailed list, researchers can see digital resources described and then access them directly. It could be argued that if a collection is to be digitised, providing this sort of level of metadata is appropriate, and in general it is the more valuable and highly used collections that are digitised. But for born-digital collections, this level of detail would be totally unsustainable.

Also, I wonder if the work that volunteers do should be taken into account – they may be able to help us catalogue in more detail, whilst trained archivists continue to create the main collection or series-level descriptions. I remember a whole band of NADFAS volunteers cataloguing photographs where I used to work. Furthermore, I was speaking to an archivist recently who said that they had taken the time to weed out duplicates (something this report criticises)…and then sold them on eBay for a tidy profit, that helped them fund their very under-resourced archive (they had the rights to do this!). So, maybe there are factors to take into consideration that support a detailed approach, but I think a bold approach to examining this whole area in UK archives would be very welcome.

Some of the points made in the report:

  • Archivists spend too much time cataloguing, not necessarily doing what is necessary. We think in terms of an ideal that we have to reach, although we haven’t actually articulated what this ideal is, and really examined it.
  • We are too attached to old-fashioned ways of doing things, which worked when we had smaller collections to deal with, but are not appropriate for large 20th century collections.
  • We give a higher priority to serving the needs of our collections rather than the needs of our users.
  • We need a new set of guidelines that focus on what we absolutely need to do.
  • We need to discuss, debate and examine our approach to cataloguing, and not be defensive about our roles.
  • We tend to arrange collections down to item level. In particular, we carry out preservation activities to this level. We accept the premise that basic preservation steps necessitate an item-level approach.
  • We often remove all metal fastenings and put materials into acid-free folders. So, even if we do not describe collections down to item level (maybe we just describe at collection or series level), we go down to this level of detail in our preservation activities. Yet, with good climate control, metal fasteners should not rust, and as yet we do not have strong evidence of a detrimental effect of standard manila folders if the material is stored in a controlled environment.
  • We often weed out duplicates throughout a collection, which requires processing down to item level. Is this really worth doing?
  • The various sources of advice about the level of detail we process archives to are inconsistent. Some sources advocate description to series level, but preservation activities to item level. NARA advocates preservation in accordance with intrinsic value and anticipated use, so, for example, new folders should only be used if current ones are damaged, and metal fasteners should be removed only if ‘appropriate’ – meaning where they are causing obvious damage.
  • We seem to believe that we need to aspire to ‘a substantial, multi-layered, descriptive finding aid,’ a reflection of ‘slow, careful scholarly research’.  But in reality, maybe we should adopt a more flexible approach, taking each collection in turn on its merits. Some may justify detailed cataloguing, but many do not.
  • We should take the position that users come to do research, and that we do not have to do this for them in advance.
  • We should ‘get beyond our absurd over-cautiousness’ about providing access to unprocessed collections, and make them available unless there are good legal or preservation reasons to restrict access or the collection is of extremely high value.
  • We have very inadequate processing metrics. Attempts to quantify processing expectations have resulted in wildly differing figures. Figures given in various studies include 3, 6.9, 8, 12.7 and 10.6 hours per cubic foot. Other studies have come up with between 3 and 5.5 days per foot.
  • One major study  by an archive centre revealed 15.1 hours were spent on each cubic foot, far more than the value that was placed upon  what was accomplished. The study gave ‘an improved sense of the real and total costs involved’.
  • The Greene/Meissner study looked at various projects funded by NHPRC grants (National Historical Publications & Records Commission), and found an average productivity figure of 9 hours per foot, but with highs of around 67 hours per foot. It also conducted an email survey and found expectations of processing times averaged at 14.8 hours, although there was a high of 250 hours!
  • Grant funding often encourages an item-level focus, rather than helping us to really tackle our substantial backlogs. There should be more of a requirement to justify meticulous processing – it should only be for exceptional collections.
  • The study recommends aiming for a processing rate of 4 hours per cubic foot for most large 20th century collections, using a series-level approach for description and preservation.
  • Studies show a lack of standardisation, not only in our definitions but also around the levels of arrangement, preservation and access that are useful and necessary. We do not have proper administrative controls over this work. We tend to argue for each of us having a unique situation, that does not allow for comparison, and we do not have a common sense of acceptable policies and procedures.
  • Whilst we continue to process to item level, a substantial number of archives do not make catalogues available through OPACs or Websites, arguably prioritising processing over user needs.
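To make the processing figures above a little more concrete, here is a small Python sketch comparing three of the quoted rates. The rates (4, 9 and 15.1 hours per cubic foot) come from the points above; the 100 cubic foot collection and the 1,000 staff hours are made-up numbers for illustration only.

```python
# Hours per cubic foot, as quoted in the summary above.
RATES = {
    "recommended target": 4.0,
    "NHPRC project average": 9.0,
    "archive centre study": 15.1,
}


def hours_needed(cubic_feet, hours_per_foot):
    """Staff hours needed to process a collection of the given size."""
    return cubic_feet * hours_per_foot


def feet_processed(staff_hours, hours_per_foot):
    """Cubic feet that can be processed with a fixed number of staff hours."""
    return staff_hours / hours_per_foot


if __name__ == "__main__":
    collection_size = 100   # hypothetical collection, in cubic feet
    staff_hours = 1000      # hypothetical processing time available
    for label, rate in RATES.items():
        needed = hours_needed(collection_size, rate)
        cleared = feet_processed(staff_hours, rate)
        print(f"{label}: {needed:.0f} hours for the collection, "
              f"or {cleared:.0f} cubic feet cleared in {staff_hours} hours")
```

On these assumptions, the same 1,000 hours clears 250 cubic feet at the recommended rate, against roughly 111 at the NHPRC average, so the difference between the rates is a matter of more than doubling the amount of material made available.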

The report concludes that maybe we should recognise that ‘the use of archival records…is the ultimate purpose of identification and administration.’ (SAA, Planning for the Archival Profession, 1986). Maybe we should agree that a collection is catalogued if it ‘can be used productively for research.’ And maybe we should be willing to take a different approach for each collection, making choices and setting priorities, rather than being too caught up in a ‘love of craftsmanship’ that could be seen as fastidiousness that does not truly serve the user.

The question seems to be how much would be lost by putting speed of processing before careful examination of all documents in a collection. Maybe this does require defining good cataloguing? Maybe we believe that our professional standing is tied up with undertaking detailed cataloguing…more so than with the ever-increasing backlogs, where the papers are entirely inaccessible to researchers?

Greene and Meissner state that there should be a ‘golden minimum’ for processing, where we adequately address user needs and only go beyond this where there are demonstrable business reasons. They also believe that arrangement, description and preservation should all occur at the same level of detail, again, unless there are good reasons to deviate from this.

What do you think…?