The Hub out and about – presenting, training, and pubbing

The Hub team like to get out and about to present, teach, and chat about archives and information. It can get a bit lonely being a purely online service, with our users and contributors at the other end of an email or phone call, so we try to ensure that we take advantage of chances to meet them face-to-face.

The last week of November was a busy week for this! On the Wednesday Jane and I (Bethan) gave a presentation to the MA Library & Information students at MMU.

We’ve given similar presentations to Archive students and early-career professionals in the past, but this is the first time we’ve given one to Library students. I’m pleased to say it worked well – the students were engaged and knowledgeable about archives, and about how issues in libraries and archives cross over.

It’s always very encouraging and stimulating to meet an enthusiastic group (I’d also met them the week before to talk about professional organisations), and both Jane and I really enjoyed giving the session. We had some nice feedback from the students, too, with one person saying:

The workshop was informative as well as entertaining. Complex issues were broken down so they were easier to understand. In a short amount of time a lot of areas were covered and due to the lively presentation style we all remained engaged and interested throughout.

And another said that they wished they had more next week!

I think it’s very important for us to be involved in talking to students, trainees, and early-career professionals. It’s good for them to hear from people who are actually working with the data that they’ll be creating. If nothing else, if we educate them about the need for good, interoperable data now, we’ll get better data from them later on! It’s also great to be able to tell them about the different sorts of jobs and opportunities there are for them, and hopefully give them some ideas about ‘alternative’ careers.

The next day saw me, Jane and Lisa heading down to London, for the inaugural ‘Hub in the Pub’ on the Thursday evening, before a training session on the Friday. We joined forces with a large contingent of museum folk who were ‘Drinking about Museums’, and had a very enjoyable and useful couple of hours chatting about general information, data, and cultural heritage issues. We hope to have more ‘Hub in the Pub’ events in future, so watch our mailing list and Twitter feed for details.

We made sure that the evening didn’t get too merry, so we were on top form for our contributors’ training day the next day. These training days are designed to help current and potential contributors use our EAD Editor, and are also a great chance to get to know our contributors and chat to them about any issues they might have. We have a few places left on our next training day in Glasgow in January – do sign up if you’d like to come along, or contact us if you’d like to know more.

If you can’t get along to a training session, we have online audio tutorials and a workbook designed to give you a step-by-step guide to using the Editor – and we’re always happy to answer any questions.

A model to bring museums, libraries and archives together

I am attending a workshop on the Conceptual Reference Model created by the International Council of Museums Committee on Documentation (CIDOC) this week.
The CIDOC Conceptual Reference Model (CRM) was created as a means of enabling information interchange and integration in the museum community and beyond. It “provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation”.
It became an ISO standard in 2006 and a Special Interest Group continues to work to develop it and keep it in line with progress in conceptualisation for information integration.
The vision is to facilitate the harmonization of information across the cultural heritage sector, encompassing museums, libraries and archives, helping to create a global resource. The CRM is effectively an ontology describing concepts and relationships relevant to this kind of information. It is not in any sense a content standard, rather it takes what is available and looks at the underlying logic, analysing the structure in order to progress semantic interoperability.
I come to this as someone with a keen interest in interoperability, and I think that the Archives community should engage more actively in cross-sectoral initiatives that benefit resource discovery. I am interested to find out more about the practical application and adoption of the CRM. My concern is that in the attempt to cover all eventualities, it seems like quite a complex model. It seeks to ‘provide the level of detail and precision expected and required by museum professionals and researchers’. It covers detailed descriptions, contexts and relationships, which can often be very complex. The SIG is looking to harmonise the CRM with archival standards, which should take the cultural heritage sector a step further towards working together to share our resources.
I will be interested to learn more about the Model and I would like to consider how the CRM relates to what is going on in the wider environment, and particularly with reference to Linked Data and, more basically, the increasing recognition of web architecture as the core means to disseminate information. Initiatives to bring data together, to interconnect, should move us closer to integrated information systems, but we want to make sure that we have complementary approaches.
You can read more about the Conceptual Reference Model on the CIDOC CRM website.

English Language — subjectless constructions

This is (probably) a final blog post referring to the recent survey by the UK Archives Discovery Network (UKAD) Working Group on Indexing and Name Authorities. Here we look in particular at subject indexing.

We received 82 responses to the question asking whether descriptions are indexed by subject. Most (42) do so, and follow recognised rules (UKAT, Unesco, LCSH, etc.). A significant proportion (29) index using in-house rules and some do not index by subject (18). Comments on this question indicated that in-house rules often supplement recognised standards, sometimes providing specialised terms where standards are too general (although I wonder whether these respondents have looked at Library of Congress headings, which are sometimes really quite satisfyingly specific, from the behaviour of the great blue heron to the history of music criticism in 20th century Bavaria).

Reasons given for subject indexing include:
  • it is good practice
  • it is essential for resource discovery
  • users find it easier than full-text searching
  • it gives people an indication of the subject strengths of collections
  • it imposes consistency
  • it is essential for browsing (for users who prefer to navigate in this way)
  • it brings together references to specific events
  • it brings out subjects not made explicit in keyword searching
  • it enables people to find out about things and about concepts
  • it may provide a means to find out about a collection where it is not yet fully described
  • it maximises the utility of the catalogues
  • it helps users identify the most relevant sources
  • it can indicate useful material that may not otherwise be found
  • it enables themes to be drawn out that may be missed by free-text searching
  • it can aid teachers
  • it helps with answering enquiries
  • it facilitates access across the library and archive
  • it meets the needs of academic researchers
The lack of staff resources was a significant reason given where subject indexing was not undertaken. Several respondents did not consider it to be necessary. Reasons given for this were:
  • the scope of the archive is tightly defined so subject indexing is less important
  • the benefits are not clear
  • the lack of a thesaurus that is specific enough to meet needs
  • a management decision that it is ‘faddy’
  • the collections are too extensive
  • the cataloguing backlog is the priority
Name indexing is considered more important than subject indexing only by a small margin, and some respondents did emphasise that they index by name but not by subject. Comments here included the observation that subject indexing is more problematic because it is more subjective, that subjects may more easily be pulled out via automated means and that it depends upon the particular archive (collection). As with name and place indexing, subject indexing happens at all levels of description, and not predominantly at collection level. Comments suggest that subjects are only added at lower levels if appropriate (and not appropriate at collection level).
For subjects, the survey asked how many terms are on average applied to each record. According to the options we gave, the vast majority use between one and six. However, some respondents commented that it varies widely, and one said that they might use a few thousand for a directory, which seems a little generous (possibly there is a misunderstanding here?).
Sources used for subjects included the usual thesauri, with UKAT coming out strongest, followed by Unesco and Library of Congress. A few respondents also referred to the Getty Art and Architecture Thesaurus. However, as with other indexes, in-house lists and a combination approach also proved common. It was pointed out in one comment that in-house lists should not be seen as lesser sources; one respondent has sold their thesaurus to other local archives. There were two comments about UKAT not being maintained, and hopes that the UKAD Network might take this on. And, indeed, when asked about the choice of sources used for subject indexing, UKAT again came up as a good thesaurus in need of maintenance.
Reasons given for the diverse choice of sources used included:
  • being led by what is within the software used for cataloguing
  • the need to work cross-domain
  • the need to be interoperable
  • the need to apply very specific subject terms
  • the need to follow what the library does
  • the importance of an international perspective
  • the lack of forethought on how users might use indexes
  • the lack of a specialist thesaurus in the subject area the repository represents (e.g. religious orders)
  • following the recommendations of the Archives Hub and A2A
Image courtesy of Flickr Creative Commons licence, Luca Pedrotti’s photostream

* the title of this blog post is a Library of Congress approved subject heading

International archival standards: living in perfect harmony?

The International Council on Archives Committee on Best Practices and Standards met recently to look at the four ICA descriptive standards: ISAD(G), ISAAR(CPF), ISDF and ISDIAH. It was agreed at this meeting to delay a full review that might lead to more substantial changes and to concentrate on looking at harmonization.
On the Hub we use ISAD(G), which has become very widely recognised and used. ISAAR(CPF) is something that would be important if we started to think about implementing EAC-CPF, enabling our contributors to create authority records for creators of archives. We think that this is the sort of development that should have cross-sectoral agreement, and we are actively involved in the UK Archives Discovery Network (UKAD), which provides a means for us to discuss these sorts of issues across the archives community in the UK.
As far as the International Standard for Describing Functions (ISDF) is concerned, I feel that a great deal more work is needed to help archivists understand how this can be practically implemented. Our new EAD Editor does allow contributors to add functions to their descriptions, but this is just using the EAD tag for functions. To me, the whole issue of functions and activities is problematic because I am looking at it from the perspective of aggregation. It is all very well for one institution to define their own functions and activities, but how does this translate into the wider environment? How do we successfully enable researchers to access archives by searching functions and activities across diverse institutions?
I have not really given any thought at all to the International Standard for Describing Institutions with Archival Holdings (ISDIAH) other than to basically familiarise myself with the standard. For us, the unique code that identifies the institution and the institution’s name is all that we require within our descriptions. We link to the Archon details for the institution, and maybe it is in the Archon directory of UK archives that ISDIAH should be implemented? I am not sure that it would be appropriate to hold detailed information about individual institutions on the Hub.
I will be interested to see what the outcomes of the Committee’s work are. I wonder whether we need a greater understanding of the standards themselves before we try to understand how they work together? Maybe adopting more consistent terminology and providing a conceptual framework will help archivists to appreciate what the standards are trying to achieve and encourage more use, but I am doubtful. I think that a few training days: ‘Understanding the ICA Descriptive Standards’ wouldn’t go amiss for many archivists, who may have only recently adopted ISAD(G), let alone thought about the implications of the other standards.
In the appendices to the minutes, there are some interesting points of discussion. Even some of the assumptions seem to be based on a greater understanding of the standards than most archivists have. For example, ‘if you use ISAD(G) in conjunction with ISAAR, the Admin/Biog history element of ISAD(G) becomes useless because the description of the record creator is managed by ISAAR’. Well, yes, but I’m not sure that this is so clear cut in practice. It makes sense, of course, but how do we relate that to all the descriptions we now have? Also, ‘ISAAR can be used to structure the information contained in the Admin/Biog history element of ISAD(G)’ – that makes sense, but I know of no practical examples that show archivists are doing this.
I wonder if we really need to help archivists to understand the standards – what they are, what they do, how they work, how they can benefit resource discovery – before we throw a conceptual framework at them. At the same time, I increasingly feel that ISAD(G) is not relevant to the modern environment and therefore I think there is a pressing need to review ISAD(G) before looking at how it relates to other standards.

Hub contributors’ reflections on the current and future state of the Hub



The Archives Hub is what the contributors make it, and with over 170 institutions now contributing, we want to continue to ensure that we listen to them and develop in accordance with their needs. This week we brought together a number of Archives Hub contributors for a workshop session. The idea was to think about where the Hub is now and where it could go in the future.
We started off by giving a short overview of the new Hub strategy, and updating contributors on the latest service developments. We then spent the rest of the morning asking them to look at three questions: What are the benefits of being part of the Hub? What are the challenges and barriers to contributing? What sort of future developments would you like to see?
Probably the strongest benefit was exposure – as a national service with an international user-base the Hub helps to expose archival content, and we also engage in a great deal of promotional work across the country and abroad. Other benefits that were emphasised included the ability to search for archives without knowing which repository they are held at, and the pan-disciplinary approach that a service like the Hub facilitates. Many contributors also felt that the Hub provides them with credibility, a useful source of expertise and support, and sometimes ‘a sympathetic ear’, which can be invaluable for lone archivists struggling to make their archives available to researchers. The network effect was also raised – the value of having a focus for collaboration and exchange of ideas.
A major barrier to contributing is the backlog of data, which archivists are all familiar with, and the time required to deal with this, especially with the lack of funding opportunities for cataloguing and retro-conversion. The challenges of data exchange were cited, and the need to make this a great deal easier. For some, getting the effective backing of senior managers is an issue. For those institutions who host their own descriptions (Spokes), the problems surrounding the software, particularly in the earlier days of the distributed system, were highlighted, and also the requirement for technical support. One of the main barriers here may be the relationship with the institution’s own IT department. It was also felt that the use of Encoded Archival Description (EAD) may be off-putting to those who feel a little intimidated by the tags and attributes.
People would like to see easy export routines to contribute to the Hub from other systems, particularly from CALM, a more user-friendly interface for the search results, and maybe more flexibility with display, as well as the ability to display images and seamless integration of other types of files. ‘More like Google’ was one suggestion, and certainly exposure to Google was considered to be vital. It would be useful for researchers to be able to search a Spoke (institution) and then run the same search on the central Hub automatically, which would create closer links between Spokes and Hub. Routes through to other services would add to our profile and more interoperability with digital repositories would be well-received. Similarly, the ability to search across archival networks, and maybe other systems, would benefit users and enable more people to find archival material of relevance. Influencing the right people and lobbying were also listed as things the Hub could do on behalf of contributors.
After a very good lunch at Christie’s Bistro we returned to look at three particular developments that we all want to see, and each group took one issue and thought about what the drivers are that move it forward and what the restraining forces are that stop it from happening. We thought about usability, which is strongly driven by the need to be inclusive and to de-mystify archival descriptions for those not familiar with archives and in particular archival hierarchies. It is also driven by the need to (at least in some sense) compete with Google, the need to be up-to-date, and to think about exposing the data to mobile devices. However, the unrealistic expectations that people have and, fundamentally, the need to be clear about who our users are and understanding their needs are hugely important. The quality and consistency of the data and markup also come into play here, and the recognition that this sort of thing requires a great deal of expert software development.
The need for data export, the second issue that we looked at, is driven by the huge backlogs of data and the big impact that this should have on the Hub in terms of quantity of descriptions. It should be a selling point for vendors of systems, with the pressure of expectation from stakeholders for good export routines. It should save time, prove to be good value for money and be easily accommodated into the work flow of an archive office. However, complications arise with the variety of systems out there and the number of standards, and variance in application of standards. There may be issues about the quality of the data and people may be resistant to changing their work habits.
Our final issue, the increased access to digital content, is driven by increased expectations for accessing content, making the interface more visually attractive (with embedded images), the drive towards digitisation and possibly the funding opportunities that exist around this area. But there is the expense and time to consider, issues surrounding copyright, the issue of where the digital content is stored and issues around preservation and future-proofing.
The day ended with a useful discussion on measuring impact. We got some ideas from contributors that we will be looking at and sharing with you through our blog. But the challenges of understanding the whole research life-cycle and the way that primary sources fit into this are certainly a major barrier to measuring the impact that the Hub may have in the context of research outputs.

Archival Context: entities and multiple identities


I recently took part in a Webinar (Web seminar) on the new EAC-CPF standard. This is a standard for the encoding of information about record creators: corporate bodies, persons and families. This information can add a great deal to the context of archives, supporting a more complete understanding of the records and their provenance.

We were given a brief overview of the standard by Kathy Wisser, one of the Working Group members, and then the session was open to questions and discussion.

The standard is very new, and archivists are still working out how it fits into the landscape and how it relates to various other standards. It was interesting to note how many questions essentially involved the implementation of EAC-CPF: who creates the records? where are they kept? how are they searched? who decides what?
These questions are clearly very important, but the standard is just a standard for the encoding of ISAAR(CPF) information. It will not help us to figure out how to work together to create and use EAC-CPF records effectively.
In general, archivists use EAD to include a biographical history of the record creator, and may not necessarily create or link to a whole authority record for them. The idea is that providing separate descriptions for different entities is more logical and efficient. The principle of separation of entities is well put: “Because relations occur between the descriptive nodes [i.e. between archive collections, creators, functions, activities], they are most efficiently created and maintained outside of each node.” So that if you have a collection description and a creator description, the relationship between the two is essentially maintained separately to the actual descriptions. If only EAD itself was a little more data-centric (database friendly you might say), this would facilitate a relational approach.
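To make the principle of separation a little more concrete, here is a minimal sketch of what keeping relationships outside the descriptions might look like. It is purely illustrative: the class names, field names, identifiers and the 'createdBy' relation type are all hypothetical and are not taken from EAD or EAC-CPF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """A descriptive node: a collection or a creator (hypothetical model)."""
    identifier: str
    kind: str   # e.g. "collection" or "creator"
    title: str


@dataclass(frozen=True)
class Relation:
    """A link held outside both of the nodes it connects."""
    source_id: str
    relation_type: str
    target_id: str


# The descriptions know nothing about each other...
descriptions = {
    "collection-1": Description("collection-1", "collection", "Papers of an example person"),
    "collection-2": Description("collection-2", "collection", "Further papers of the same person"),
    "creator-1": Description("creator-1", "creator", "An example person"),
}

# ...because the relationships are maintained separately.
relations = [
    Relation("collection-1", "createdBy", "creator-1"),
    Relation("collection-2", "createdBy", "creator-1"),
]


def related(identifier, relations):
    """Return the ids linked to `identifier`, following links in either direction."""
    linked = []
    for relation in relations:
        if relation.source_id == identifier:
            linked.append(relation.target_id)
        elif relation.target_id == identifier:
            linked.append(relation.source_id)
    return linked


print(related("creator-1", relations))  # ['collection-1', 'collection-2']
```

The point of the sketch is that a new link between a creator and a collection can be added without editing either description.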
I am interested in how we will effectively link descriptions of the same person, because I cannot see us managing to create one single authoritative record for each creator. This is enabled via the ‘identities’: a record creator can have two or more identities with each represented by a distinct EAC-CPF instance. I think the variety of identity relationships that the standard provides for is important, although it inevitably adds a level of complexity. It is something we have implemented in our use of the tag to link to related descriptions. Whilst this kind of semantic markup is a good thing, there is a danger that the complexity will put people off.
I’m quite hung-up on the whole issue of identifiers at the moment. This may be because I’ve been looking at Linked Data and the importance of persistent URLs to identify entities (e.g. I have a URL, you have a URL, places have a URL, things have a URL and that way we can define all these things and then provide links between them). The Archives Hub is going to be providing persistent URLs for all our descriptions, using the unique identifier of the countrycode, repository code and local reference for the collection (e.g. http://www.archiveshub.ac.uk/search/record.html?id=gb100mss, where 100 is the repository code and MSS is the local reference).
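As a rough sketch of how such an identifier might be put together, the snippet below assumes that the three parts are simply lowercased and joined, as the example above suggests. The function names are hypothetical and this is not the Hub's actual code; real local references may well contain spaces or punctuation that would need further normalisation rules.

```python
from urllib.parse import urlencode

# Base address taken from the example URL above.
BASE_URL = "http://www.archiveshub.ac.uk/search/record.html"


def make_identifier(country_code, repository_code, local_reference):
    """Join the three parts into one lowercase identifier (assumed rule)."""
    parts = (country_code, repository_code, local_reference)
    return "".join(part.strip().lower() for part in parts)


def make_persistent_url(country_code, repository_code, local_reference):
    """Build the record URL with the identifier as the 'id' query parameter."""
    identifier = make_identifier(country_code, repository_code, local_reference)
    return f"{BASE_URL}?{urlencode({'id': identifier})}"


print(make_persistent_url("GB", "100", "MSS"))
# http://www.archiveshub.ac.uk/search/record.html?id=gb100mss
```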
I feel that it will be important for ISAAR(CPF) records to have persistent URLs, and these will come from the recordID and the agencyCode. Part of me thinks the agency responsible for the EAC-CPF instance should not be part of the identifier, because the record should exist apart from the institution that created it, but then realistically, we’re not going to get consensus on some kind of independent stand-alone ISAAR(CPF) record. One of the questions I’m currently asking myself is: If two different bodies have EAC-CPF records, does it matter what the identifiers/URLs are for those records, even if they are for the same person? Is the important thing to relate them as representing the same thing? I’m sure it’s very important to have a persistent URL for all EAC-CPF instances, because that is how they will be discoverable; that is their online identity. But the question of providing one unique identifier for one person, or one corporate body is not something I have quite made my mind up about.
It will be interesting to see how the standard is assessed by archivists, and to see more examples of implementation. The Archives Hub would be very interested to hear from anyone using it.

Let there be images!


I’m embroiled in our Enhancement Project at the moment, part of which is about enabling images to be displayed within the Archives Hub. Well, it’s actually more than that – it’s about using the tag and related tags to enable links to digital representations of archives and to enable images to be embedded at collection and item level. It’s something we’re really excited about, and we feel that it’s important to make this step in order to keep the Archives Hub moving onwards and upwards.

Due to the distributed nature of the Archives Hub, we aren’t able to use the element, but we’ve made the most of the tags on offer. We’re implementing options for embedded images; links to files; thumbnail links to full-size images; groups of images representing the same item.

We’ve made a conscious effort to implement this in a very standards-based way. I suppose you could say that the principle should be that if the EAD records are put into another system, everything should still work, and the markup does allow for this. I think that this approach is also important because we have a service where we are not creating the data – our contributors are – so we need to try to meet their various requirements whilst at the same time not knowing exactly what they will contribute. For example, we have to be aware that they might enter a large, high resolution image as a thumbnail and the system needs to be able to cope with this. I see it as a learning experience for both us and our contributors, and I think that it’s important to take that sort of perspective with the Hub.

I do hope that Hub contributors take advantage of this development. It will be great for them to be able to include images and link directly to content. We’ve made it very easy to add the necessary markup by providing the facility to do this within our new Data Creation and Editing Template, so there is no need to get down and dirty with the EAD markup unless they want to. We’ll be talking to our contributors about this at our workshops in March/April, which are already pretty much full, so that’s a good indication for us.

For more information, see our page on adding digital objects to Hub descriptions.

Neuer Post

Blogger knows I’m in Germany – the interface is all in German. Neat. And just a teeny bit creepy.

I’m here with Jane at the International Standards for Digital Archives conference. Lots of presentations about EAD, EAC, METS and related standards. I was talking about the Spokes software yesterday (the EAD day). The picture shows the inside of the Umspannwerk Ost restaurant where we had dinner last night. It used to be an electrical substation. The conference venue (the Umweltforum) used to be a church. They’re good at recycling here.

Today was all about EAC and METS – Daniel Pitti was one of the speakers giving the background to EAC in the morning. Apparently there have been complaints about the complexity of the standard, so Daniel was asking for more details on this problem, as work is about to start on rebuilding it ‘from the ground up’. I enjoyed his closing comment which was along the lines of “it doesn’t matter what you do in the privacy of your own repository, but if you’re going outside, please dress up in a standard” (or a nice hat, of course).

Functions and activities of archive creators

A draft new standard has been published on the International Council on Archives’ website. ISAF (International Standard for Activities/Functions of Corporate bodies) is a sister standard to ISAD(G) and ISAAR(CPF), which are the standards for the description of archives and names respectively.

The description of records from a functional perspective has been becoming more common over recent years – partly a reflection of the number of times that governments and institutions re-organise their departments. The activities of a particular organisation (and the resulting records) often remain relatively constant, although the name of the section or department might change over time. ISAF provides a way of identifying these functions and linking them to the appropriate corporate entities and related records.

You can see the application of these functional descriptions in Glasgow University Archive Services’ GASHE website, where it is possible to browse the different activities and functions undertaken by the Scottish Higher Education institutions whose records are described in GASHE. The functional descriptions provide an overview of the activity (for example finance management/financial audit), with links to the various corporate entities involved in the activity (described in ISAAR authority records) and to the archives produced by the activity (described according to the ISAD(G) standard).

The deadline for comments on the draft standard is 31 March 2007.