Cataloguing Matters: being part of the information landscape

November 9, 2015 / Jane Stevenson

Consider the following questions, which use the topic of design history, but could be for any topic area:

Where did this person work?
Who did they know?
What can I find out about furniture design in London in the early 20th century?
Who designed this early 20th century chair?
Did the designer feature in this exhibition?
Can I find photographs of this section of the exhibition?
Did these designers both feature in this design exhibition?
Who influenced this designer?

These are surely typical questions for researchers. They are the sort of questions we thought about when we were working on Exploring British Design. But these questions do not start with the archive. An archive collection may well hold many answers for these questions, but there remains the problem of connecting these questions to the archive: ‘I’m interested in this designer/in this chair that they designed/in this exhibition they designed for’. We need these questions to lead to archival sources when appropriate.

Which comes first, the research question or the archive? We tend to assume the archive, but for many researchers the question comes first and the archive is, at that point, not known to them. We become a little fish in a very big pond when we enter the Web, and so we need to find ways for researchers who may not be aware of us to hook our collections.

Wellcome Library, London: Fishing

But how do we achieve this? We often have to work within considerable constraints and we cannot simply transform our cataloguing practices. There are many technical solutions that we can potentially make use of, but I think the first challenge lies with the data itself.

Individual archive repositories may think that there is no way they can move into a Linked Data world of RDF and triples and persistent URIs. But I think there are steps that we could take to help our descriptive data better fit this kind of landscape and have the potential to be more ‘linked in’ to other sources and to researchers’ paths of discovery.

Our descriptive practices tend to treat collections as stand-alone entities, rather than integrated parts of a whole information landscape. We do not think enough about cataloguing in a way that facilitates an integrated approach where we can connect to other information resource and allow researchers to come to archives through searching for people, organisations, subjects, events and places, and to come to them as part of a whole network of resources. I think we need to try to think about providing potential ‘connectors’ within our descriptions that will allow them to be hooked into the landscape more effectively.

Here are some thoughts about how we can help to achieve this:

Be consistent when cataloguing

This may sound straightforward, but having looked at thousands of descriptions from well over two hundred archive repositories, I can announce that it doesn’t always happen. For instance, simply entering the name of the repository in the same way for every catalogue entry, and not adding the repository code as part of the repository name, for example, ensures that the repository is always correctly identified. This means that all of the collection descriptions can be clearly identified as being from the same institution.

When entering names, think about how you structure them. Many archive systems provide a means to store and link to names, and yet it is amazing how often one name varies. If nothing else, think about adding life dates to a name, which helps with unique identification.

Here on the Archives Hub we have to hold our hands up for a potentially unhelpful practice up till now of encouraging the creator name to be entered in one way under ‘name of creator’ and another way as an index term. This reflects a time when we were less focussed on machine processing. Of course, it is much better to enter the name consistently, so that the connection can clearly be made.

Try to use rules and standards

Whilst I have increasingly become somewhat frustrated by our standards, I still think there is a role for standards to encourage consistency, clarify meaning and help draw things together. If you enter subjects using UKAT or LCSH, then try to ensure that all your terms really do come from these thesauri. We get examples where the thesaurus is named, but the subject is not actually from the thesaurus.

When entering things like language codes (ISO standard 639), take a few minutes just to find out about them. They need to be lower case, to be consistent. It’s worth having an understanding of what you are doing and why.

Do the same for dates. Think about what a normalised date is and why it is important. It is really worth having a sense of why these things matter. ISO8601: “The purpose of this standard is to provide an unambiguous and well-defined method of representing dates and times”. That sounds perfect for archives, and well worth adopting.

Think about the questions people ask

One of the advantages of indexing is that it gives you a chance to consider the terms you can best use to ‘advertise’ your collection. It is really good to if the scope and content text is clearly reflected in the index terms. If your archive is about a designer and you have described things they have designed, then you might realise that you haven’t used ‘furniture designer’ in your text, but this is a good index term to use. Or maybe the archive is really useful for those looking at the history of design education, but you haven’t yet actually used the term ‘design education’ in your text. You can add it as an index term.

Try to think beyond the UK

This may again sound obvious, but unfortunately NCA Rules don’t exactly encourage this, even if they don’t prevent it. When indexing by place name, add the county and certainly add the country. Indexing really helps us think about this. Consider a typical biographical history entry:

Charles Edward Sayle was born in Cambridge on 6 December 1864. He entered New College, Oxford, in 1883, and St John’s College, Cambridge, in 1890.

It wouldn’t look quite right putting:

Charles Edward Sayle was born in Cambridge, England, on 6 December 1864. He entered New College, Oxford, England, in 1883, and St John’s College, Cambridge, England, in 1890 [etc]

And whilst it would help with identification, it is still unstructured. Much better to ensure you have:

Place name: Cambridge, England

in your structured index terms. True, there are plenty of ways to mark this up, some better than others, but at least having an index term entered in this way is a great help with uniquely identifying the Cambridge that you mean, as opposed to the one in the US or New Zealand, Australia or Jamaica. It means that a researcher who is researching a topic around ‘Cambridge, England’ is more likely to find your description.

If you put your place names into some kind of specific field, try to avoid something like ‘Cardiff, Merthyr Tydfil, Cambridge’ (an example from the Hub) or ‘Cambridge etc’ (another example). So if your cataloguing software provides just one box for place, repeat that box for each place, rather than putting a load of them into one entry. If it allows for something like place name and country, use that to your advantage.

Put index terms into the most appropriate categories

On the Hub we’ve had plenty of family names as personal names and personal names as corporate names. But most commonly we get genre as subject. If your archive contains photographs, press cuttings, preliminary sketches or parchment (animal material) and you want to make this known, you need to index by material type, not by subject ….unless the archive is about photographs or about parchment.

Think Reference!

Your references should give the ability to uniquely identifying every part of a hierarchical archival description. Think carefully about adding them and make sure you have a unique reference at every level of description. Some archival software will automatically generate references, which does help because then they will be consistent and unique. But if you add them manually, it is very easy to make a mistake. On the Hub we found several hundred duplicated ‘unique references’ when we went through a major exercise to clean up references. And I do mean several hundred, not one or two.

Optimise titles for search engines

All of this structured data really needs the boost of descriptions that work well on the Web, simply in terms of being discoverable. SEO, or search engine optimisation, is a big topic, and your system may not allow you to make many changes, but in terms of the actual data, do think about appropriate titles, which are a really good hook. Include the most significant words, make sure that are not too long (so they are easy to scan for a researcher looking through various sources) and make titles at collection level self-explanatory. At lower levels this is not such a problem, as it is possible to append the collection title, to help with interpretation.

Look at how your data exports

This may not be possible or practical for everyone, but if you can do it I think it is a really good indicator of how interoperable your data might be to see how it looks when it is exported, because that is when you are removing it from the comfort of your own familiar system, and potentially unleashing it into the world.

The Event Horizon

I am particularly interested at the moment in events. History is made up of events. Who’d have thought it, when our standards barely mention them!

EAD (Encoded Archival Description), our XML standard for descriptions, doesn’t allow for events to be indexed at all. CIDOC CRM is “a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation” and it is event-based. It may be a complicated beast, not much used within our domain, and many of you won’t be aware of it, but there is certainly something to be said for putting events at the forefront of how we think.

The new EAC-CPF standard allows for chronologies in the biographical section of the description, but this is more about narrative (a list of events as part of one name authority description). It is based partly on ISAAR(CPF) which states “Record in narrative form or as a chronology the main life events, activities, achievements and/or roles of the entity being described.” Narrative form is fine for a researcher who is perusing the description, but structured form that is not necessarily so closely tied to narrative is what is required for more integration of data.

People, organisations, places and subjects are all linked by events – that is where the connections and the stories are. I think it is a major shortcoming of our standards that events are not given more prominence.

I mention this here really just to say that whilst we can improve our descriptions, and think about consistency and indexing, I do think that there are things around cataloguing that may require a more fundamental change.

Research Questions

Coming back to the initial questions I asked. Our cataloguing can help to ensure that archives are networked in to the broader information landscape. This is not about whether the archive can in principle answer these questions. It is about whether a researcher asking these questions (maybe initially through Google, maybe through other generic channels) can discover the archive as one source amongst many:

Where did this person work?
Uniquely identify the place(s).

Who did they know?
Make sure names are consistent and try to make them unambiguous.

What can I find out about furniture design in London in the early 20th century?
Use appropriate index terms and think about how to add dates consistently.

Who designed this early 20th century chair?
Adding index terms helps to draw out significant names and concepts

Did the designer feature in this exhibition?
Entering the exhibition name and designer name as structured data will help.

Can I find photographs of this section of the exhibition?
Make sure you have the exhibition name clearly stated and think about using index terms for formats – these are a great means for researchers to find types of material. But remember, this is ‘photographs’ as a genre or form, not as a subject!

Did these designers both feature in this design exhibition?
Not so easy, as we don’t index by event type

Who influenced this designer?
Not so easy, as we don’t tend to provide structured relationship information to connect people other than in the broadest possible way (these two people were ‘associated’). But that is another story…

To my mind, cataloguing is a skill, and it is really worth thinking about what you are cataloguing and how you catalogue it carefully. It is more important to think about this now than it was 30 years ago, because 30 years ago we were working with with narrative descriptions and index cards. Now we want our data to be interconnected.

In With the New: open, flexible, user-centered

May 16, 2013 / Jane Stevenson

The 2013 Eduserv Symposium, was held in the impressive (and very much ‘keep in with the old’) surroundings of One Great George Street in Westminster, the home of the Institute of Civil Engineers.

‘In with the New’ covered new skills sets, new modes of engagement and new ways of working. With such a wide topic area, the conference took quite a broad-brush approach. Andy Powell of Eduserv introduced the day and talked about dealing with change, change that may be imposed upon us from the outside, as well as being driven internally.

David Cotterill from the Government Digital Service gave the opening keynote, which is what I want to focus on here. He said his talk was about ‘my exciting life as a civil servant’….the audience weren’t convinced about this at the outset, but maybe for those interested in open data, there was some shift of opinion by the end!

He talked about the old consensus, which was built around long-term contracts for IT in government; contracts that were consistently awarded to a limited number of suppliers and not to smaller and more innovative suppliers. IT was not defined as a core function, so out-sourcing was considered appropriate. But in the 21st century things have changed. There is recognition that IT covers very diverse areas. For Government (and for many other organisations), it covers digital public services, mission IT systems (i.e. more niche or specialised systems for government departments), desktop, infrastructure, connectivity, etc. (the more general IT), and, within government, there are also ‘shared services’ (such as for financial systems). David talked about the need to structure mission IT systems and digital public services so that they can run on different desktops or infrastructures and not be tied down (as often used to be the case).

David went on to argue that the Government really has taken up the open agenda, and showed some quotes: “The latest step is the publications of this report on open standards. And once again the government has got it right.” (Wall Street Journal). He argued that in order to have flexibility to progress, to upgrade, to move forwards, you need open and standards based systems. You also need to look at specific needs in specific areas and not think of IT as some kind of monolithic thing.

It was surprising to hear him say that “this is a great time to be a supplier”, but he said that many of the current deals within government come to an end over the next few years, so there is opportunity for new suppliers and creating a more diverse set-up.

What is 21st century government about? David said it’s about things like www.gov.uk/, built using a platform approach (rather than a CMS) which allows the Government Digital Service (GDS) to build products onto it that meet user needs; products that enable the government to engage with citizens. David gave a sense of how this approach is working across UK government, with multi-disciplinary teams including developers, designers, product and service managers, policy, communications, etc.

His core message was to start with the user need. Of course, this is something that we can all agree with, although whether it always happens in reality is debatable, even if it is the intention. We need to shape things in terms of user requirements right from the start, and not bring it in once all the policy, requirements and development work is done. We should think about capturing requirements and developing alpha and then beta versions before going live. This may mean that what is initially developed is chucked out after the alpha stage, because it doesn’t meet needs, and then there is a need to start again. I think one of the problems with this approach is that funders do not necessarily facilitate it. How easy would it be to get funding for a project where the iterative process may go on for quite some time, and there is a risk of starting again several times in order to get it right? A further difficulty with this from a funding point of view is that it is much harder to specify what you are going to end up with, because you necessarily need to keep an open mind; you’ll end up (hopefully) with what users want, but it might be different to what was envisaged and you’ll only know after the testing and refining process.

It makes we think about archival software systems, for example. Surely you should put the user needs at the heart of the development of your system? Ideally you would start out by gathering user requirements for a system, maybe looking at other research done in this area. You’d end up with a specification, listing priorities for your system. Most archives can’t then build it themselves, so they would go out and look at what meets these needs. But would it be possible to test a system out with users, to see if it really does fulfill their needs, and if it doesn’t go back and try something else? The problem here is that if you are buying a system, its hard to apply an iterative approach. However, it may be possible to move to a more user-centered approach. You should have clear evidence that the system does meet key user needs, and, in the absence of an ability to chop and change, you should ensure that the system does not tie you down and that it provides the flexibility to build and modify, so that changing priorities can be met.

It’s good to see Government leading the way. David showed previews of some services that are being developed, working towards a more transparent approach to things like transactional services and he highlighted a government manual about building services that people want to use. There is now a ‘Standards Hub‘, to promote open standards and also to encourage wider participation in solving data challenges. It is amazing to see Government code on GitHub. Somehow that really brought home to me home how different things are now to 10-15 years ago. David, as well as other speakers at the conference, believes that open standards encourage a more efficient approach, so it becomes a cost-saving venture as well as encouraging public engagement and transparency.

Interoperability, data sharing and standards

April 18, 2013 / Bethan Ruddock / 2 Comments

I recently spoke at the CILIP MmIT group conference, where I inflicted EAD on a group of unsuspecting librarians. Not just EAD, but MARC and MODS XML and even some Linked Data. They may have said it was a bit like going back to library school, but no-one ran away.

I was talking to them about data sharing and interoperability, and asked them to look at resources described using different schema, to think about appropriateness: how well does the data format allow you to describe the resource? How machine-readable is it? How human-readable is it? How human/machine readable does it need to be? Is the format robust? Transformable? Sustainable? Interoperable?

Opening up: bibliographic data sharing & interoperability from Bethan Ruddock

These are all things you need to consider when you’re deciding which format to put your data in – except, of course, we often don’t think about these things much at all. These decisions might have been effectively made for you by the community. If all of your peer institutions use a certain data format, then you’re more likely to use it too. And if you want to share your data with the community, using the same format as they do is important.

But this means that you’re relying on other people to make these decisions about the best format for your data. Those people might know the sector and the issues involved in general, but they might not know your specific circumstances or users. Their decision might have been made a long time ago, before advances in theory and technology (MARC was first developed in the 1960s, and EAD in the 1990s). The choice of format might have been based on available tools, rather than underlying principles.

The same goes for cataloguing standards. Is sticking strictly to ISAD(G) really the best way to describe your collections to meet the needs of a global audience? (This is a topic that’s up for discussion at the Descriptive Standards Roundtable at the 2013 ARA Conference )

Of course, standards only work as standards if there’s sufficient community take-up, and a consensus on how to apply them.

But progress isn’t made by blindly following rules, and ‘there’s already a standard for that’ is no reason not to think about whether there could be a better standard for it.

Standards should be developed from needs. What do people need to know? What do they need to be able to do with the data? What do we need to be able to tell them? And, if we’re looking to the future, what might they want to be able to do in the future? What do we need to do to the data now, to allow for future wants?

We can only work with what’s available, and it is important to have shared standards and points of reference. But if you don’t take time to consider these points when you’re choosing a standard, you’re not really choosing at all. You’re just perpetuating the status quo.

So take the time to think about what you’re doing with your data. Know why you’re using a particular standard, even if it’s because it’s the best of a bad bunch, or closest to what you want to do. Think about what it can and can’t do. Talk to others who are using it. Look for chances to comment on proposed revisions. The future of standards is the future of your data, and your data is valuable. Don’t let it decay.

The Quest for Single Search

September 26, 2011 / Jane Stevenson

This post is based on a report published by OCLC Research, Single Search: The Quest for the Holy Grail (Leah Prescott and Ricky Erway, 2011).

It is less than ideal when users can benefit from a single search option for resources across the internet, but within an institution they are presented with a range of search systems for different services and resources. A single search obviously allows researchers to search across the organisation’s resources; it may also give a sense of the rich resources of an organisation and may provide a motivation to build upon them.

The OCLC report is based upon discussions with nine organisations that have implemented single search. There are certainly substantial challenges, not least the resources required and the need for effective collaboration across an institution. But it is clear that single search, if it is provided effectively, will help researchers and will help to harmonize collections management.

Single search needs to simplify rather than complicate the user experience, and sometimes the challenges this poses are not addressed and a single search ends up being a frustrating or confusing experience. We know that some users find navigating archival hierarchical descriptions confusing; adding library and museum items to this increases the challenge. Different collections may be catalogued very differently and to different levels of granularity, so presenting a coherent list of results is not easy. Added to this, many institutions now have digital collections, but only a part of their resources are digitised and so there is a need to indicate clearly what is digital (what can be accessed digitally) and what requires a visit to the institution.

The OCLC report refers to single search having the ability to ‘fundamentally change how an institution identifies itself’. Maybe if the single search represents a large part of the resources of an institution this is true; it is not likely to be the case in a university, where the collections are only a small part of the university’s business. Single search may enable curators, archivists and librarians themselves to get a more coherent view of the collections. This could be a useful advantage, as we know that often curators in charge of one collection or subject area do not necessarily have a good understanding of the whole. It may encourage a more efficient and streamlined approach to collections management.

Amongst the nine institutions that formed part of the OCLC discussions, some did have a mandate to create single search, but even with this kind of directive, there is a need for senior managers to provide the resources required and ensure that it is made a priority. In addition, the isssue of individual motivation is significant. I think this is a fascinating area that is sometimes overlooked: The extent to which the staff involved are motivated to work together and to achieve a vision must have a substantial impact on the outcome. What sort of role to ‘champions’ play? How important are they? Does it come down to individuals with intellectual curiosity and the willingness to learning new skills and change working habits? Is it important for the institution to foster this kind of attitude in order to ensure that innovations like single search are likely to work? One of the institutions in the OCLC report referred to the staff that had been selected to work on a single search as being selected for their ‘interest, skills and capacity to work on the program’. I have certainly come across colleagues who are frustrated by a lack of co-operation from other staff, which can significantly hamper any kind of innovative changes to metadata creation and cross-searching.

I think that attitudes are key to success in a project like this, where working practices may have to change and habits may need to be broken. It reminds me of that great YouTube video of the lone dancer who is joined by just one person – one is a crazy lone dancer, and others tend to try to ignore him/her; but once just one person joins in you have a group, and once you have two, then you’re more likely to get three, then four, and then the group builds up to the extent where those who are reluctant to join in anything a bit new or different, where they might embarrass themselves, end up joining in because not joining in becomes the exception rather than the rule. It’s a slightly different scenario but the point is similar.

The size of the institution is likely to have an impact. A small institution is often more agile, and getting buy-in may be easier, although there may be less resource to draw on. Maybe for a large organisation, trying to implement something that cuts through the departments and teams in a very horizontal way, like single search, is harder if the organisational structures remain the same. The priorities of the different departments involved may end up pulling against the project. It becomes all the more important to define the goals, get buy-in at the right levels, have clear and effective communication channels, and also find an effective way to keep the momentum and motivation going.

The OCLC report makes one observation which resonates very much with me: ‘It is important for the success of the project to have representation…from IT units, as weak motivation within the IT area of an organization has the power to paralyze such a project.’ The important thing here seems to be to ensure that the right people are included at the right stages in the project. IT should be brought in right at the outset and a real effort should be made to develop not only a common understanding but also a feeling of good will and strong motivation.

As the OCLC report states: ‘The reality of achieving an integrated access vision could mean overturning years or decades of institutional thinking, which has segmented collections management practice among the three different sectors of LAMs.’ Professionals within libraries, archives and museums have their own perspectives and values, and are often very caught up in their own long-standing practices. There may be good reason for this – often curators and archivists have had to fight over time to ensure their collections are properly looked after and catalogued. But a single search may call for a more compromised approach, and certainly it is likely to call for different thinking and finding new ways to represent the collections.

The ‘Technological Considerations’ section of the report is well worth reading, giving a short summary of some of the options. This is an area where the Archives Hub is very well aware of the pros and cons of different approaches. For an institution wanting to implement single search, there are a number of approaches: systems where you adopt batch export; systems where an API is used to pull the data in dynamically; a single system that replaces all the separate systems or multiple systems harvesting to a central repository; a federated search where each separate system is queried and results are brought back and presented to the user; a central index that is searched rather than the individual systems. All of these have pros and cons around things like flexibility, speed, currency and professional practices.

Of course, a further very important consideration will be digital assets, and the need to take a systematic approach here. Institutions may have Digital Asset Management Systems, but do these operate effectively with other collections sytems? Do digital assets exist in the different collections management systems? Are there shared metadata standards for digital assets?

Metadata Considerations present a whole new raft of challenges. I think that all to often those outside of the domains – maybe the managers who want to see single search and a more integrated approach – do not appreciate the substantial differences in approach between libraries, archives and museums. It is thought that because they all have something to do with that nebulous concept of ‘cultural heritage’ that they should all play together relatively easily. But each domain has built up its own world-view over many decades; the development of standards and best practice involves a great deal of hard work. It could be argued that finding ways to present catalogues or finding aids to users in a way that is as simple and straightforward as possible is not compatible with single search. It may be that single search, while seeking to provide an integrated approach, actually creates a more complex interface as a result of trying to integrate collection-based and hierarchical archival descriptions, item-based museum artifact descriptions and largely open access and usually non-unique library collections.

One of the biggest problems is that metadata is expensive to create. Automated metadata provides one solution but it is a very partial solution, especially for unique archival and musuem collections. Another challenge is that usually metadata has been created over long periods of time using a variety of systems, sometimes migrating from one system to another (often with patchy results). Metadata is messy, and yet standards lie at the heart of effective integration. But even standards are usually at different stages of evolution, and standards adopted by each of the domains do not necessarily harmonise very well.

One of the issues we have noticed on the Hub is the tendency for collections that are catalogued in great detail can overwhelm more summary descriptions. It can give the effect that those catalogued in more detail are more important. If you search on the Archives Hub relatively frequenly, you are likely to come across ‘University of Liverpool Staff Papers’ because they have been very thoroughly catalogued. There may be really good stuff in there, but should this one collection seem to be so much more important than so many others? Yet detailed cataloguing is surely a good thing?

There are also issues around vocabularies, and the tendency to implement multiple vocabularies within the same community. The Hub allows for any recognised vocabulary to be used for index temrsm but that does mean personal names, for example, entered using NCA Rules or AACR. You will inevitably end up with several different entries for the same name. The OCLC report refers to the need to harmonise metadata, trying to standardise terms, but I think that for us the way forward is generally to try to use the ever increasing sophistication of data processing tools to get round this problem, becuase we will never get 200 institutions to put things into the system in exactly the same way. Having said that, we are finding that as most of our contributors use our EAD Editor now, the descriptions are much more consistent and easier to integrate.

The OCLC report ends with some advice on the user interface, and one comment that I wholeheartedly agree with is the advice to hire or consult professional designers if you possibly can. Web presence is so important and Websites are often quite poorly designed. An ideal is to carry out user testing, but having done this ourselves, we know just how much time and effort it can take, and for many archives this is really quite a barrier. Even just testing with a small handful of users is very worthwhile. It’s amazing how much you find you have taken for granted that researchers will question. It’s good to see the importance of rights management emphasised and the need to clearly define access to content. This is becoming increasingly relevant, as data is republished, shared and recombined.

Single search is an important goal, not least because ‘the challenges inherent in this information divide ultimately expect researchers to compartmentalize their interests in a similar manner, rather than encouraging more multi-disciplinary approaches that focus on the research inqury (rather than the nature and custody of the resources).’ Our appoach has tended to suit our own professional outlooks; it should be geared towards what researchers want and need.

All quotes taken from Prescott, Leah and Ricky Erway. 2011. Single Search: The Quest for the Holy Grail. Dublin, Ohio: OCLC Research.

Image from www.digital-delight.ch

A Web of Possibilities

July 28, 2011 / Jane Stevenson

“Will you browse around my website”, said the spider to the fly,
‘Tis the most attractive website that you ever did spy”

All of us want to provide attractive websites for our users. Of course, we’d like to think its not really the spider/fly kind of relationship! But we want to entice and draw people in and often we will see our own website as our key web presence; a place for people to come to to find out about who we are, what we have and what we do and to look at our wares, so to speak.

The recently released ‘Discovery’ vision is to provide UK researchers with “easy, flexible and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable.” Does this have any implications for the institutional or small-scale website, usually designed to provide access to the archives (or descriptions of archives) held at one particular location?

Over the years that I’ve been working in archives, announcements about new websites for searching the archives of a specific institution, or the outputs of a specific project have been commonplace. A website is one of the obvious outputs from time-bound projects, where the aim is often to catalogue, digitise or exhibit certain groups of archives held in particular repositories. These websites are often great sources of in-depth information about archives. Institutional websites are particularly useful when a researcher really wants to gain a detailed understanding of what a particular repository holds.

However, such sites can present a view that is based more around the provider of the information rather than the receiver. It could be argued that a researcher is less likely to want to use the archives because they are held at a particular location, apart from for reasons of convenience, and more likely to want archives around their subject area, and it is likely that the archives which are relevant to them will be held in a whole range of archives, museums and libraries (and elsewhere). By only looking at the archives held at a particular location, even if that location is a specialist repository that represents the researcher’s key subject area, the researcher may not think about what they might be missing.

Project-based websites may group together archives in ways that benefit researchers more obviously, because they are often aggregating around a specific subject area. For example, making available the descriptions and links to digital archives around a research topic. Value may be added through rich metadata, community engagement and functionality aimed at a particular audience. Sometimes the downside here is the sustainability angle: projects necessarily have a limited life-span, and archives do not. They are ever-changing and growing and descriptions need to be updated all the time.

So, what is the answer? Is this too much of a silo-type approach, creating a large number of websites, each dedicated to a small selection of archives?

Broader aggregation seems like one obvious answer. It allows for descriptions of archives (or other resources) to be brought together so that researchers have the benefit of searching across collections, bringing together archives by subject, place, person or event, regardless of where they are held (although there is going to be some kind of limit here, even if it is at the national level).

You might say that the Archives Hub is likely to be in favour of aggregation! But it’s definitely not all pros and no cons. Aggregations may offer a powerful search functionality for intellectually bringing together archives based on a researcher’s interests, but in some ways there is a greater risk around what is omitted. When searching a website that represents one repository, a researcher is more likely to understand that other archives may exist that are relevant to them. Aggregations tend to promote themselves as comprehensive – if not explicitly then implicitly – which this creates expectation that cannot ever fully be met. They can also raise issues around measuring impact and around licensing. There is also the risk of a proliferation of aggregation services, further confusing the resource discovery landscape.

Is the ideal of broad inter-disciplinary cross-searching going to be impeded if we compete to create different aggregations? Yes, maybe it will be to some extent, but I think that it is an inevitability, and it is valid for different gateways to service different audiences’ needs. It is important to acknowledge that researchers in different disciplines and at different levels have their own needs, their own specific requirements, and we cannot fulfill all of these needs by only presenting data in one way.

One thing I think is critical here is for all archive repositories to think about the benefits of employing recognised and widely-used standards, so that they can effectively interoperate and so that the data remains relevant and sustainable over time. This is the key to ensuring that data is agile, and can meet different needs by being used in different systems and contexts.

I do wonder if maybe there is a point at which aggregations become unwieldy, politically complicated and technically challenging. That point seems to be when they start to search across countries. I am still unsure about whether Europeana can overcome this kind of problem, although I can see why many people are so keen on making it work. But at present, it is extremely patchy, and , for example, getting no results for texts held in Britain relating to Shakespeare is not really a good result. But then, maybe the point is that Europeana is there for those that want to use it, and it is doing ground-breaking work in its focus on European culture; the Archives Hub exists for those interested in UK Archives and a more cross-disciplinary approach; Genesis exists for those interested in womens studies; for those interested in the Co-operative movement, there is the National Co-operative Archive site; for those researching film, the British Film Institute website and archive is of enormous value.

So, is the important principle here that diversity is good because people are diverse and have diverse needs? Probably so. But at the same time, we need to remember that to get this landscape, we need to encourage data sharing and avoid duplication of effort. Once you have created descriptions of your archive collections you should be able to put them onto your own website, contribute them to a project website, and provide them to an aggregator.

Ideally, we would be looking at one single store of descriptions, because as soon as you contribute to different systems, if they also store the data, you have version control issues. The ability to remotely search different data sources would seem to be the right solution here. However, there are substantial challenges. The Archives Hub has been designed to work in a distributed way, so that institutions can host their own data. The distributed searching does present challenges, but it certainly works pretty well. The problem is that running a server, operating system and software can actually be a challenge for institutions that do not have the requisite IT skills dedicated to the archives department. Institutions that hold their own data have it in a great variety of formats. So, what we really need is the ability for the Archives Hub to seamlessly search CALM, AdLib, MODES, ICA AtoM, Access, Excel, Word, etc. and bring back meaningful results. Hmmm….

The business case for opening up data seems clear. Project like Open Bibliographic Data have helped progress the thinking in this arena and raised issues and solutions around barriers such as licensing. But it seems clear that we need to understand more about the benefits of aggregation, and the different approaches to aggregation, and we need to get more buy-in for this kind of approach. Does aggregation allow users to do things that they could not do otherwise? Does it save them time? Does it promote innovation? Does it skew the landscape? Does it create problems for institutions because of the problems with branding and measuring impact? Furthermore, how can we actually measure these kinds of potential benefits and issues?

Websites that offer access to archives (or descriptions of archives) based on where they are located and based on they body that administers them have an important role to play. But it seems to me that it is vital that these archives are also represented on a more national, and even international stage. We need to bring our collections to where the users are. We need to ensure that Google and other search engines find our descriptions. We need to put archives at the heart of research, alongside other resources.

I remember once talking about the Archives Hub to an archivist who ran a specialist repository. She said that she didn’t think it was worth contributing to the Hub because they already had their own catalogue. That is, researchers could find what they wanted via the institute’s own catalogue on their own system, available in their reading room. She didn’t seem to be aware that this could only happen if they knew that the archive was there, and that this view rested on the idea that researchers would be happy to repeat that kind of search on a number of other systems. Archives are often about a whole wealth of different subjects – we all know how often there are unexpected and exciting finds. A specialist repository for any one discipline will have archives that reach way beyond that discipline into all sorts of fascinating areas.

It seems undeniable that data is going to become more open and that we should promote flexible access through a number of discovery routes, but this throws up challenges around version control, measuring impact, brand and identity. We always have to be cognisant of funding, and widely disseminated data does not always help us with a funding case because we lose control of the statistics around use and any kind of correlation between visits to our website and bums on seats. Maybe one of the challenges is therefore around persuading top-level managers and funders to look at this whole area with a new perspective?

The Standard Bearers

April 27, 2011 / Jane Stevenson

We generally like standards. Archivists, like many others within the information professions, see standards as a good thing. But if that is the case, and we follow descriptive standards, why aren’t our collection descriptions more interoperable? Why can’t users move seamlessly from one system to another and find them consistent?

I’ve been looking at a White Paper by Nick Poole of the Collections Trust: Where Next for Museum Standards? In this, he makes a good point about the reasons for using standards:

“Standards exist to condense and share the professional experience of our predecessors, to enable us to continue to build on their legacy of improvement.”

I think this point is sometimes overlooked – standards reflect the development of our understanding and expertise over time. As a novice jazz musician, I think this has a parallel with jazz theory – the point of theory is partly that it condenses what has been learnt about harmony, rhythm and melody over the past 100 years of jazz. The theory is only the means to the end, but without it acting effectively as a short cut, you would have to work your way through decades of musical development to get a good understanding of the genre.

Descriptive standards should be the means to the end – they should result in better metadata. Before the development of ISAD(G) for archives, we did not have an internationally recognised standard to help us describe archives in a largely consistent way (although ISAD(G) is not really a content standard). EAD has proved a vital addition to our range of standards, helping us to share descriptions far more effectively than we could do before.

But archives are diverse and maybe we have to accept that standards are not going to mould our descriptions so that they all come off of the conveyor belt of cataloguing looking the same? It may seem like something that would be of benefit to our users – descriptions that look pretty much identical apart from the actual content. But would it really suffice to reflect the reality of what archives are? Would it really suffice to reflect the reality of the huge range of users that there are?

Going back to Nick Poole’s paper, he says:

“The purpose of standards is not to homogenise, but to ensure that diversity is built on a solid foundation of shared knowledge and understanding and a collective commitment to quality and sustainability.”

I think this is absolutely right. However, I do sometimes wonder how solid this foundation is for archives, and how much our standards facilitate collaborative understanding. Standards need to be clearly presented and properly understood by those who are implementing them. From the perspective of the Hub, where we get contributions of data from 200 different institutions, standards are not always well understood. I’m not sure that people always think carefully about why they are using standards – this is just as important as applying the standards. It is only by understanding the purpose that I think you do come to a good sense of how to apply a standard properly. For example, we get some index terms that are ostensibly using NCA Rules (National Council on Archives Rules for Personal, Family and Place Names), but the entries are not always in line with the rules. We also get subject entries that do not conform to any thesauri, or maybe they conform to an in-house thesaurus, but for an aggregated service, this does not really help in one of the main aims of subject indexing – to pull descriptions together by subject.

Just as for museums, standards, as Nick Poole says, must be “communicated through publications, websites, events, seminars and training. They must be supported, through infrastructure and investment, and they must be enforced through custom, practice or even assessment and sanction.”

For the Hub, we have made one important change that has made descriptions much more standards compliant – we have invested in an ‘EAD Editor’; a template based tool for the creation and editing of EAD based archival descriptions. This sophisticated tool helps to ensure valid and standards-based descriptions. This idea of supporting standards through this kind of approach seems to me to be vital. It is hard for many archivists to invest in the time that it takes to really become expert in applying standards. For the Hub we are only dealing with descriptive standards, but archivists have many other competing standards to deal with, such as environmental and conservation standards. Software should have standards-compliance built in, but it should also be designed to meet the needs of the archivists and the users. This balance between standards and flexibility is tricky. But standards are not going to be effective if they don’t actually meet real life needs. I do sometimes think that standards suffer from being developed somewhat in isolation of practical reality – this can be a result of the funding environment, where people are paid to work on standards, and they don’t tend to be the people who implement them. Standards may also suffer from the perennial problem of a shifting landscape – standards that were clearly relevant when they were created may be rather less so 10 years on, but revising standards is a time-consuming process. The archives community has the NCA Rules, which have served their purpose very well, but they really need revising now, to bring them in line with the online, global environment.

In the UK Archives Discovery network (UKAD) we are working to help archivists understand and use standards effectively. We are going to provide an indexing tutorial and we are discussing ways to provide more guidance on cataloguing generally. The survey that we carried out in 2009 showed that archivists do want more guidance here. Whilst maybe there are some who are not willing to embrace standards, the vast majority can see the sense in interoperability, and just need a low-barrier way to improve their understanding of the standards that we have and how best to use them. But in the end, I can’t see that we will ever have homogeneous descriptions, so we need to harness technology in order to help us work more effectively with the diverse range of descriptions out there that reflect the huge diversity of archives and users.

Images: Flickr goosmurf’s photostream (dough cutter); robartesm’s photostream (standard bearer)