The Standard Bearers

We generally like standards. Archivists, like many others within the information professions, see standards as a good thing. But if that is the case, and we follow descriptive standards, why aren’t our collection descriptions more interoperable? Why can’t users move seamlessly from one system to another and find them consistent?

I’ve been looking at a White Paper by Nick Poole of the Collections Trust: Where Next for Museum Standards? In this, he makes a good point about the reasons for using standards:

“Standards exist to condense and share the professional experience of our predecessors, to enable us to continue to build on their legacy of improvement.”

I think this point is sometimes overlooked – standards reflect the development of our understanding and expertise over time. As a novice jazz musician, I think this has a parallel with jazz theory – the point of theory is partly that it condenses what has been learnt about harmony, rhythm and melody over the past 100 years of jazz. The theory is only the means to the end, but without it acting effectively as a short cut, you would have to work your way through decades of musical development to get a good understanding of the genre.

Descriptive standards should be the means to the end – they should result in better metadata. Before the development of ISAD(G) for archives, we did not have an internationally recognised standard to help us describe archives in a largely consistent way (although ISAD(G) is not really a content standard). EAD has proved a vital addition to our range of standards, helping us to share descriptions far more effectively than we could do before.

But archives are diverse and maybe we have to accept that standards are not going to mould our descriptions so that they all come off of the conveyor belt of cataloguing looking the same? It may seem like something that would be of benefit to our users – descriptions that look pretty much identical apart from the actual content. But would it really suffice to reflect the reality of what archives are? Would it really suffice to reflect the reality of the huge range of users that there are?

Going back to Nick Poole’s paper, he says:

“The purpose of standards is not to homogenise, but to ensure that diversity is built on a solid foundation of shared knowledge and understanding and a collective commitment to quality and sustainability.”

I think this is absolutely right. However, I do sometimes wonder how solid this foundation is for archives, and how much our standards facilitate collaborative understanding. Standards need to be clearly presented and properly understood by those who are implementing them. From the perspective of the Hub, where we get contributions of data from 200 different institutions, standards are not always well understood. I’m not sure that people always think carefully about why they are using standards – this is just as important as applying the standards. It is only by understanding the purpose that I think you do come to a good sense of how to apply a standard properly. For example, we get some index terms that are ostensibly using NCA Rules (National Council on Archives Rules for Personal, Family and Place Names), but the entries are not always in line with the rules. We also get subject entries that do not conform to any thesauri, or maybe they conform to an in-house thesaurus, but for an aggregated service, this does not really help in one of the main aims of subject indexing – to pull descriptions together by subject.

Just as for museums, standards, as Nick Poole says, must be “communicated through publications, websites, events, seminars and training. They must be supported, through infrastructure and investment, and they must be enforced through custom, practice or even assessment and sanction.”

For the Hub, we have made one important change that has made descriptions much more standards compliant – we have invested in an ‘EAD Editor’, a template-based tool for the creation and editing of EAD-based archival descriptions. This sophisticated tool helps to ensure valid and standards-based descriptions. Supporting standards through this kind of approach seems to me to be vital. It is hard for many archivists to invest the time that it takes to really become expert in applying standards. For the Hub we are only dealing with descriptive standards, but archivists have many other competing standards to deal with, such as environmental and conservation standards. Software should have standards-compliance built in, but it should also be designed to meet the needs of the archivists and the users. This balance between standards and flexibility is tricky. But standards are not going to be effective if they don’t actually meet real life needs. I do sometimes think that standards suffer from being developed somewhat in isolation from practical reality – this can be a result of the funding environment, where people are paid to work on standards, and they don’t tend to be the people who implement them. Standards may also suffer from the perennial problem of a shifting landscape – standards that were clearly relevant when they were created may be rather less so 10 years on, but revising standards is a time-consuming process. The archives community has the NCA Rules, which have served their purpose very well, but they really need revising now, to bring them in line with the online, global environment.

In the UK Archives Discovery network (UKAD) we are working to help archivists understand and use standards effectively. We are going to provide an indexing tutorial and we are discussing ways to provide more guidance on cataloguing generally. The survey that we carried out in 2009 showed that archivists do want more guidance here. Whilst maybe there are some who are not willing to embrace standards, the vast majority can see the sense in interoperability, and just need a low-barrier way to improve their understanding of the standards that we have and how best to use them. But in the end, I can’t see that we will ever have homogeneous descriptions, so we need to harness technology in order to help us work more effectively with the diverse range of descriptions out there that reflect the huge diversity of archives and users.

Images: Flickr goosmurf’s photostream (dough cutter); robartesm’s photostream (standard bearer)

The long tail of archives

For many of us, the importance of measuring use and impact is coming more to the fore. Funders are often keen for indications of the ‘value’ of archives and typically look for charts and graphs that can provide some kind of summary of users’ interaction with archives. For the Hub, in the most direct sense this is about use of the descriptions of archives, although, of course, we are just as interested in whether researchers go on to consult archives directly.

The pattern of use of archives and the implications of this are complex. The long tail has become a phrase that is bandied around quite a bit, and to my mind it is one of those concepts that is quite useful. It was popularised by Chris Anderson, more in relation to the commercial world, relating to selling a smaller number of items in large quantities and a large number of items in relatively small quantities, and you can read more about it in Wikipedia: Long Tail.

If we think about books, we might assume that a smaller number of popular titles are widely used and use gradually declines until you reach a long tail of low use.  We might think that the pattern, very broadly speaking, is a bit like this:

I attended a talk at the UKSG Conference recently, where Terry Bucknell from the University of Liverpool was talking about the purchase of e-books for the University. He had some very whizzy and really quite absorbing statistics that analysed the use of packages of e-books. It seems that it is hard to predict use and that whilst a new package of e-books is the most widely used for that particular year, the older packages are still significantly used, and indeed, some books that are barely used one year may get significant use in subsequent years. The patterns of use suggested that patron-driven acquisition, or selection of titles after one year of use, were not as good value as e-book packages, although you cannot accurately measure the return on investment after only one year.

Archives are kind of like this only a whole lot more tricky to deal with.

For archives, my feeling is that the graph is more like this:

No prizes for guessing which are the vastly more used collections*. We have highly used collections for popular research activities, archives of high-profile people and archives around significant events, and it is often these that are digitised in order to protect the originals.  But it is true to say that a large proportion of archives are in the ‘long tail’ of use.
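To give a rough feel for the shape, here is a minimal sketch in Python. It is not based on Hub statistics: the number of collections, the use figure for the top collection and the Zipf-like fall-off (use proportional to 1/rank) are all assumptions made purely for illustration, and the function names are hypothetical.

```python
# Illustrative only: every number here is an assumption, not real usage data.

def zipf_use(n_collections: int, top_use: float = 1000.0, exponent: float = 1.0) -> list[float]:
    """Assumed use per collection, ranked from most used to least used.

    Use falls off as top_use / rank**exponent (a Zipf-like pattern).
    """
    return [top_use / (rank ** exponent) for rank in range(1, n_collections + 1)]


def head_and_tail_share(uses: list[float], head_size: int) -> tuple[float, float]:
    """Return (share of total use in the top head_size collections, share in the rest)."""
    total = sum(uses)
    head = sum(uses[:head_size])
    return head / total, (total - head) / total


# 1000 hypothetical collections; treat the 20 most used as the 'high column'.
uses = zipf_use(1000)
head_share, tail_share = head_and_tail_share(uses, head_size=20)
print(f"Top 20 collections: {head_share:.0%} of use")
print(f"Remaining 980 collections: {tail_share:.0%} of use")
```

Under these made-up assumptions, the 980 collections in the tail together account for slightly more use than the 20 at the top, even though each one individually is barely used. Real figures would of course depend on the actual collections and how use is counted.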

I think this can be a problem for us. Use statistics can dominate perceptions of value and influence funding, often very profoundly. Yet I think that this is completely the wrong way to look at it. Direct use does not correlate to value, not within archives.

I think there are a number of factors at work here:

  • The use of archives is intimately bound up with how they are catalogued. If you have a collection of letters, and just describe it thus, maybe with the main author (or archival ‘creator’), and covering dates, then researchers will not know that there are letters by a number of very interesting people, about a whole range of subjects of great interest for all sorts of topics. Often, archivists don’t have the time to create rich metadata (I remember the frustrations of this lack of time). Having worked in the British Architectural Library, I remember that we had great stuff for social history, history of empire, in particular the Raj in India, urban planning, environment, even the history of kitchen design or local food and diet habits. We also had a wonderful collection of photographs, and I recall the Photographs Curator showing me some really early and beautiful photographs of Central Park in New York. It’s these kinds of surprises that are the stuff of archives, but we don’t often have time to bring these out in the cataloguing process.
  • The use of a particular archive collection may be low, and yet the value gained from the insights may be very substantial. Knowledge gained as a result of research in the archives may feed into one author’s book or article, and from there it may disseminate widely. So, one use of one archive may have high value over time. If you fed this kind of benefit in as indirect use, the pattern would look very different.
  • The ‘value’ of archives may change over time. Going back to my experience at the British Architectural Library, I remember being told how the drawings of Sir Edwin Lutyens were not considered particularly valuable back in the 1950s – he wasn’t very fashionable after his death. Yet now he is recognised as a truly great architect, and his archives and drawings are highly prized.
  • The use of archives may change over time. Just because an archive has not been used for some time – maybe only a couple of researchers have accessed it in a number of years – it doesn’t mean that it won’t become much more heavily used. I think that research, just like many things, is subject to fashions to some extent, and how we choose to look back at our past changes over time. This is one of the challenges for archivists in terms of acquisitions. What is required is a long-term perspective but organisations all too often operate within short-term perspectives.
  • Some archives may never be highly used, maybe due to various difficulties interpreting them. I suppose Latin manuscripts come to mind, but also other manuscripts that are very hard to read and those pesky letters that are cross-written. Also, some things are specialised and require professional or some kind of expert knowledge in order to understand them. This does not make them less valuable. It’s easy to think of examples of great and vital works of our history that are not easy for most people to read or interpret, but that are hugely important.
  • Some archives are very fragile, and therefore use has to be limited. Digitising may be one option, but this is costly, and there are a lot of fragile archives out there.

I’m sure I could think of some more – any thoughts on this are very welcome!

So, I think that it’s important for archivists to demonstrate that whilst there may be a long tail to archives, the value of many of those archives that are not highly used can be very substantial. I realise that this is not an easy task, but we do have one invention in our favour: The Web. Not to mention the standards that we have built up over time to help us to describe our content. The long tail graph does demonstrate to us that the ‘long tail of use’ can add up to just as much as, or more than, the ‘high column of use’. The use of the Web is vital in making this into a reality, because researchers all over the world can discover archives that were previously extremely hard to surface. That does still leave the problem of not being able to catalogue in depth in order to help surface content…the experiments with crowd-sourcing and user-generated content may prove to be one answer. I’d like to see a study of this – have the experiments with asking researchers to help us catalogue our content proved successful if we take a broad overview? I’ve seen some feedback on individual projects, such as OldWeather:

“Old Weather (http://www.oldweather.org) is now more than 50% complete, with more than 400,000 pages transcribed and 80 ships’ logs finished. This is all thanks to the incredible effort that you have all put in. The science and history teams are constantly amazed at the work you’re all doing.” (a recent email sent out to the contributors, or ‘ship captains’).

If anyone has any thoughts or stories about demonstrating value, we’d love to hear your views.

* family history sources