Training and the Archives Hub.

A couple of weeks ago I took part in a training session for postgraduate students from the English department at the University of Salford. This had been organised with Ian Johnston, University Archivist at Salford, and Professor Sharon Ruston from ESPaCH (School of English, Sociology, Politics & Contemporary History).

Training Room

Sharon kicked off the session by explaining what archives mean to her career and how she had actually made her name and written a book on the strength of some new evidence that she uncovered about Shelley and his desire to be a doctor: Shelley and Vitality (Palgrave Macmillan, 2005), which explored the medical and scientific contexts which inform Shelley’s concept of vitality in his major poetry.

She went on to detail some of her new research on Humphry Davy (examining poetry & science) and explained that although it can often be a lot of effort to look for archives, it can pay dividends if you put the time and energy into searching.

Ian then took the floor and showed the students some of the hidden gems from the University’s archives. He also brought some items with him – a letter from Edith Sitwell, papers from the Duke of Bridgewater archive etc. He also showed some photos of Salford University in the 1970s. We were all fairly amazed by the picture of the paternoster lift, which is a lift that doesn’t stop. Literally you have to jump on as it’s going past. Talk about students living dangerously!

Ian explained why Salford University contributed to the Hub: being part of a national cross-searching service raises the archive’s profile, which leads to more researchers benefitting from the Salford University Archives Collections.

I then did a demonstration of some different websites where you can search for archives online and went on to show how the Archives Hub, Copac and Zetoc work and the different types of information that you can find in each.

Prior to the session, Ian and Sharon had asked the students for their research areas and I used these as my examples. I find if students cannot easily see how and why something is relevant to them, then they switch off. It’s important to tailor your examples to your audience, whatever level they are studying at.

We then got the students to have a go themselves as we walked around the room and gave more individual help. This worked really well as each student got at least 5 or 10 mins of one-to-one help on searching for their particular subject area.

We were all really pleased with how the session went. I could actually see the students sit up and take notice when Sharon was talking about making her name from finding new knowledge. It underlined how primary source material can lead to students incorporating unique perspectives into their research. I feel that this was key to the success of the session. The students were able to see how important archives had been to someone who they respected and knew was an expert in her field.

Ian showed them actual papers and letters from the archive and this allowed them to see concrete examples of what we were talking about, as opposed to thinking about archive materials in an abstract and ‘virtual’ way by just looking at online finding aids.

Sharon and Ian did a great job of explaining the benefits of using archives; I just told them how to find stuff… It was great to see how engaged the students were with what we were explaining to them. So much so that I’ve been asked back for a repeat performance. (With the academics!)

UKAD Forum

The National Archives (used under a CC licence from http://www.flickr.com/photos/that_james/2693236972/)

Weds 2nd March was the inaugural event of the UK Archives Discovery Network – better known as UKAD.  Held at the National Archives, the UKAD Forum was a chance for archive practitioners to get together, share ideas, and hear about interesting new projects.

The day was organised into 3 tracks: A key themes for information discovery; B standards and crowdsourcing; and C demonstrating sites and systems.  Plenary sessions came from John Sheridan of TNA, Richard Wallis of Talis, David Flanders of Jisc, and Teresa Doherty of the Women’s Library.

I would normally have been tweeting away, but unfortunately although I could connect to the wifi, I couldn’t get any further!  So here are my edited highlights of the day (also known as ‘tweets I wish I could have sent’).

John Sheridan kicked off the proceedings by talking about open data. The government’s Coalition Agreement contains a commitment to open data, which obviously affects The National Archives, as repository for government data. They are using light-weight existing Linked Data vocabularies, and then specialising them for their needs. I was particularly interested to hear about the challenges posed by legislation.gov.uk, explained by John as ‘A changes B when C says so’: new legislation may alter existing legislation, and these changes might come into force at a time specified by a third piece of legislation…
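To make ‘A changes B when C says so’ a little more concrete, here is a minimal sketch of how that three-way relationship might be modelled. It is not how legislation.gov.uk works internally: the class, the function, the act titles and the dates are all invented for illustration, and it assumes (as a simplification) that each commencing instrument sets a single date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Amendment:
    amending_act: str    # A: the legislation that makes the change
    target_act: str      # B: the legislation being changed
    commencing_act: str  # C: the legislation that says when the change starts


def amendments_in_force(amendments, commencement_dates, target_act, on_date):
    """Return the amendments to target_act that have commenced by on_date.

    commencement_dates maps a commencing act (C) to the date it sets.
    An amendment whose commencing act has not set a date yet is never in force.
    """
    in_force = []
    for amendment in amendments:
        start = commencement_dates.get(amendment.commencing_act)
        if (amendment.target_act == target_act
                and start is not None
                and start <= on_date):
            in_force.append(amendment)
    return in_force


# Invented example data: 'Act A' amends 'Act B' once 'Order C' says so.
amendments = [Amendment("Act A", "Act B", "Order C")]
commencement_dates = {"Order C": date(2011, 4, 1)}

before = amendments_in_force(amendments, commencement_dates, "Act B", date(2011, 3, 2))
after = amendments_in_force(amendments, commencement_dates, "Act B", date(2011, 5, 1))
print(len(before), len(after))
```

The point of the sketch is simply that the text of B on a given day cannot be worked out from A and B alone; you also need whatever C has said about timing.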

Richard Wallis carried on the open data theme, by talking about Linked Data and Linked Open Data. His big prediction? That the impact of Linked Data will be greater than the impact of the World Wide Web it builds on. A potentially controversial statement, delivered with a very nice slide deck.

Off to the tracks, and I headed for track B to hear Victoria Peters from Strathclyde talk about ICA-AtoM. This is open source, web-based archival description software, aimed at archivists and institutions with limited financial and technical resources. It looks rather nifty, and supports EAD and EAC import and export, as well as digital objects. If you want to try it out, you can download a demo from the ICA-AtoM website, or have a look at Strathclyde’s installation.

Bill Stockting from the BL gave us an update on EAD and EAC-CPF. I’m just starting to learn about EAC-CPF, so it was interesting to hear the plans for it. One of Bill’s main points was that they’re trying to move beyond purely archival concerns, and are hoping that EAC-CPF can be used in other domains, such as MARC. This is an interesting development, and I hope to hear more about it in the future! Bill also mentioned SNAC, the Social Networks and Archival Context project, which is looking at using EAC-CPF with a number of tools (including VIAF) ‘to “unlock” descriptions of people from finding aids and link them together in exciting new ways’.

David Flanders’ post-lunch plenary provided absolutely my favourite moment of the day: David said ‘Technology will fail if not supported by the users’… and then, with perfect timing, the projector turned off. One of David’s key points was that ‘you are not your users’. You can’t be both expert and user, and you will never know exactly what users want from your systems, and how they will use them, unless you actually ask them! Get users involved in your projects and bids, and you’re likely to be much more successful.

Alexandra Eveleigh spoke in track B about ‘crowds and communities: user participation in the archives’.  I especially liked her distinction between ‘crowds’ and ‘communities’ – crowds are likely to be larger, and quickly dip in and out, while communities are likely to be smaller overall, but dedicate more time and effort.  She also pointed out that getting users involved isn’t a new thing – there’s always been a place in archives for those pursuing ‘serious leisure’, and bringing their own specialist knowledge and experience.  A point Alexandra made that I found particularly interesting was that of being fair to your users – don’t ask them to participate and help you, if you’re not going to listen to their opinions!

I have to admit that I’d never really heard of Historypin before I saw them on the conference programme. Don’t click on that link if you have anything you need to get done today! Historypin takes old photographs, and ‘pins’ them to their exact geographic location using Google Maps. You can see them in Street View, overlaid on the modern background, and it is absolutely fascinating. Photos can be contributed by anyone, and anyone can add stories or more information to photos on the site. One of the developments on the way is the ability to ‘pin’ video and audio clips in the same way.

CEO Nick Stanhope was keen to point out that Historypin is a not-for-profit – they’re in partnership with Google, but not owned by them, and they don’t ask for any rights to any of the material posted on Historypin.  They’re keen to work with archives to add their photographic collections, and have a couple of things they hope to soon be able to offer archives in return (as well as increased exposure!):  they’ll be allowing any archive to have an instance of Historypin embedded on the archive’s site for free.  They’re also developing a smartphone app, and will be offering any archive their own branded version of the app – for free!  These developments sound really exciting, and I hope we hear more from them soon.

Teresa Doherty’s closing plenary was on the re-launch of the Genesis project. As Teresa said, ‘many of you will be sitting there thinking “this isn’t plenary material! what’s going on?”’, but Teresa definitely made it a plenary worth attending. Genesis is a project which allows users to cross-search women’s studies resources from museums, libraries and archives in the UK, and Teresa made the persuasive point that while the project itself might not be revolutionary, how they’ve done it is. Genesis has had no funding since 200 – everything they’ve done since then, including the relaunch, has been done with only the in-house resources they have available. They’ve used SRU to search the Archives Hub, and managed to put together a valuable service with minimal resources.
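For anyone wondering what ‘using SRU’ involves in practice: an SRU search is just an HTTP request with a handful of standard parameters and a CQL query, which is part of why it suits a project with minimal resources. Here is a rough sketch of building such a request. The endpoint address is a placeholder (I am not quoting the Hub’s real SRU address), and the `dc.subject` index is an assumption about what a given server supports; this is not Genesis’s actual code.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only, not a real service address.
SRU_BASE_URL = "https://sru.example.org/hub"


def build_sru_url(cql_query, start_record=1, maximum_records=10,
                  base_url=SRU_BASE_URL):
    """Build a searchRetrieve request URL using standard SRU 1.2 parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,              # the search itself, written in CQL
        "startRecord": start_record,     # 1-based position of the first record
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Assumed CQL index name; servers advertise the indexes they really support.
url = build_sru_url('dc.subject = "suffrage"')
print(url)
```

The server replies with XML containing the matching records, which the calling site can then restyle and present as part of its own search results.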

As a librarian and a new professional, I found Teresa’s insights into the history of archival cataloguing particularly fascinating. I knew that ISAD(G) was released in 1996, but I hadn’t had any real understanding of what that meant: that before 1996, there were no standards or guidelines for archival cataloguing. Each institution would catalogue in entirely their own way – a revelation to me, and completely alien to my entirely standards-based professional background! And I now have a new mantra, learned from one of Teresa’s old managers back in the early 90s:

‘We may not have a database now, but if we have structured data then one day we will have a database to put it in!’

I don’t think I’ve ever heard a better definition of the interoperability mindset.

After the day officially ended, it was off to the pub for a swift pint and wind-down. An excellent, instructive, and fun day.

Slides from the day are available on SlideShare – tag ukad.

New Horizons

The Horizon Report is an excellent way to get a sense of emerging and developing technologies, and it is worth thinking about what they might mean for archives. In this post I concentrate on the key trends that are featured for the next 1-4 years.

Electronic Books

“[E]lectronic books are beginning to demonstrate capabilities that challenge the very definition of reading.”

Electronic books promise not just convenience, but also new ways of thinking about reading. They encourage interactive, social and collaborative approaches. Does this have any implications for archives? Most archives are paper-based and do not lend themselves so well to this kind of approach. We think of consulting archives as a lone pursuit, in a reading room under carefully controlled conditions. The report refers to “a dynamic journey that changes every time it is opened.” An appealing thought, and indeed we might feel that archives also offer this kind of journey. Increasingly we have digital and born-digital archives, but could these form part of a more collaborative and interactive way of learning? Issues of authenticity, integrity and intellectual property may militate against this.

Whilst we may find it hard to see how archives might become a part of this world – we are talking about archives, after all, and not published works – there may still be implications around the ways that people start to think about reading. Will students become hooked on rich and visual interfaces and collaborative opportunities that simply do not exist with archives?

Mobiles

“According to a recent report from mobile manufacturer Ericsson, studies show that by 2015, 80% of people accessing the Internet will be doing so from mobile devices.”

Mobiles are a major part of the portable society. Archive repositories can benefit from this, ensuring that people can always browse their holdings, wherever they are. We need to be involved in mobile innovation. As the report states: “Cultural heritage organizations and museums are also turning to mobiles to educate and connect with audiences.” We should surely see mobiles as an opportunity, not a problem for us, as we increasingly seek to broaden our user-base and connect with other domains. Take a look at the ‘100 most educational iPhone Apps‘. They include a search of US historical documents with highlighting and the ability to add notes.

Augmented Reality

We have tended to think of augmented reality as something suitable for marketing, social engagement and amusement. But it is starting to provide new opportunities for learning and changing expectations around access to information. This could provide opportunities for archives to engage with users in new ways, providing a more visual experience. Could it provide a means to help people understand what archives are all about? Stanford University in the US has created an island in Second Life. The unique content that the archives provide was seen as something that could draw visitors back and showcase the extensive resources available. Furthermore, they created a ‘virtual archives’, giving researchers an opportunity to explore the strong rooms, discover and use collections and collaborate in real time.

The main issue around using these kinds of tools is going to be the lack of skills and resources. But we may still have a conflict of opinions over whether virtual reality really has a place in ‘serious research’. Does it trivialize archives and research? Or does it provide one means to engage younger potential users of archives in a way that is dynamic and entertaining? I think that it is a very positive thing if used appropriately. The Horizon Report refers to several examples of its use in cultural heritage: the Getty Museum are providing ‘access’ to a 17th century collector’s cabinet of wonders; the Natural History Museum in London are using it in an interactive video about dinosaurs; the Museum of London are using it to allow people to view 3D historical images overlaid on contemporary buildings. Another example is the Powerhouse Museum in Sydney, using AR to show the environment around the Museum 100 years ago. In fact, AR does seem to lend itself particularly well to teaching people about the history around them.

Game-Based Learning

Another example of blending entertainment with learning, games are becoming increasingly popular in higher education, and the Serious Games movement is an indication of how far we have come from the notion that games are simply superficial entertainment. “[R]esearch shows that players readily connect with learning material when doing so will help them achieve personally meaningful goals.” For archives, which are often poorly understood by people, I think that gaming may be one possible means to explain what archives are, how to navigate through them and find what may be of interest, and how to use them. How about something a bit like this Smithsonian initiative, Ghosts of a Chance, but for archives?

These technologies offer new ways of learning, but they also suggest that our whole approach to learning is changing. As archivists, we need to think about how this might impact upon us and how we can use it to our advantage. Archives are all about society, identity and story. Surely, therefore, these technologies should give us opportunities to show just how much they are a part of our life experiences.

Voices for the Library

Voices for the Library is a place for anyone who loves and values libraries to share their experiences and stories about what libraries mean to them.  Also known as VftL, or simply ‘Voices’, the campaign was set up in September 2010 by a group of information professionals who were concerned about the negative and inaccurate coverage of libraries in the media.

The group felt that public libraries were being misrepresented in the media, for instance by their insistence on using footfall as the only measure of library use, ignoring all online services and interactions.  Voices started out as a way to combat this, to provide accurate information, and to share stories of what libraries mean to people.   Much of our content comes from library users, who want to share their stories about how libraries have affected their lives.

And, of course, there are stories from librarians as well. Some are examples of the kind of work they do, to show the range and depth of what trained library staff do, and to illustrate that it’s not all stamping books and shushing! And some are more theoretical debates, about the philosophy of public libraries.

Recently, we’ve started to look into the impact that library closures might have on archives and special collections. This was prompted by a blog post from Alison Cullingford, and campaigners are starting to look at what might happen to archive services in their region, as VftL member Lauren has done for Doncaster.

As more closures and cutbacks are threatened, the VftL team have been working overtime.  We’re all volunteers, and do Voices work on top of our day jobs, other professional involvement, continuing education – oh, and real lives!  We’re also scattered across the country, from Brighton to Harrogate, and all points between.  This means that the entire campaign so far has been co-ordinated virtually, using email and various other social media tools.  Most of the team had never even met each other.

Until Wednesday 26 Jan, that is!  Thanks to sponsorship from Credo Reference we were able to get most of the team down to London for a proper face-to-face board meeting, which I chaired.  I’ve never chaired a real meeting before, and I have to thank the Voices team for making it incredibly easy!  We only ran an hour over time, and managed to discuss and make decisions on several key points.   I think it definitely ranks as the best all-day meeting I’ve ever attended.

One of the things that hasn’t changed is that we’re always on the lookout for stories about the value of public library services, and why they are so important to people.  If you’d like to share your story, or tell us more about what’s going on in your area, you can contact us at stories@voicesforthelibrary.org.uk.

A bit about Resource Discovery

The UK Archives Discovery Network (UKAD) recently advertised our upcoming Forum on the archives-nra listserv. This prompted one respondent to ask whether ‘resource discovery’ is what we now call cataloguing and getting the catalogues online. The respondent went on to ask why we feel it necessary to change the terminology of what we do, and labelled the term resource discovery as ‘gobledegook’. My first reaction to this was one of surprise, as I see it as a pretty plain-talking way of describing the location and retrieval of information, but then I thought that it’s always worth considering how people react and what leads them to take a different perspective.

It made me think that even within a fairly small community, which archivists are, we can exist in very different worlds and have very different experiences and understanding. To me, ‘resource discovery’ is a given; it is not in any way an obscure term or a novel concept. But I now work in a very different environment from when I was an archivist looking after physical collections, and maybe that gives me a particular perspective. Being manager of the Archives Hub, I have found that a significant amount of time has to be dedicated to learning new things and absorbing new terminology. There seem to be learning curves all over the place, some little and some big. Learning curves around understanding how our Hub software (Cheshire) processes descriptions, Encoded Archival Description, deciding whether to move to the EAD schema, understanding namespaces, search engine optimisation, sitemaps, application programming interfaces, character encoding, stylesheets, log reports, ways to measure impact, machine-to-machine interfaces, scripts for automated data processing, linked data and the semantic web, etc. A great deal of this is about the use of technology, and figuring out how much you need to know about technology in order to use it to maximum effect. It is often a challenge, and our current Linked Data project, Locah, is very much a case in point (see the Locah blog). Of course, it is true that terminology can sometimes get in the way of understanding, and indeed, defining and having a common understanding of terms is often itself a challenge.

My expectation is that there will always be new standards, concepts and innovations to wrestle with, try to understand, integrate or exclude, accept or reject, on pretty much a daily basis. When I was the archivist at the RIBA (Royal Institute of British Architects), back in the 1990s, my world centered much more around solid realities: around storerooms, temperature and humidity, acquisitions, appraisal, cataloguing, searchrooms and the never ending need for more space and more resources. I certainly had to learn new things, but I also had to spend far more time than I do now on routine or familiar tasks; very important, worthwhile tasks, but still largely familiar and centered around the institution that I worked for and the concepts and terminology commonly used by archivists. If someone had asked me what resource discovery meant back then, I’m not sure how I would have responded. I think I would have said that it was to do with cataloguing, and I would have recognised the importance of consistency in cataloguing. I might have mentioned our Website, but only in as far as it provided access through to our database. The issues around cross-searching were still very new and ideas around usability and accessibility were yet to develop.

Now, I think about resource discovery a great deal, because I see it as part of my job to think of how to best represent the contributors who put time and effort into creating descriptions for the Hub. To use another increasingly pervasive term, I want to make the data that we have ‘work harder’. For me, catalogues that are available within repositories are just the beginning of the process. That’s fine if you have researchers who know that they are interested in your particular collections. But we need to think much more broadly about our potential global market: all the people out there who don’t know they are interested in archives – some, even, who don’t really know what archives are. To reach them, we have to think beyond individual repositories and we have to see things from the perspective of the researcher. How can we integrate our descriptions into the ‘global information environment’ in a much more effective way? A most basic step here, for example, is to think about search engine optimisation. Exposing archival descriptions through Google, and other search engines, has to be one very effective way to bring in new researchers. But it is not a straightforward exercise – books are written about SEO and experts charge for their services in helping optimise data for the Web. For the Archives Hub, we were lucky enough to be part of an exercise looking at SEO and how to improve it for our site. We are still (pretty much as I write) working on exposing our actual descriptions more effectively.
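One small, concrete piece of the SEO puzzle is a sitemap: a plain XML file listing the pages you would like search engines to crawl. As a rough illustration only, here is a sketch that builds one from a list of description addresses. The URLs are invented and this is not the Hub’s own tooling; only the namespace and the `urlset`, `url` and `loc` elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


# Invented addresses standing in for individual archival descriptions.
description_urls = [
    "https://archives.example.org/data/collection-0001",
    "https://archives.example.org/data/collection-0002",
]
sitemap_xml = build_sitemap(description_urls)
print(sitemap_xml)
```

A sitemap does not guarantee that a page will rank well, or even be indexed, but it does at least tell a crawler that each description exists as a page in its own right.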

Linked Data provides another whole world of unfamiliar terminology to get your head round. Entities, triples, URI patterns, data models, concepts and real world things, SPARQL queries, vocabularies – the learning curve has indeed been steep. Working on outputting our data as RDF (a modelling framework for Linked Data) has made me think again about our approach to cataloguing and cataloguing standards. At the Hub, we’re always on about standards and interoperability, and it’s when you come to something like Linked Data, where there are exciting possibilities for all sorts of data connections, well beyond just the archive community, that you start to wish that archivists catalogued far more consistently. If only we had consistent ‘extent’ data, for example, we could look at developing a lovely map-based visualisation showing where there are archives based on specific subjects all around the country and have a sense of where there are more collections and where there are fewer collections. If only we had consistent entries for people’s names, we could do the same sort of thing here, but even with thesauri, we often have more than one name entry for the same person. I sometimes think that cataloguing is more of an art than a science, partly because it is nigh on impossible to know what the future will bring, and therefore knowing how to catalogue to make the most of as yet unknown technologies is tricky to say the least. But also, even within the environment we now have, archivists do not always fully appreciate the global and digital environment which requires new ways of thinking about description. Which brings me back to the idea of whether resource discovery is another term for cataloguing and getting catalogues online. No, it is not. It is about the user perspective, about how researchers locate resources and how we can improve that experience. It has increasingly become identified with the Web as a way to define the fundamental elements of the Web: objects that are available and can be accessed through the Internet, in fact, any concept that has an identity expressed as a URI. Yes, cataloguing is key to archives discovery, cataloguing to recognised standards is vital, and getting catalogues online in your own particular system is great… but there is so much more to the whole subject of enabling researchers to find, understand and use archives and integrating archives into the global world of resources available via the Web.
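For anyone as new to this as I was, a tiny worked example may help with the terminology. A triple is just a three-part statement (subject, predicate, object) in which things are identified by URIs. The sketch below uses plain Python rather than a real RDF library, and every URI and name in it is invented; it is not the Locah data model. It also shows why inconsistent name entries hurt: the same person entered two ways ends up as two unconnected things.

```python
# All URIs here are made up for illustration.
COLLECTION = "http://example.org/id/collection/"
PERSON = "http://example.org/id/person/"
VOCAB = "http://example.org/def/"

triples = [
    (COLLECTION + "one", VOCAB + "associatedWith", PERSON + "doe-jane"),
    (COLLECTION + "two", VOCAB + "associatedWith", PERSON + "doe-jane"),
    # Same (invented) person, catalogued under a different form of name:
    (COLLECTION + "three", VOCAB + "associatedWith", PERSON + "doe-j"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern.

    None acts as a wildcard, loosely like a variable in a SPARQL triple pattern.
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


found = match(triples, predicate=VOCAB + "associatedWith", obj=PERSON + "doe-jane")
print([s for (s, _, _) in found])
```

The query finds collections one and two but misses collection three, because nothing in the data says that ‘doe-j’ and ‘doe-jane’ are the same person. Consistent name entries, or explicit links between the variant forms, are what make the connection possible.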

Digital Curation: think use, not preservation

For the keynote presentation at the DCC/RIN Research Data Management Forum on ‘The Economics of Applying and Sustaining Digital Curation’, Chris Rusbridge gave us some reflections from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF): http://brtf.sdsc.edu/about.html. This was a 2 year project, finishing earlier this year, and the final report is available from: http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf

Chris kicked off by asking us to think about how we currently support access to digital information. Avenues include Government grants, advertisements (e.g. through Google), subscriptions (to journals), pay per service (e.g. Amazon Web Services), and donations.

One of the key themes that he raised and returned to was around the alignment, or lack of alignment, between those who pay, those who provide and those who benefit from digital data: they are not necessarily the same, and the more different they are the harder it may be to create a sustainable model. Who owns, who benefits, who selects, who preserves, who pays? This has interesting parallels with archive repositories, where an institution may pay for the acquisition, appraisal, storage, cataloguing and access for these resources, but the beneficiaries are far broader than just members of the institution. Some institutions may require payment for access, but others will provide access free of charge. They may see this as a means to enhance their reputation and status as a learned society.

Around 15 years ago we started to think about digital preservation as a technical problem and then the OAIS reference model was produced. The technical capabilities that we now have are well up to the task, although Chris warned that the most elegant technical solution is no good if it is not sustainable; digital preservation has to be a sustainable economic activity. Today the focus is on the economic and organisational problems. It is not just about money; it requires building upon a value proposition, providing incentives to act and defining roles and responsibilities.

Digital preservation represents a derived demand.  No one ‘wants’ preservation per se; what they want is access to a resource.  It is not easy to sell a derived demand – often it needs to be sold on some other  basis. This idea of selling the importance of providing use (over time) rather than trying to sell the idea of preservation was emphasised throughout the Forum.

Digital preservation is also ‘path dependent’, meaning that the actions and decisions you take change over time; they are different at different points of the life-cycle. Today’s actions can remove other options for all time.

Cultural issues and mindset may be a factor here, and I was interested in the potential problem Chris proposed of the ‘free-rider’ culture when it comes to making research datasets available. It may be that some (many?) researchers don’t want to pay for things, undervalue services and maybe underestimate costs. Researchers may also resent conformity and what they see as bureaucracy. All in all, it may be difficult to make a case that researchers should in some way pay. This may be compounded by a sense that money invested in preservation is money taken out of research. Chris suggested that the incentives for preservation are less apparent to the individual researcher, but are more clearly defined when the data is aggregated.

Typically, long-term preservation activities have been funded by short-term resource allocation, although maybe this is gradually changing; a more thorny issue is that of recognising and valuing the benefits of digital preservation, to provide incentives that attract funding. More work needs to be done on articulating the benefits in order to cultivate a sense of the value. However, other speakers at the Forum wondered whether we should actually take the value as a given – maybe we shouldn’t keep asking the question about benefits, but simply acknowledge that it is the right thing to make research and other digital outputs available long-term? We may be creating problems for ourselves if we emphasise the need to demonstrate value too much, and then struggle to quantify the value. However, this was just one argument, and overall I think that there was a belief that we do need to understand and articulate the benefits of providing long-term access.

There is often a lack of clear responsibility around digital preservation – maybe this is one of those areas where it’s always thought to be someone else’s responsibility? So, appropriate organisation and governance is essential for efficient ongoing preservation, especially when considering the tendency for data to be transferred – these ‘handoffs’ need to be secure.

The three imperatives that the BRTF report comes up with are: to articulate a compelling value proposition; to provide clear incentives to preserve in the public interest; to define roles and responsibilities.

Commenting briefly on the post-BRTF developments, Chris mentioned the EU digital agenda and the LIBER pan-European survey on sustainability preparedness.

There are some mandates emerging:  the NERC and ESRC, for example.  Some publishers do require authors to make available data that substantiates an article, but at present this is not rigorous enough. We need to focus more on the data behind the research and how important it is.

Chris contrasted domain data repositories and institutional data repositories. Domain data repositories: leverage scale and expertise; are valuable for ‘high curation’ data; can carry out a ‘community proxy’ role such as tool development; aggregate demand; are potentially vulnerable to policy change (e.g. AHDS). A mixed funding model is desirable for domain data repositories (e.g. ICPSR). Institutional data repositories: have a reputational business case (risk management, records management aspects, showcasing); should be aligned with institutional goals; can link to institutional research services (e.g. universal backup); can work well for ‘low curation’ cases (relatively small, static datasets); aggregate demand across a set of disciplines.

One issue that came up in the discussion was that we must remember that in fact digital preservation is relatively cheap, especially when compared to the preservation of hard-copy archives, held in acid-free boxes on rows and rows of shelving in secure, controlled search rooms.  So, if the cost is actually not prohibitive, and the technical know-how is there, then it seems imperative to address the organisational issues and to really hammer home the true value of preserving our digital data.

Opening the door to demonstrating value

The Archives Hub team value the links that we have with our contributors, who, after all, make the Hub what it is. We have a Contributors’ Forum in order to establish and develop links with contributors and get their feedback on Hub developments.

This week we ran a Contributors’ Forum that concentrated on measuring impact, something that is becoming increasingly important in order to demonstrate value. Unfortunately, we ended up with quite a small group, despite sending out some enticing emails – maybe a sign of the difficult times. But we still had a stimulating discussion, and for us it is always very valuable to get a perspective from the actual archives repositories.

We spent the first part of the morning with updates on the Hub and reports from the contributors: John Rylands at Manchester, Salford, Liverpool and Glasgow. Joy then gave a presentation on measuring impact, reflecting on some work that the Archives Hub, Copac and Zetoc services have carried out through online surveys and one-to-one interviews with researchers in order to create case studies.

In the afternoon we concentrated on measuring impact by asking the contributors to think about (i) what sort of information they currently collect about their researchers and (ii) what sort of information they would like to have. Overall, it seems that most archives have some form of registration, where researchers give some details about themselves. But the information recorded varies, not surprisingly. Sometimes information such as the items consulted is given, sometimes researchers are asked to specify their subject area, and at Glasgow they are asked how they found out about the University Archive. At Liverpool all of the requisition slips are studiously kept, so that there is a record of who has looked at what, and at Glasgow there is a log of everything leaving the strong room, and I’m sure that for most archives this is the case. At Salford, phone and email enquiries are all logged, and website statistics are kept.

However, it seems that in general there is very little information on what happens next. How does the visit to the archive benefit the researcher? Do they use what they have found in publications? reports? articles? The archive repository may find this sort of information out if the researcher asks about copyright issues, but otherwise it is very hard to know. We agreed that informal networks can be valuable here. Archivists often get to know regular researchers, and in fact, this may be more likely to happen at smaller repositories where there is a lone archivist. But this can only account for a small part of the use of the collections. In fact, two of our contributors said that a reception desk had recently been installed so that researchers often don’t really interact directly with the archivist unless they have a particular query, so whilst this may be more efficient, it may distance us more from our users.

Also, it seems that the information that is gathered is not really utilised. It ‘may’ go into reports, and it ‘may’ be used for funding applications, but the suggestion is that this is done in a rather ad hoc manner. At Glasgow, it is important to show that the researchers and students from the University are being prioritised, so the information gathered can help to support this kind of situation.

From the discussion that we had around this topic, our first likely action arose: If there is an easy way for a researcher to grab an archival reference, it will encourage people to include the correct citation, which will help with tracking the use of archives.  This is something that we should be able to introduce for the Hub.

We talked about how easy it would be to simply ask researchers if they will speak about their research. Maybe they could be encouraged to put something about this on the registration form. We felt that if we are honest about what we need (which is often to demonstrate our value in order to secure continued funding), then researchers may be more willing than we might suppose. There is, undoubtedly, a huge feeling of goodwill towards archives, and, as one contributor said, we may be pushing at an open door here.

We talked about the sort of information we would like to gather, and came up with some possibilities:

We would like to know how researchers are coming to the repository – e.g. from the Archives Hub, from a Hub Spoke, from the NRA?
We would like to know if users find what they need from the archival descriptions themselves. Maybe more detailed descriptions sometimes provide the information that they need – they might even show that the archive is not relevant to the research, thus saving the researcher a wasted visit (a positive negative outcome!).
We would like to know more about how people behave when looking at an archive catalogue: Where do they navigate to? Do they explore the catalogue? Do they search laterally?

From the discussion with these contributors, it seems that the Archives Hub is having to place more emphasis on issues around ‘market penetration’ than the contributors are at present, although it was felt that this is starting to change and that archives may well be faced with more pressure to understand their markets and how to reach them effectively.

Finally, we came up with another action, which was to try to compile 3 case studies over the next year. John Rylands agreed to work with us on the first one, so that we can test out how best to approach this. It may be that telling stories is the most fruitful way to get a sense of the impact that archives have. But we cannot ignore the fact that statistics are required, and we do have to continue to look for different ways to demonstrate our value.

Do we need index terms?

Archival descriptions need to include associated subjects, names and places as index terms. Is that self-evident? Well, certainly we need to do what we can to provide ways into an archive, and you might say the more ways to access it the better. But do archival descriptions need index terms? Do they add anything that keyword searches don’t have?

The Archives Hub encourage our contributors to add access points, which is EAD speak for index terms for subjects, names and places that reflect the content of the description, and therefore the archive. But if those terms are already included in the description, with the technology at our disposal, maybe we can dispense with them as access points and simply query the main body of the description? What are the arguments in favour of keeping index terms?

1. It’s about what is significant. One of the great challenges with archives is drawing out what is important within the archive; enabling researchers to know whether the archive is relevant to them. But this is always going to be a very imperfect exercise. I remember cataloguing an architect’s diaries (Robert Mylne, architect of Blackfriars Bridge) and ending up taking months because I couldn’t bear to leave out any people, or place names or buildings, or building techniques, etc. What if someone really wanted to know about stanchions? If I didn’t mention them, then a search would not bring back the Mylne diaries, and I would have failed to connect researcher to research material. The reality is that with the time and resources at our disposal, what we need to try to do is reflect what is ‘most significant’ and include ‘key concepts’, accepting that this is a somewhat subjective judgement and hoping that this is enough to lead the researcher in the right direction. For the Hub we usually recommend adding somewhere between 3 and 10 index terms to a description. It means that the archivist can (arguably) draw out the most pertinent subjects and list the most significant people.

2. It allows for drawing out entities. So, in a sentence like “The collection comprises of material relating to the British National Antarctic Expedition, 1901-1904 (leader Robert Falcon Scott), the British Antarctic Expedition, 1907-1909, led by Shackleton, correspondence with his family, miscellaneous papers and biographical information”, you can separate out the entities: corporate bodies such as British National Antarctic Expedition, 1901-1904, and personal names such as Robert Falcon Scott. This is very useful for machine processing of content, as machines do not know that Robert Falcon Scott is a personal name (although we are increasingly developing sophisticated text mining techniques to address this).

It can be particularly useful where the entities are not obvious from the text, such as “[A]s well as material relating to his broadcast and published works, the archive also includes many scripts…”. Notice a lack of definite subject terms such as ‘playwright’, or ‘writer’. A human user may infer this, but a general search on ‘playwright’ will not bring back any results because a machine has to know it too, in order to serve the human user.

3. You can then apply consistency to the entities, in terms of using a pre-defined controlled vocabulary. But in a world where folksonomies are becoming increasingly popular, with increasing use of user tagging, does it make sense to insist on controlled vocabularies?

Take the example above, which is about Arthur Hopcraft. The index terms do include ‘playwrights’ and ‘writers’ so that the user can do a keyword search on these terms, or a specific subject search, and find the description. However, there is an obvious flaw here: the archivist has chosen these terms. Whilst they do both come from the Unesco thesaurus, she could easily have chosen different terms. The index terms do not include ‘scriptwriter’ for example. They do not include ‘television’ or ‘journalism’, both of which could have reasonably been used for this description. We end up with some descriptions that use ‘playwrights’ as a controlled vocabulary term, but others that don’t, and some that maybe use ‘scriptwriters’ when they are essentially about the same subject, or ‘authors’ which is the Unesco preferred term for scriptwriters.

But you cannot cover everything, so you have to make a choice about which subject terms to use. The question is: is it better to have some subject terms rather than none, even if they do not necessarily cover ‘all’ subjects, and so the researcher may carry out a subject search and not find the archive? One important point is that with or without subject terms, you have the same problem; it is just that a specific subject search does actually narrow what the researcher is searching on – the search may not include other fields, such as the scope & content or biographical history. Therefore whilst a subject search helps the researcher to find the most significant collections, it may exclude some collections that might be very pertinent for their research (collections that they may find through a keyword search).
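To make the difference between the two kinds of search concrete, here is a minimal Python sketch. The record structure, the field names and the second sample description are invented for illustration; this is not how the Hub actually stores or queries descriptions.

```python
# Hypothetical, much-simplified descriptions; the field names are assumptions.
descriptions = [
    {
        "title": "Arthur Hopcraft Papers",
        "scope_content": "Material relating to his broadcast and published "
                         "works, including many scripts.",
        "subjects": ["playwrights", "writers"],
    },
    {
        "title": "Papers of a television dramatist",
        "scope_content": "Drafts by a playwright and scriptwriter for television.",
        "subjects": [],  # no index terms were added to this description
    },
]


def subject_search(term, records):
    """Match only against the index terms chosen by the archivist."""
    term = term.lower()
    return [
        r["title"]
        for r in records
        if any(term in s.lower() for s in r["subjects"])
    ]


def keyword_search(term, records):
    """Match against every text field, index terms included."""
    term = term.lower()
    hits = []
    for r in records:
        text = " ".join([r["title"], r["scope_content"], *r["subjects"]]).lower()
        if term in text:
            hits.append(r["title"])
    return hits


print(subject_search("playwright", descriptions))
print(keyword_search("playwright", descriptions))
```

In this toy data the subject search finds only the description that was indexed with ‘playwrights’, while the keyword search also picks up the description that mentions a playwright in its scope and content but has no index terms.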

4. Index terms allow for clarification of which entity you are talking about. This can be particularly helpful with identifying people and corporations. The scope and content may refer to Lindsay Anderson, but the index entry will provide the dates and maybe an epithet to clarify that this is Lindsay Gordon Anderson, 1923-1994, film director. You could add this information to the scope and content, but it would tend to make it much more dense and arguably more difficult to read if you did this with all names. It would also imply that all names are of equal significance, and it would not be very helpful for machine processing unless you marked it up so that a machine could identify it as a personal name.

5. Index terms allow for connecting the same entity throughout the system. This is a very useful and powerful reason to have index terms. The main issue here is that contributors do not always enter the same thing, even with rules and sources to draw upon. Personal and corporate names are usually consistent, but inevitably the addition of the epithet, which is much more of an archival practice than a library practice, means that one person often has a number of different entries. If you took the epithet away, at least for the purposes of identifying the same entity, then things would work reasonably well. For subjects it’s more a case of just the number of subjects that can be used to describe an archive. If you look for all the descriptions with the subject of ‘first world war’, then you won’t find all the descriptions that are significantly about this subject because some of them are indexed with ‘world war one’, and others may use ‘war’ and ‘conflict’.
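As a rough illustration of taking the epithet away for matching purposes, the Python sketch below builds a comparison key from a name entry by keeping everything up to and including the dates and ignoring whatever follows. The function name, the assumption that entries are comma-separated with dates in the form 1923-1994, and the second epithet are all hypothetical; real index entries are more varied than this.

```python
import re

# Assumed date pattern: four-digit years, e.g. "1923-1994" or an open "1923-".
DATES = re.compile(r"\d{4}-(?:\d{4})?")


def match_key(index_entry):
    """Return a key for comparing name entries, ignoring any epithet.

    Everything after the dates is treated as the epithet and dropped.
    If no dates are found, the whole entry is kept.
    """
    kept = []
    for part in (p.strip() for p in index_entry.split(",")):
        kept.append(part.lower())
        if DATES.fullmatch(part):
            break
    return ", ".join(kept)


a = "Lindsay Gordon Anderson, 1923-1994, film director"
b = "Lindsay Gordon Anderson, 1923-1994, director and critic"  # invented variant
print(match_key(a) == match_key(b))
```

The display form of each entry, epithet included, would stay as the contributor entered it; only the comparison uses the shortened key.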

The way around this for the Hub is our ‘Subject Finder’. This is different from a straightforward subject search. It actually looks for similar terms and brings them together. So, a search for ‘first world war’ will bring back ‘world war one’. Similarly, a search for ‘railways’ will bring back the Library of Congress heading of ‘railroads’.
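The idea of bringing similar terms together can be sketched in a few lines of Python. This is only a hand-built lookup table to show the principle; it is an assumption for illustration and says nothing about how the Subject Finder itself is implemented. The sample records are invented.

```python
# Hypothetical groups of subject terms that should be treated as equivalent.
EQUIVALENT_TERMS = [
    {"first world war", "world war one"},
    {"railways", "railroads"},
]


def expand(term):
    """Return the term together with any terms grouped with it."""
    term = term.lower().strip()
    for group in EQUIVALENT_TERMS:
        if term in group:
            return sorted(group)
    return [term]


def find_by_subject(term, records):
    """Return titles whose index terms include the term or an equivalent."""
    wanted = set(expand(term))
    return [
        title
        for title, subjects in records
        if wanted & {s.lower() for s in subjects}
    ]


# Invented sample records: (title, index terms)
records = [
    ("Diaries of a soldier", ["World War One"]),
    ("Railway company minutes", ["Railroads"]),
    ("Antarctic expedition papers", []),  # no subject terms at all
]

print(find_by_subject("first world war", records))
print(find_by_subject("railways", records))
print(find_by_subject("exploration", records))
```

Note that the third record can never be returned by a subject lookup of this kind, however the terms are grouped, because it has no subject terms to match against.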

The Subject Finder helps, but does not completely address this problem of the differing choice of terms. It cannot by-pass the fact that sometimes descriptions do not include any subject terms, so then they will not show up in a subject search. Recently I was looking for archives in the Hub on ‘exploration’, and was surprised to find that many of the Antarctic expeditions collections were not listed in the results. This was because some repositories did not use this subject term; a perfectly legitimate choice not to use it, but many other similar archives do use it.

I still feel that it is worth adding the significant entities as index terms, even with the problems of selecting what is ‘significant’ and with the inconsistencies that we have. Cataloguing as a whole is a subjective exercise, and it will never be perfect. For those who say that index terms are out-dated, I can only say that they are proving pretty useful for our current Linked Data project, and that is certainly pretty up to the minute in terms of Web technologies.

One final point in favour: the Archives Hub index terms exist within the descriptions as clickable links. This allows researchers to carry out ‘lateral’ searches, and it is a popular means to traverse descriptions, exploring from one subject to another, from one person to another.

Whether we should also consider enabling researchers to tag descriptions themselves is a whole other issue for another blog post…

This is not a complete case for and against by any means, but I think I’d better leave it there. I’d love to hear your views.

Is the reading room an echo chamber?

I attended the CILIP Yorkshire and Humberside branch & CDG members day at Leeds Met last week.  It was a great day overall, but one of the highlights – and one of the main reasons I’d wanted to attend – was Laura and Ned’s presentation on Escaping the Echo Chamber.

I’d really recommend watching the presentation – it’s a great example of a well-done Prezi, and although it obviously can’t capture everything from the presentation, it stands alone very well.

The basic premise is this:  librarians talk a lot about the state of libraries and information management and literacy and society and all sorts of other highly interesting and exciting stuff. But they only talk about it to other librarians.  They (we!) only talk about it in library blogs read by other librarians.  And I think it really is only other librarians – I can’t do my usual device here of saying ‘librarians/info profs’, because I’m not sure if librarians even talk to other information professionals about these issues.  Well, I’m here to make a tiny start – I’m going to break out of the librarian echo chamber and extend the conversation to archivists. And record-managers.  And knowledge-managers.  And anyone else who reads this blog!

The problem is: how do we get this information, these discussions to people outside our immediate professional neighbourhood?  This seems to be especially urgent now, with funding under threat – to demonstrate the value of what we do to people outside our professions.  Ideally, to our users and stakeholders – or to create new users and stakeholders by fuelling their understanding of what we do and what we stand for.

I don’t think this problem is unique to the information professions.  All professions suffer from a skewed public perception of their work.  The trouble is, for most professions this perception is formed from the exciting side of their job:  police catch criminals; doctors cure sick people; firefighters rush heroically into burning buildings.  For information professionals, it’s formed from the most boring and routine part of their job: stamping books, putting documents into boxes, making lists.  Why? Police, doctors and firefighters all do paperwork too, they all have the boring and mundane side to their jobs.  Yet no-one (and I really hope that this is still true by the time this post is published, with how the Big Society is shaping up) is suggesting that volunteers can police our streets, remove our appendices, or extinguish our blazes.

Is this because the routine work for most other professions is done in back rooms, behind closed doors?  For information professionals it’s often the exact opposite – we do our most interesting and exciting work away from the public view.  What people often see us doing are those rote jobs that could be (and increasingly are) done by machines.

So how can we address this? How do we get people to understand the value of what we really do? It’s far from an easy task. Too often we rely on the same sources that have perpetuated the ‘boring’ stereotypes to bring them down – I’m sure that ‘Who do you think you are?’ has helped to change the public perception of archives and archivists. But we can’t rely on the media deciding to use our professions as a prop for their next hit. So how can we get out there ourselves?

Please do comment!  There’s a lively debate going on about this over on Twitter – check out #echolib to see what’s been said so far.

First class citizens of the Web

Linked Data enthusiasts like to talk about making concepts within data into first-class citizens. This should appeal to archivists. The idea that the concepts within our data are equal sounds very democratic, and is very appealing for rich data such as archival descriptions. But, where does that leave the notion of the all-important top-level archival collection description? Archivists do tend to treat the collection description as superior; the series, sub-series, file, item, etc., are important, but subservient to the collection. You may argue that actually they are not less important, but they must be seen in the context of the collection. But I would still propose that (certainly within the UK) the collection-level description generally tends to be the focus and is considered to be the ‘right’ way into the collection, or at least, because of the way we catalogue, it becomes the main way into the collection.

Linked Data uses as its basis the data graph. This is different from the relational model and the tree structure model. In a graph, entities are all linked together in such a way that none has special status. All concepts are linked, the links are specified – that is to say, the relationships are clarified. In a tree structure, everything filters down, so it is inevitable that the top of the tree does seem like the most important part of the data. A data graph can be thought of as a tree structure where links go both ways, and nothing is top or bottom. You could still talk about the collection description being the ‘parent’ of the series description, but the series description is represented equally in RDF. But, maybe more fundamentally than this, Linked Data really moves away from the idea of the record as being at the heart of things and  replaces this with the idea of concepts being paramount. The record simply becomes one other piece of data, one other concept.
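A very small Python sketch may help to show what ‘nothing is top or bottom’ means in practice. It uses plain subject-predicate-object tuples rather than a real RDF library, and every identifier and relationship name in it is made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers and predicate names here are hypothetical.
triples = [
    ("collection/1", "hasPart", "series/1"),
    ("series/1", "partOf", "collection/1"),
    ("collection/1", "hasSubject", "concept/exploration"),
    ("series/1", "hasSubject", "concept/exploration"),
    ("concept/exploration", "label", "Exploration"),
]


def about(node, graph):
    """Return every statement in which the node appears, at either end."""
    return [t for t in graph if node in (t[0], t[2])]


# Any node can be the starting point; none has special status.
for start in ("collection/1", "series/1", "concept/exploration"):
    print(start)
    for statement in about(start, triples):
        print("   ", statement)
```

Starting from the concept leads to both the collection and the series just as readily as starting from the collection leads to the series, which is the sense in which the parent and child descriptions are represented equally.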

This type of modelling accords with the idea that users want to access the data from all sorts of starting points, and that they are usually interested in finding out about something real (a subject, a person) rather than an archive per se. When you model your data into RDF what you are trying to think about is exactly that – how will people want to access this data. In Australia, the record series is the preferred descriptive entry, and a huge amount has been written about the merits of this approach. It seems to me, with RDF, we don’t need to start with the collection or start with the series. We don’t need to start with anything.

Linked Data graph

This diagram, courtesy of Talis, shows part of a data graph for modelling information about spacecraft. You can see how the subjects (which are always represented by URLs) have values that may be literal (in rectangular boxes) or may point to other resources (URLs). Some of this data may come from other datasets (use of the same URL for a spacecraft enables you to link to a different resource and use the values within that resource).

The emphasis here is on the data – the concepts – not on the carrier of the data – the ‘record’.

In our LOCAH project we will need to look at the issue of hierarchy of multi-level descriptions. In truth, I am not yet familiar enough with Linked Data to really understand how this is going to work, and we have not yet really started to tackle this work. I think I’m still struggling to move away from thinking of the record as the basis of things, because, to coin a rather tiresome phrase, RDF modelling is a paradigm shift.  RDF is all about relationships between concepts and I will be interested to see where this leaves relationships between hierarchical parts of an archive description. But I am heartened by Rob Styles’ (of Talis) assertion that RDF allows anyone to say anything about anything.