Locah Linking Lives: an introduction

We are very pleased to announce that the Archives Hub will be working on a new Linked Data project over the next 11 months, following on from our first phase of Locah, and called Linking Lives. We’d like to introduce this with a brief overview of the benefits of the Linked Data approach, and an outline of what the project is looking to achieve. For more in-depth discussion of Linked Data, please see our Locah project blog, or take a look at the Linked Data website guides and tutorials.

Linked Open Data Cloud

The benefits of Linked Data

The W3C currently has a draft report, ‘Library Linked Data’, which also covers archives and museums. In it, they state that:

‘Linked data is shareable, extensible, and easily re-usable… These characteristics are inherent in the linked data standards and are supported by the use of web-friendly identifiers for data and concepts.’

Shareable

One of the exciting things about Linked Data is that it is about sharing data (certainly where you have Linked Open Data). I have found that this emphasis on sharing and data integration has actually had a positive effect aside from the practical reality of sharing; it engenders a mindset of collaboration and sharing, something that is of great value not just in the pursuit of the Linked Data vision, but also, more broadly, for any kind of collaborative effort and for encouraging a supportive environment. Our previous Linked Data project, Locah, has been great for forging contacts and for putting archival data within this exciting space where people are talking about the future of the Web and the sorts of things that we might be able to do if we work together.

For the Archives Hub, our aim is to share the descriptions of archive collections as a means to raise the profile of archives, and show just how relevant archives are across numerous disciplines and for numerous purposes. In many ways, sharing the data gives us an opportunity to get away from the idea that archives are only of interest to a narrow group of people (i.e. family historians and academics purely within the History Faculty).

Extensible

The principle of allowing for future growth and development seems to me to be vital. The idea is to ensure that we can take a flexible approach, whereby enhancements can be made over time. This is vital for an exploratory area like Linked Data, where an iterative approach to development is the best way to go, and where we are looking at presenting data in new ways, looking to respond to user needs, and working with what technology offers.

Reusable

‘Reuse’ has become a real buzzword, and is seen as synonymous with efficiency and flexibility. In this context it is about using data in different contexts, for different purposes. In a Linked Data environment this can mean providing the means for people to combine data from different sources to create something new, something that answers a certain need. To many archivists this will be fine, but some may question the implications in terms of whether the provenance of the data is lost and what this might mean. What if information from an archive description is combined with information from Wikipedia? Does this have any implications for the idea of archive repositories being trusted, and does it mean that pieces of the information will be out of their original context and therefore in some way open to misuse or misinterpretation?

Reuse may throw up issues, but it offers far more benefits than risks. Whatever the caveats, it is an inevitable consequence of open data combined with technology, so archives can either join in or exclude themselves from this kind of free flow of data. Nevertheless, it is certainly worth thinking about the issues involved in providing data within different contexts, and our project will consider issues around provenance.

Linking Lives

The basic idea of the Linking Lives project is to develop a new Web interface that presents useful resources relating to individual people, and potentially organisations as well.

It is about more than just looking at a name-based approach for archives. It is also about utilising external datasets in order to bring archives together with other data sources. Researchers are often interested in individual people or organisations, and will want to know what sort of resources are out there. They may not just be interested in archives. Indeed, they may not really have thought about using archives, but they may be very interested in biographical information, known connections, events during a person’s lifetime, etc. The idea is to show that various sources exist, including archives, and thus to work towards broadening the user-base of archives.

Our interface will bring different data sources together and we will link to all the archival collections relating to an individual. We have many ideas about what we can do, but with limited time and resources, we will have to prioritise, test out various options and see what works and what doesn’t and what each option requires to implement. We’ll be updating you via the blog, and we are very interested in any thoughts that you have about the work, so please do leave comments, or contact us directly.

In some ways, our approach may be an alternative to using EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons and Families, an XML standard for marking up names associated with archive collections). But maybe in essence it will complement EAC-CPF, because eventually we could use EAC authority records and create Linked Data from them. We have followed the SNAC project with interest, and we recently met up with some of the SNAC project members at the Society of American Archivists’ Conference. We hope to take advantage of some of the exciting work that they are doing to match and normalise name records.

The W3C draft report on Library Linked Data states that ‘through rich linkages with complementary data from trusted sources, libraries [and archives] can increase the value of their own data beyond the sum of its sources taken individually’. This is one of the main principles that we would like to explore. By providing an interface that is designed for researchers, we will be able to test out the benefits of the Linked Data approach in a much more realistic way.

Maybe we are at a bit of a crossroads with Linked Data. A large number of datasets have been put out as RDF/XML, and some great work has been done by people like the BBC (e.g. the Wildlife Finder), the University of Southampton and the various JISC-funded projects. We have data.gov.uk, making Government datasets much more open. But there is still a need to make a convincing argument that Linked Data really will provide concrete benefits to the end user. Talking about SPARQL endpoints, JSON, Turtle, triples that connect entities and the benefits of persistent URIs won’t convince people who are not really interested in process and principles, but just want to see the benefits for themselves.
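For readers who haven’t met triples before, the core idea can be sketched in a few lines of plain Python: a triple is just subject, predicate, object, and because two datasets use the same identifier for the same person, merging them is trivial. (This is a minimal illustration only; the URIs below are invented, not real Archives Hub or Wikipedia identifiers.)

```python
# Illustrative sketch: RDF triples modelled as plain Python tuples.
# All URIs are invented for the example.

archive_data = {
    ("http://example.org/person/beatrice-webb",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Webb, Beatrice, 1858-1943"),
    ("http://example.org/person/beatrice-webb",
     "http://example.org/vocab/hasArchive",
     "http://example.org/archive/passfield-papers"),
}

wikipedia_data = {
    ("http://example.org/person/beatrice-webb",
     "http://example.org/vocab/birthPlace",
     "Standish, Gloucestershire"),
}

# Because both sources use the same identifier for the person,
# merging the two graphs is just a set union...
merged = archive_data | wikipedia_data

# ...and a simple lookup gathers everything known about one subject.
def describe(graph, subject):
    return {(p, o) for s, p, o in graph if s == subject}

facts = describe(merged, "http://example.org/person/beatrice-webb")
print(len(facts))  # 3 facts drawn from two different sources
```

The point is not the code but the principle: shared, persistent identifiers are what make data from different sources combine so cheaply.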

Has there been too much emphasis on the idea that if we output Linked Data then other people can (and will) build tools? The much-quoted adage is ‘the best thing that will be done with your data will be done by someone else’, but is there a risk in relying on this idea? In order to get buy-in to Linked Data, we need visible, concrete examples of benefit. Yes, people can build tools, they can combine data for their own purposes, and that’s great if and when it happens; but for a community like the archives community the problem may be that this won’t happen very rapidly, or on the sort of scale that we need to prove the worth of the investment in Linked Data. Maybe we are going to have to come up with some exemplars that show the sorts of benefits that end users can get. We hope that Linking Lives will be one step along this road; not so much an exemplar as a practical addition to the Archives Hub service that researchers will immediately be able to benefit from.

Of course, there has already been very good work within the library, archive and museum communities, and readers of this blog may be interested in the Use Cases that are provided at http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

Just to finish with a quote from the W3C draft report (I’ve taken the liberty of adding ‘archives’ as I think they can readily be substituted):

“Linked Data reaches a diverse community far broader than the library/archive community; moving to library/archival Linked Data requires libraries/archives to understand and interact with the entire information community. Much of this information community has been engendered by the capabilities provided by new technologies. The library/archive community has not fully engaged with these new information communities, yet the success of Linked Data will require libraries/archives to interact with them as fully as they interact with other libraries/archives today. This will be a huge cultural change that must be addressed.”

photo of paper chain dolls
Flickr: Icono SVDs photostream, http://www.flickr.com/photos/28860201@N05/with/3674610629/

Archives Wales

map of Wales with archives

I recently attended the ‘Online Development in Wales’ day organised by ARCW (Archives and Records Council Wales) to talk about the Porth Archifau (Archives Hub). I found out a good deal about what is happening in Wales at the moment and heard about plans and wishes for future developments.

In her introduction, Charlotte Hodgson from ARCW talked about the need for online catalogues with images rather than the other way around. Maybe there is too much emphasis on digitisation of images which become separated from their context. She referred to the good work of Archives Network Wales (ANW), but acknowledged that Wales is in danger of falling behind with online catalogues. There is a need to maximise opportunities, minimise duplication and effectively deploy resources.

Kim Collis from ARCW gave some background on ANW (now Archives Wales), which is a searchable database for collection-level descriptions that uses a MySQL database and a Typo3 front-end. It has stayed relatively static since it was first developed; the emphasis of individual offices has perhaps moved to their own web presence (many were using CALM and there was something of a race to get their catalogues online). The front-end of the ANW site has not necessarily always been very user-friendly and has not provided the depth of information that it might do. However, it was developed in a standards-based way, and this stands it in good stead for future development. ‘Archives Wales’ was a bolt-on to the database, giving more information and including additional information about repositories, making a more complete and visually appealing site.

There has been some geo-tagging within ANW recently. This was seen as a good way to link in with People’s Collection Wales, enabling users to find out more information about, for example, a family that has owned an estate. Kim talked about a number of possible developments, such as a project to provide links to searchable tithe apportionment transcripts. The idea is to allow volunteers to transcribe the images.

Kim talked about the need to improve branding and identity. The site must be kept up to date to give it credibility. But there is, in a sense, competition with repository websites because many repositories want to prioritise these. I think it is worth impressing upon archivists the importance of cross-searching capability that aggregators provide, as well as the value of searching within a repository. We should not presuppose that researchers primarily want to know what is at just one individual office; they usually want to find ‘stuff’ on their topic of interest and then go down to the more detailed level of individual sources of information.

Sam Velumyl from The National Archives talked about the Discovery initiative at TNA, which provides a new information architecture that will accommodate the different systems that TNA has.   The idea is that it can accommodate the integration of other systems easily, making it a more sustainable and flexible solution. They are going to be carrying out an exercise in gathering feedback on Discovery, and you’re likely to hear about that very soon.  Sam said that the feedback will help TNA to decide upon their priorities. It may be that A2A will become active again, but at present this has not been decided.  There were concerns in the room that it is very difficult to get TNA to provide data back out of A2A.

People’s Collection Wales, which was presented to us by three speakers, is very much geared towards user-friendly and fun engagement in the history and culture of Wales. It works on the basis of everything being an item, and it gathers items together in collections by topic, not in the way that archivists would normally understand collections, but simply by areas that will be of interest to users. It is quite an eclectic experience, designed to draw in a broad section of the community and promote learning and understanding of Welsh history.  Re-purposing is a strong principle behind PCW. It integrates social media to encourage the idea of sharing the photograph or interview or whatever on Facebook or Twitter. It also has a scrapbook function so that people can gather together their own collections. It does link to the item within context, so you can link back to the website of the depositor.

PCW are going to be using an API to upload collection records  from Archives Wales. I got a little confused about this, as they also spoke about manual upload. I think the automated upload will only be for certain records.  They are also doing some interesting work with GIS, to enable users to do things like look at maps over time to see how a place has developed, and looking at making museum objects viewable in a 3-D way.

My plea to PCW is to make their titles clickable links where it seems as if they should be clickable. I found the site fun, with some great stuff, but it can take a while to understand what you are looking at. I went to browse the collections and many of them are untitled, and it’s not really clear what they are representing. I tried the map interface and looked for ‘castle’ near ‘Barmouth’, and I was taken to a page of images of people talking about the Eisteddfod. The second time it worked better, but some of the images were not actually images, and one of them remained in place when I did another search and I couldn’t delete it from the display. I had a few more experiences of searches hanging and the display freezing. But then other searches worked well and I started getting links from places to objects. So, it was a mixed bag for me: it seemed quite beta in terms of functionality, and it was also very slow, which I do think is a problem. It feels very experimental, with loads of good ideas, but I wonder if it would be better to concentrate on developing fewer ideas and making them more effective.

The afternoon was more focussed on solutions for getting archives online. CyMAL recently commissioned research to analyse requirements for extending online access to archive catalogues in Wales, building on ARCW, and Sarah Horton gave us a summary of some of the findings. Some of the stats were quite interesting: 11 local authority services use CALM, 1 uses the Archivists’ Toolkit and 1 uses Word. In higher education: 3 CALM, 1 Word, 1 no formal catalogue. The National Library of Wales uses the virtual library system and AC-NMW uses AdLib. The survey found that the application of authority files and data standards was variable.

For online access: 3 services provide it via CALMView, but there are barriers to this for many offices, one being IT departments and their concerns about security. 4 services provide access via their own systems, and 2 via PDF documents. About 8,000 collections are listed on Archives Wales and 2,000 on the Hub.

9 services have backlogs of between 10% and 30%, 6 of over 30%, and more if poor-quality catalogues are taken into account. Many catalogues remain in manual form only.

We had a very interesting talk on the Black Country History website. Linda Ellis talked about how important it was for the project to be sustainable right from the outset. The project was about working together to reduce costs and create a sustainable online resource. The original website used the Axiell DSCovery software, but it was not fit for purpose. The redevelopment was by Orangeleaf Systems using their CollectionsBase system and WordPress, which means it is very easy to create different front-ends. There are a number of microsites, such as one for geology, filtered by keyword: a great idea for targeting different audiences with minimal additional effort. Partners can upload data when they like via an XML export from CALM; CollectionsBase will also take Excel, Access and manual data entry. There is an API, so the data goes on to Culture Grid and Europeana.

Altogether a very stimulating day, with a good vibe and plenty of discussion.

A Web of Possibilities

image of spider from Wellcome images

“Will you browse around my website,” said the spider to the fly,
“’Tis the most attractive website that you ever did spy.”

All of us want to provide attractive websites for our users. Of course, we’d like to think it’s not really the spider/fly kind of relationship! But we want to entice and draw people in, and often we will see our own website as our key web presence: a place for people to come to find out about who we are, what we have and what we do, and to look at our wares, so to speak.

The recently released ‘Discovery’ vision is to provide UK researchers with “easy, flexible and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable.”  Does this have any implications for the institutional or small-scale website, usually designed to provide access to the archives (or descriptions of archives) held at one particular location?

Over the years that I’ve been working in archives, announcements about new websites for searching the archives of a specific institution, or the outputs of a specific project have been commonplace.  A website is one of the obvious outputs from time-bound projects, where the aim is often to catalogue, digitise or exhibit certain groups of archives held in particular repositories. These websites are often great sources of in-depth information about archives. Institutional websites are particularly useful when a researcher really wants to gain a detailed understanding of what a particular repository holds.

However, such sites can present a view that is based more around the provider of the information than the receiver. It could be argued that a researcher is less likely to want to use archives because they are held at a particular location (except for reasons of convenience), and more likely to want archives around their subject area; and the archives relevant to them are likely to be held in a whole range of archives, museums and libraries (and elsewhere). By only looking at the archives held at a particular location, even if that location is a specialist repository that represents the researcher’s key subject area, the researcher may not think about what they might be missing.

Project-based websites may group together archives in ways that  benefit researchers more obviously, because they are often aggregating around a specific subject area. For example, making available the descriptions and links to digital archives around a research topic. Value may be added through rich metadata, community engagement and functionality aimed at a particular audience. Sometimes the downside here is the sustainability angle: projects necessarily have a limited life-span, and archives do not. They are ever-changing and growing and descriptions need to be updated all the time.

So, what is the answer? Is this too much of a silo-type approach, creating a large number of websites, each dedicated to a small selection of archives?

Broader aggregation seems like one obvious answer. It allows for descriptions of archives (or other resources) to be brought together so that researchers have the benefit of searching across collections, bringing together archives by subject, place, person or event, regardless of where they are held (although there is going to be some kind of limit here, even if it is at the national level).

You might say that the Archives Hub is likely to be in favour of aggregation! But it’s definitely not all pros and no cons. Aggregations may offer powerful search functionality for intellectually bringing together archives based on a researcher’s interests, but in some ways there is a greater risk around what is omitted. When searching a website that represents one repository, a researcher is more likely to understand that other relevant archives may exist elsewhere. Aggregations tend to promote themselves as comprehensive – if not explicitly then implicitly – which creates expectations that can never fully be met. They can also raise issues around measuring impact and around licensing. There is also the risk of a proliferation of aggregation services, further confusing the resource discovery landscape.

Is the ideal of broad inter-disciplinary cross-searching going to be impeded if we compete to create different aggregations? Yes, maybe it will be to some extent, but I think that it is an inevitability, and it is valid for different gateways to serve different audiences’ needs. It is important to acknowledge that researchers in different disciplines and at different levels have their own needs, their own specific requirements, and we cannot fulfil all of these needs by presenting data in only one way.

One thing I think is critical here is for all archive repositories to think about the benefits of employing recognised and widely-used standards, so that they can effectively interoperate and so that the data remains relevant and sustainable over time. This is the key to ensuring that data is agile, and can meet different needs by being used in different systems and contexts.

I do wonder if maybe there is a point at which aggregations become unwieldy, politically complicated and technically challenging. That point seems to be when they start to search across countries. I am still unsure about whether Europeana can overcome this kind of problem, although I can see why many people are so keen on making it work. But at present it is extremely patchy, and, for example, getting no results for texts held in Britain relating to Shakespeare is not really a good result. But then, maybe the point is that Europeana is there for those that want to use it, and it is doing ground-breaking work in its focus on European culture; the Archives Hub exists for those interested in UK archives and a more cross-disciplinary approach; Genesis exists for those interested in women’s studies; for those interested in the Co-operative movement, there is the National Co-operative Archive site; for those researching film, the British Film Institute website and archive is of enormous value.

So, is the important principle here that diversity is good because people are diverse and have diverse needs? Probably so. But at the same time, we need to remember that to get this landscape, we need to encourage data sharing and  avoid duplication of effort. Once you have created descriptions of your archive collections you should be able to put them onto your own website, contribute them to a project website, and provide them to an aggregator.

Ideally, we would be looking at one single store of descriptions, because as soon as you contribute to different systems, if they also store the data, you have version control issues. The ability to remotely search different data sources would seem to be the right solution here. However, there are substantial challenges. The Archives Hub has been designed to work in a distributed way, so that institutions can host their own data. The distributed searching does present challenges, but it certainly works pretty well. The problem is that running a server, operating system and software can actually be a challenge for institutions that do not have the requisite IT skills dedicated to the archives department.  Institutions that hold their own data have it in a great variety of formats. So, what we really need is the ability for the Archives Hub to seamlessly search CALM, AdLib, MODES, ICA AtoM, Access, Excel, Word, etc. and bring back meaningful results. Hmmm….
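That ‘Hmmm’ hides a real design problem, but the basic shape of a federated search is easy to sketch. The following is an illustration only, with invented adapter functions standing in for real connectors to CALM, AdLib and the rest; the hard part in practice is writing those adapters and mapping their varied outputs to common fields.

```python
# Hypothetical sketch of a federated ("distributed") search across
# differently-formatted local catalogues. The adapter functions are
# invented stand-ins; real ones would query remote systems.

def search_calm(query):
    # stand-in for a remote CALM query
    return [{"title": "Papers of J. Smith", "source": "CALM"}]

def search_adlib(query):
    # stand-in for a remote AdLib query
    return [{"title": "Smith family estate records", "source": "AdLib"}]

ADAPTERS = [search_calm, search_adlib]

def federated_search(query):
    """Query every source, tolerate failures, and merge the results
    so the researcher sees a single set of hits."""
    results = []
    for adapter in ADAPTERS:
        try:
            results.extend(adapter(query))
        except Exception:
            # one unreachable repository should not break the whole search
            continue
    return results

hits = federated_search("smith")
print([h["source"] for h in hits])  # ['CALM', 'AdLib']
```

Even this toy version shows why the approach is attractive (each institution keeps its own data) and why it is hard (every source needs its own adapter, and results must be normalised to be meaningful side by side).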

The business case for opening up data seems clear. Projects like Open Bibliographic Data have helped progress the thinking in this arena and raised issues and solutions around barriers such as licensing. But it seems clear that we need to understand more about the benefits of aggregation, and the different approaches to aggregation, and we need to get more buy-in for this kind of approach. Does aggregation allow users to do things that they could not do otherwise? Does it save them time? Does it promote innovation? Does it skew the landscape? Does it create problems for institutions because of the problems with branding and measuring impact? Furthermore, how can we actually measure these kinds of potential benefits and issues?

Websites that offer access to archives (or descriptions of archives) based on where they are located and on the body that administers them have an important role to play. But it seems to me that it is vital that these archives are also represented on a more national, and even international, stage. We need to bring our collections to where the users are. We need to ensure that Google and other search engines find our descriptions. We need to put archives at the heart of research, alongside other resources.

I remember once talking about the Archives Hub to an archivist who ran a specialist repository. She said that she didn’t think it was worth contributing to the Hub because they already had their own catalogue. That is, researchers could find what they wanted via the institute’s own catalogue on their own system, available in their reading room. She didn’t seem to be aware that this could only happen if they knew that the archive was there, and that this view rested on the idea that researchers would be happy to repeat that kind of search on a number of other systems. Archives are often about a whole wealth of different subjects – we all know how often there are unexpected and exciting finds. A specialist repository for any one discipline will have archives that reach way beyond that discipline into all sorts of fascinating areas.

It seems undeniable that data is going to become more open and that we should promote flexible access through a number of discovery routes, but this throws up challenges around version control, measuring impact, brand and identity. We always have to be cognisant of funding, and widely disseminated data does not always help us with a funding case because we lose control of the statistics around use and any kind of correlation between visits to our website and bums on seats. Maybe one of the challenges is therefore around persuading top-level managers and funders to look at this whole area with a new perspective?

Arrive in Wonder, Leave in Wisdom!

Roll Up, Roll Up for Open Culture!

image of open culture banner

I arrived at the Open Culture conference just in time to grab a cup of tea and dash along to hear Malcolm Howitt’s talk on Axiell. He focussed on Axiell Arena, a new content management option. It provides for a more interactive experience, complete with tag cloud and the ability to add comments. It looked pretty good, very much in line with where things are going in terms of these kinds of websites. However, from our point of view as an aggregator, what we are keen to see is an API to the data to enable others to engage with it more flexibly, something that has yet to happen with CALM. Maybe this raises the whole issue of the challenge of open data to commercial suppliers – it does rather appear to threaten their business model, and I can see that this would be of concern to them.

The second presentation I saw was from Deep Visuals on ViziQuest, ‘a new way to explore digital collections’. They use natural language processing to extract concepts from the text, so the system uses existing metadata in order to enable semantic browsing. The idea is to provide a different kind of search experience, where the user can meander through a collection of images. You can flip an image over to find metadata about it, which is quite neat.

Deep Visuals have worked with the Scott Polar Research Institute, one of the Hub contributors, and there are some wonderful images of expeditions. For some images, the archivist has recorded an audio commentary, and there are also some film clips – I saw a great clip taken on board a ship bound for the Arctic. Currently the software is only available to users within the institute, but it may be made available through the website. You can see a small demo here: http://www.deepvisuals.com/Demo/. In addition, the ViziQuest team have taken some expedition diaries and recorded audio with actors.

The morning was rounded off with a talk about Culture Grid. The importance of Culture Grid being part of national and international initiatives was emphasised, and there was reference to RDTF (now UKDiscovery) and the whole HE agenda, which was good to hear.

Currently Culture Grid contains about 1.65 million item records, mostly referring to images. There are also about 10,000 collection records and 8,000 institution records. We were told that ‘the Culture Grid site and search is not a destination in itself’. This slightly surprised me, as I did think that this was one of its purposes, albeit only one and maybe not the primary one.

I was impressed by the way Culture Grid is positioning itself as a means to facilitate the use of data by others. Culture Grid has APIs and we were told that a growing range of users do take advantage of this. They are also getting very involved in developer days as a means to encourage innovation. I think this is something archives should engage with, otherwise we will get left behind in the innovative exploration of how to make the most of our data.

Whilst I am very much in agreement with the aims of opening up data, I am not entirely convinced by the Culture Grid website. It does appear to prioritise digital materials – it works much better where there are images. The links back to resources often don’t work. I did a search for ‘victorian theatre’ and, first of all, the default search was ‘images only’, excluding ‘collections’ and non-image materials. Then, two of the first four links to resources I clicked on returned an internal server error. I found at least six links that didn’t work on the first two pages of results. Obviously this is not Culture Grid’s fault, but it is certainly a problem. I also wonder about how intuitive the site is, with resource links going to so many different types of websites, and at so many different levels of granularity. Quite often you don’t go straight to the resource: one of the links I clicked on from an item went to the Coventry Council homepage; another went to the ‘how do I?’ page of the University of Hull. I asked about the broken links and didn’t feel that the reply was entirely convincing – I think the problem should be addressed more comprehensively.

If the Hub were to contribute descriptions to Culture Grid, one of my main concerns would be around updating descriptions. I’m also not sure about the need to create additional metadata. I can’t quite get the reasoning behind the Culture Grid metadata, and the way that the link on the title goes to the ‘resource’ (the website of the contributor), but the ‘view details’ link goes to the Culture Grid metadata, which generally provides a cut-down version of the description.
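Broken links of the sort described above are easy to quantify automatically. A small script along these lines could do it (a sketch only, using the standard library; the status-code handling is separated out so the classification logic is easy to check without touching the network):

```python
# Hypothetical sketch of a results-page link checker, of the kind that
# could quantify the broken links described above.
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

def classify(status):
    """Map an HTTP status code (or None for no response) to a verdict."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 500 <= status < 600:
        return "server error"  # e.g. the internal server errors seen above
    return "broken"

def check(url):
    # Fetch the URL and report a verdict; network failures count
    # as unreachable rather than raising.
    try:
        with urlopen(url, timeout=10) as resp:
            return classify(resp.status)
    except HTTPError as e:
        return classify(e.code)
    except URLError:
        return classify(None)

print(classify(200), classify(500), classify(404), classify(None))
# ok server error broken unreachable
```

Running something like this over each page of search results would turn an anecdotal complaint into a measurable error rate that the aggregator could monitor over time.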

The afternoon was dedicated to Spectrum, something I know only a little about other than that it is widely used as a framework by museums in their collections care. Spectrum is, we were told, used in about 7,000 institutions across Europe. Nick Poole, the CEO of the Collections Trust, emphasised that Spectrum should be a collaborative venture, so everyone needs to engage with it. Yet maybe it has become so embedded that people don’t think about it enough. The new Spectrum 4 is seen as providing an opportunity to re-engage the community.

There was an interesting take on Spectrum by the first speaker as a means to actually put people off starting museums…but he was making the important point that a standard can show people what is involved – and that it is a non-trivial task to look after museum collections. I got the impression that Spectrum has been a way to get curators on board with the idea of standards and pulling together to work more professionally and consistently.

Alex Dawson spoke about the latest edition of Spectrum in her capacity as one of the co-editors. Spectrum is a consensus about collections management procedures, about consistency, accountability and a common vocabulary. It is not supposed to be prescriptive; it is the ‘what’ more than the ‘how’. It has 21 procedures describing collections management activities, of which 8 are considered primary. We were told that the link to accreditation was very important in the history of Spectrum, and other milestones have included the introduction of rights management procedures, establishing a clear link between procedures and policy, and greater recognition of the importance of the knowledge held within museums (through Spectrum Knowledge).

There has been an acknowledgement that Spectrum started to become more cumbersome and that information could get buried within this very large entity; it was also starting to get out of date in certain areas. I can see how Spectrum 4.0 is an improvement because it contains clear flow diagrams that bring out the processes much more obviously and show related procedures. It also separates out the procedural and information requirements. The advisory content has been stripped out (and put into online Spectrum Advice) in order to concentrate on procedural steps through flow diagrams.

The consultation on Spectrum 4 was opened up via a wiki: http://standards.collectionslink.org.uk/index.php/Collections_Link_Standards_wiki

The main day of the conference included some really great talks. Bill Thompson from the BBC was one highlight. He talked about ‘A Killer App for Culture’, starting with musings on the meaning of ‘culture’. He talked about digital minds in this generation, which may change the answers that we come up with and may change the meaning of words. Shifting word sense can present us with challenges when we are in the business of data and information. He made the point convincingly that the world is NOT digital, as we often state; it is reassuringly still organic. But digital DATA is everywhere. It is an age in which we experience a digital culture, and maybe the ways that we do this are actually having an effect on the way that we think. Bill cited the book ‘Proust and the Squid’ by Maryanne Wolf, which I would also thoroughly recommend. Wolf looks at the way that learning to read impacts on the ways that we think.

Matthew Cock from the British Museum and Andrew Caspari from the BBC presented on A History of the World in 100 Objects. We were told how this initiative gradually increased in scale to become enjoyed by millions of people across the world. It was a very collaborative venture between the BBC and the British Museum. There were over 2.5 million visits to the site, often around 40,000 in a week when the programme was not on air. It was interesting to hear that the mobile presence was seen as secondary at the time, but probably should have been prioritised more. ‘Permanent availability, portable and for free’ was absolutely key, said Andrew Caspari.

It was an initiative that really brought museums together – maybe not surprising with such a high-profile initiative. The project was about sharing and a different kind of partnership defined by mutual benefit, and most importantly, it was about closing the gap between public engagement and collection research. It obviously really touched people’s imaginations and they felt a sense of being part of something. It does seem like a very successful combination of good fun, entertainment and learning. However, we were told that there were issues. Maybe the digital capacity of museums was overestimated, and longer lead-in times were required than the BBC provided. Also, the upload to the site needed to be simpler.

Cock and Caspari referred to the way the idea spread, with things like ‘A history of the world in 100 sheds’. Should you be worried that this might trivialize the process, or should you be pleased that it caught on, stirred imaginations and controversy and debate?

David Fleming of National Museums Liverpool followed with an equally absorbing talk about museums and human rights. He said museums should be more aware that they are constructs of the society they are in. They should mirror society. They should give up on the idea of being neutral and engage in issues.  He is involved in the International Slavery Museum in Liverpool, and this is a campaigning museum. Should others follow suit? It makes museums an active part of society – both historical and contemporary. Fleming felt that a visit to the museum should stir people and make them want to get involved.

He gave a number of examples of museums where human rights are at the heart of the matter, including:

District Six in South Africa: http://www.districtsix.co.za – very much a campaigning museum that does not talk about collections so much as stories and lives, using emotion to engage people.

The Tuol Sleng Museum of Genocide Victims in Cambodia, a building that was once Pol Pot’s secret prison. The photographs on this site are hugely affecting and harrowing. Just seemingly ordinary portrait shots of prisoners, but with an extraordinary power to them.

The Lithuanian Museum of Genocide Victims. This is a museum where visitors can get a very realistic experience of what it was like to live under the Soviet regime. Apparently this experience, using actors as Soviet guards, has led to some visitors passing out, but the older generation are passionate to ensure that their children understand what it was like at this time.

We moved on to a panel session on Hacking in Arts & Culture, which was of particular interest to me. Linda Ellis from Black Country Museums gave a very positive assessment of how the experience of a hack day had been for them. She referred to the value of nurturing new relationships with developers, and took us through some of the ideas that were created. You can read a bit more about this, and about putting on a hack day, on Dan Slee’s blog: https://danslee.wordpress.com/tag/black-country-museums/

What we need now is a Culture Hack day that focuses on archival data – this may be more challenging because the focus is text not images, but it could give us some great new perspectives on our data. According to Rachel Coldicutt, a digital consultant, we need beanbags, beer, pizza, good spirit and maybe a few prizes to hand out… Doesn’t seem too hard… oh, and some developers of course :-)

Some final thoughts around a project at the New Walsall Art Gallery: Neil Lebeter told us that the idea was to make the voice of the artist key. In this case, Bob and Roberta Smith. The project centered around the Jacob Epstein archive and found ways to bring the archive alive through art – you can see some interesting video clips about this process on YouTube: http://www.youtube.com/user/newartgallerywalsall.

Open Culture was billed as a conference meeting the needs of museums, libraries and archives, but I do think it was essentially a museums conference with a nod to archives and maybe a slight nod to libraries. This is not to criticise the conference, which was very well presented, and there really were some great speakers, but maybe it points to the challenges of bringing together the three domains? In the end, they are different domains, with different needs and interests as well as areas of mutual interest. Clearly there is overlap, and there absolutely should be collaboration, but maybe there should also be an acknowledgement that we are different communities, with some differing requirements and perspectives.

HubbuB

Diary of the Archives Hub, June 2011

Design Council Archive poster
Design Council Archive: Festival of Britain poster

This is the first of our monthly diary entries, where we share news, ideas and thoughts about the Archives Hub and the wider world. This diary is aimed primarily at archives that contribute to the Hub, or are thinking about contributing, but we hope that it provides useful information for others about the sorts of developments going on at the Hub and how we are working to promote archives to researchers.

Hub Contributors’ Forum

At the Hub we are always looking to maintain an active and constructive relationship with our contributors. Our Contributors’ Forum provides one way to do this. It is informal, friendly, and just meets once or twice a year to give us a chance to talk directly to archivists. We think that archivists also value the opportunity to meet other contributors and think about issues around data discovery.

We have a Contributors’ Forum on 7th July at the University of Manchester and if any contributors out there would like to come we’d love to see you. It is a chance to think about where the Hub is going and to have input into what you think we should be doing, where our priorities should lie and how to make the service effective for users. Just in case you all jump in at once, we do have a limit on numbers….but please do get in touch if you are interested.

The session will be from 10.30 to 1.00 at the University of Manchester with lunch provided. It will be with some members of the Hub Steering Committee, so a chance for all to mix and mingle and get to know each other. And for you to talk to Steering Committee members directly.

Please email Lisa if you would like to attend: lisa.jeskins@manchester.ac.uk.

Contributor Audio Tutorials

Our audio tutorial is aimed at contributors who need some help with creating descriptions for the Hub. It takes you through the use of our EAD Editor, step-by-step. It is also useful in a general sense for creating archival descriptions, as it follows the principles of ISAD(G). The tutorial can be found at http://archiveshub.ac.uk/tutorials/. It is just a simple audio tutorial, split into convenient short modules, covering basic collection-level descriptions through to multi-level and indexing. Any feedback greatly appreciated – if you want any changes or more units added, just let us know.

Archives Hub Feature: 100 Objects

We are very pleased with our monthly features, founded by Paddy and now ably run by Lisa. They are a chance to show the wealth of archive collections and provide all contributors with the opportunity to showcase their holdings. They do quite well on Google searches as well!

Our monthly feature for June comes from Bradford Special Collections, one of our stalwart contributors, highlighting their current online exhibition: 100 Objects.  Some lovely images, including my favourite, ‘Is this man an anarchist?’ (No!! he’s just trying to look after his family): http://archiveshub.ac.uk/features/100objects/Nationalunionofrailwaymenposter.html

Relevance Ranking

Relevance ranking is a tricky beast, as our developer, John, will attest. How to rank the results of a search in a way that users see as meaningful? Especially with archive descriptions, which range from a short description of a 100 box archive to a 10 page description of a 2 box archive!

John has recently worked on the algorithm used for relevance ranking so that results now look more as most users would expect. For example, a search for ‘Sir John Franklin’ previously did not bring the ‘Sir John Franklin archive’ up near the top of the results; it now appears first rather than way down the list. Result.
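As a purely hypothetical illustration (not John’s actual algorithm, which isn’t described here), one common way to get this behaviour is to boost titles containing the query as an exact phrase above titles that merely contain the individual words:

```python
def score(query, title):
    """Toy relevance score: count matching query words, and add a bonus
    when the whole query appears in the title as an exact phrase."""
    q, t = query.lower(), title.lower()
    words = q.split()
    word_hits = sum(w in t for w in words)
    phrase_bonus = len(words) * 2 if q in t else 0
    return word_hits + phrase_bonus

titles = ["Letters mentioning John and Sir Henry Franklin",
          "Sir John Franklin Papers"]
ranked = sorted(titles, key=lambda t: score("sir john franklin", t),
                reverse=True)
```

With this sketch, the exact-phrase match ‘Sir John Franklin Papers’ outranks a title that happens to contain all three words scattered about.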

Images

Since last year we have provided the ability to add images to Hub descriptions. The images have to be stored elsewhere, but we will embed them into descriptions at any level (e.g. you can have an image to represent a whole collection, or an image at each item level description).

We’ve recently got some great images from the Design Council Archive: http://archiveshub.ac.uk/data/gb1837des-dca – take a look at the Festival of Britain entries, which have ‘digital objects’ linked at item level, enabling researchers to get a great idea of what this splendid archive holds.

Any contributors wishing to add images, or simple links to digital content, can easily do so using the EAD Editor: http://archiveshub.ac.uk/images/. You can also add links to documents and audio files. Let us know if you would like more information on this.

Linking to descriptions

Linking to Hub descriptions from elsewhere has become simpler, thanks to our use of ‘cool URIs’. See http://archiveshub.ac.uk/linkingtodescriptions/. You simply need to use the basic URI for the Hub, with the /data/ directory, e.g. http://archiveshub.ac.uk/data/gb029ms207.

Out and About

It would take up too much space to tell you about all of our wanderings, but recently Jane spent a very productive week in Prague at the European Libraries Automation Group (ELAG), a very friendly bunch of people, a good mix of librarians and developers, and a very useful conference centering on Linked Data.

Bethan is at the CILIP new professionals information day today, busy twittering about networking and sharing knowledge.

Lisa is organising our contributors’ workshops for this year (feels like our summer season of workshops) and has already run one in Manchester. More to follow in Glasgow, London and Cardiff. This is our first workshop in Wales, so please take advantage of this opportunity if you are in Wales or south west England. More information at http://archiveshub.ac.uk/contributortraining/

Joy is very busy with the exciting initiative, UKDiscovery. This is about promoting an open data agenda for archives, museums and libraries – something that we know you are all interested in. Take a look at the new website: http://discovery.ac.uk/.

With best wishes,
The Hub Team

Whose Data Is It?: a Linked Data perspective

A comment on the blog post announcing the release of the Hub Linked Data maybe sums up what many archivists will think: “the main thing that struck me is that the data is very much for someone else (like a developer) rather than for an archivist. It is both ‘our data’ and not our data at the same time.”

Interfaces to the data

Archives Hub search interface

In many ways, Linked Data provides the same advantages as other machine-based ways into the data. It gives you the ability to access data in a more unfiltered way. If you think about a standard Web interface search, what it does is to provide controlled ways into the data, and we present the data in a certain way. A user comes to a site, sees a keyword search box and enters a term, such as ‘antarctic exploration’. They have certain expectations of what they will get – some kind of list of results that are relevant to Antarctica and famous explorers and expeditions – and yet they may not think much about the process. Will all records that have any/either/both of these terms be returned, for example? Will the results be comprehensive? Might there be more effective ways to search for what they want?

As those who run systems, we have to decide what a search is going to give the user. Will we look for these terms as adjacent terms and single terms? Will we return results from any field? How will we rank the results? We recently revised the relevance ranking on the Hub because although it was ‘pragmatically’ correct, it did not reflect what users expect to see. If a user enters ‘sir john franklin’ (with or without quotation marks) they would expect the Sir John Franklin Papers to come up first. This was not happening with the previous relevance ranking. The point here is that we (the service providers) decide – we have control over what the search returns and how it is displayed, and we do our best to provide something that will work for users.

Similarly, we decide how to display the results. We provide as a basis collection descriptions, maybe with lower-level entries, but the user cannot display information in different ways. The collection remains the indivisible unit.

With a Web interface we are providing (we hope) a user-friendly way to search for descriptions of archives – one that does not require prior knowledge. We know that users like a straightforward keyword search, as well as options for more advanced searching. We hide all of the mechanics of running the search and don’t really inform the user exactly what their search is doing in any kind of technical sense. When a user searches for a subject in the advanced subject search, they will expect to get all descriptions relating to that subject, but that is not necessarily what they will get. The reason is that the subject search looks for terms within the subject field, so the creator of the description must have entered the subject as an index term. In addition, the creator of the description may have entered a different term for the subject – say ‘drugs’ instead of ‘medicines’. The Archives Hub has a ‘subject finder’ that returns results for similar terms, so it would find both of these entries. However, maybe the case of the subject finder makes a good point about searching: it provides a really useful way to find results, but it is quite hard to convey what it does quickly and obviously. It has never been widely used, even though evidence shows that users often want to search by subject; entering the subject as a keyword instead is likely to return less relevant results.

These are all examples of how we, as service providers, look to find ways to make the data searchable in ways that we think users want and try to convey the search options effectively. But it does give a sense that they are coming into our world, searching ‘our data’, because we control how they can search and what they see.

Linked Data is a different way of formatting data that is based upon a model of the entities in the data and relationships between them. To read more about the basics of Linked Data take a look at some of the earlier posts on the Locah blog (http://blogs.ukoln.ac.uk/locah/2010/08/).

Providing machine interfaces gives a number of benefits. However, I want to refer to two types of ‘user’ here: the ‘intermediate user’ and the ‘end user’. The intermediate user is the one that gets the data and creates the new ways of searching and accessing the data. Typically, this may be a developer working with the archivist. But as tools are developed to facilitate this kind of work, it should become easier to work with the data in this way. The end user is the person who actually wants to use the data.

1) Data is made available to be selected and used in different ways

We want to provide the ability for the data to be queried in different ways and for users to get results that are not necessarily based upon the collection description. For example, the intermediate user could select only data that relates to a particular theme, because they are representing end users who are interested in combining that data with other sources on the same theme. The combined data can be displayed to end users in ways that work for a particular community or particular scenario.

The display within a service like the Hub is for the most part unchanging, providing consistency, and it generally does the job. We, of course, make changes and enhancements from time to time to improve the service based on user needs, but we’re still essentially catering for one generic user as best we can. However, we want to provide the potential for users to display data in their own way for their own purposes. Linked Data encourages this. There are other ways to make this possible, of course, and we have an SRU interface that is being used by the Genesis portal for Women’s Studies. The important point is that we provide the potential for these kinds of innovations.

2) External links begin the process of interconnecting data

Machine interfaces provide flexible ways into the data, but I think that one of the main selling points of Linked Data is, well, linking data. To do this with the Hub data, we have put some links in to external datasets. I will be blogging about the process of linking to VIAF names (the Virtual International Authority File), but suffice to say that if we can make the statement within our data that ‘Sir Ernest Shackleton’ on the Hub is the same as ‘Sir Ernest Shackleton’ on VIAF, then we can benefit from anything that VIAF links to – DBpedia, for example (Wikipedia output as Linked Data). A user (or intermediate user) can potentially bring together information on Sir Ernest Shackleton from a wide range of sources. This provides a means to make data interconnected and bring people through to archives via a myriad of starting points.
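To make the idea concrete, here is a minimal sketch of what such a ‘same as’ statement looks like when written out as a triple. The Hub person URI and the VIAF number below are hypothetical placeholders, not the real identifiers; only the owl:sameAs predicate is the standard one.

```python
def ntriple(s, p, o):
    """Serialize one triple of URIs as a single N-Triples line."""
    return f"<{s}> <{p}> <{o}> ."

# Hypothetical identifiers for illustration only:
hub_person = "http://data.archiveshub.ac.uk/id/person/shackletonernest"
viaf_person = "http://viaf.org/viaf/000000"

# The standard OWL predicate for asserting two URIs denote the same thing:
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

line = ntriple(hub_person, SAME_AS, viaf_person)
print(line)
```

One line of data like this is all it takes to let a consumer follow their nose from the Hub to VIAF, and from there to whatever VIAF itself links to.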

3) Shared vocabularies provide common semantics

If we identify the title of a collection by using Dublin Core, then it shows that we mean the same thing by ‘title’ as others who use the Dublin Core title element. If we identify ‘English’ by using a commonly recognised URI (identifier) for English, from a common vocabulary (lexvo), then it shows that we mean the same thing as all the other datasets that use this vocabulary. The use of common vocabularies provides impetus towards more interoperability – again, connecting data more effectively. This brings the data out of the archival domain (where we share standards and terminology amongst our own community) and into a more global space.  It provides the potential for intermediate users to understand more about what our data is saying in order to provide services for end users. For example, they can create a cross-search of other data that includes titles, dates, extent, creator, etc. and have reasonable confidence that the cross-search will work because they are identifying the same type of content.
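As a toy illustration of why shared predicate URIs matter, the sketch below cross-searches two invented datasets for titles. The dataset and record URIs are made up; the Dublin Core title predicate is the real shared identifier, and the language URI follows Lexvo’s pattern for English.

```python
# Shared vocabulary URIs (Dublin Core terms; Lexvo-style URI for English):
DC_TITLE = "http://purl.org/dc/terms/title"
DC_LANGUAGE = "http://purl.org/dc/terms/language"
LEXVO_EN = "http://lexvo.org/id/iso639-3/eng"

# Two hypothetical datasets, each a list of (subject, predicate, object) triples:
hub_data = [
    ("http://data.example.org/hub/doc/1", DC_TITLE, "Franklin Papers"),
    ("http://data.example.org/hub/doc/1", DC_LANGUAGE, LEXVO_EN),
]
other_data = [
    ("http://data.example.org/other/rec/9", DC_TITLE, "Shackleton Letters"),
]

# Because both datasets use the same predicate URI for 'title', a
# cross-search over titles needs no per-dataset mapping at all:
titles = [o for (s, p, o) in hub_data + other_data if p == DC_TITLE]
```

The same trick works for dates, extent, creator and so on, which is what gives an intermediate user reasonable confidence that a cross-search identifies the same type of content in each dataset.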

For the Hub there are certain entities where we have had to create our own vocabulary, because those in existence do not define what we need, but then there is the potential for other datasets to use the same terms that we use.

4) URIs are provided for all entities

For Linked Data one of the key rules is that entities are identified with HTTP URIs. This means that names, places, subjects, repositories, etc. within the Hub data are now brought to the fore through having their own identifier – all the individuals, for example, within the index terms, have their own URI. This allows the potential to link from the person identified on the Hub to the same person identified in other datasets.

Who is the user?

So far so good. But I think that whilst in theory Linked Data does bring significant benefits, maybe there is a need to explain the limitations of where we are currently at.

Hub Sparql endpoint

Our Linked Data cannot currently be accessed via a human-friendly Web-based search interface; it can, however, be accessed via a Sparql endpoint. Sparql is the language for querying RDF, the format used for Linked Data. It shares many similarities with SQL, a language typically used for querying the conventional relational databases that are the basis of many online services. (Our Sparql endpoint is at http://data.archiveshub.ac.uk/sparql ). What this means is that if you can write Sparql queries then you’re up and running. Most end users can’t, so they will not be able to pull out the data in this way. And even once you’ve got the data, then what? Most people wouldn’t know what to do with RDF output. In the main, therefore, fully utilising the data requires technical ability – it requires intermediate users to work with the data and create tools and services for end users.
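For anyone curious what querying an endpoint involves, here is a hedged Python sketch that builds (but does not send) a request to the endpoint given above. The query itself is a generic ‘any ten triples’ example, since the shape of the Hub’s graph isn’t described in this post.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://data.archiveshub.ac.uk/sparql"  # from the post above

# A generic Sparql query – just asks for any ten triples:
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def build_request(endpoint, query):
    """Build an HTTP GET request for a Sparql endpoint, asking for
    results in the Sparql JSON results format. Returns the Request
    object without sending it."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

req = build_request(ENDPOINT, QUERY)
```

Sending the request (with `urllib.request.urlopen(req)`) would return JSON that a developer can parse, which is exactly the kind of step that puts this route beyond most end users.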

For the Hub we have provided Linked Data views, but it is important not to misunderstand their role – they are not any kind of definitive presentation; they are simply a means to show what the data consists of, and the user can then access that data as RDF/XML, JSON or Turtle (i.e. in a number of formats). A Linked Data view is the human-friendly view you get if you access a Hub entity web address via a web browser. If, however, a machine wanting machine-readable RDF visits the very same URI, it gets the RDF straight off. This is not to say that it wouldn’t be possible to provide all sorts of search interfaces onto the data – but this is not really the point for us at the moment; the point is to allow other people the potential to do what they want to do.
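The behaviour described here – browsers getting a human-friendly view, machines getting RDF from the same URI – is HTTP content negotiation. The sketch below is a much-simplified illustration of the idea, not the Hub’s actual implementation:

```python
def choose_representation(accept_header):
    """Very simplified content negotiation: pick the format a Linked Data
    server might serve, based on the client's Accept header. Real servers
    also honour q-values and more media types; this just does substring
    matching in a fixed preference order."""
    preferences = {
        "application/rdf+xml": "rdfxml",
        "text/turtle": "turtle",
        "application/json": "json",
        "text/html": "html",
    }
    for mime, fmt in preferences.items():
        if mime in accept_header:
            return fmt
    return "html"  # browsers (and anything unrecognised) get the human view
```

So a browser sending `Accept: text/html,...` sees the Linked Data view, while a client sending `Accept: application/rdf+xml` gets the RDF straight off, from the very same URI.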

The realisation of the user benefit has always been the biggest question mark for me over Linked Data – not so much the potential benefits, as the way people perceive the benefits and the confidence that they can be realised. We cannot all go off and create cool visualisations (e.g. http://www.simile-widgets.org/timeline/). However, it is important to put this into perspective. The Hub data at Mimas sits in directories as EAD XML. Most users wouldn’t find that very useful. We provide an interface that enables users with no technical knowledge to access the data, but we control this and it only provides access to our dataset and to a collection-based view. In order to step beyond this and allow users to access the data in different ways, we necessarily need to output it in a way that provides this potential, but there is likely to be a lag before tools and services come along that take advantage of this. In other words, what we are essentially doing is unlocking more potential, but we are not necessarily working with that potential ourselves – we are simply putting it out there for others.

Having said that, I do think that it is really important for us to now look to demonstrate the benefits of Linked Data for our service more clearly by providing some ways into the Linked Data that take advantage of the flexible nature of the data and the external links – something that ‘ordinary’ users can benefit from. We are looking to work on some visualisations that do demonstrate some of the potential. There does seem to be an increasing consensus within cultural heritage that primary resources are too severed from the process of research – we have a universe of unrelated bits that hint at what is possible but do not allow it to be realised. Linked Data is attempting to resolve this, so it’s worth putting some time and effort into exploring what it can do.

We want our data to be available so that anyone can use it as they want. It may be true that the best thing done with the data will be thought of by someone else. (see Paul Walk’s blog post for a view on this).

However, this is problematic when trying to measure impact, and if we want to understand the benefits of Linked Data we could do with a way to measure them. Certainly, we can continue to work to realise benefits by actively working with the Linked Data community and encouraging a more constructive and effective relationship between developers and managers. It seems to me that things like Linked Data require us to encourage developers to innovate and experiment with the data, enabling users to realise its benefits by taking full advantage of the global interconnectivity that is the vision of the Linked Data Web. This is the aim of UKOLN’s Dev CSI project – something I think we should be encouraging within our domain.

So, coming back to the starting point of this blog: the data maybe starts off as ‘our data’, but really we do indeed want it to be everyone’s data. A pick ’n’ mix environment to suit every information need.

Flickr: davidlocke's photostream

The long tail of archives

For many of us, the importance of measuring use and impact are coming more to the fore. Funders are often keen for indications of the ‘value’ of archives and typically look for charts and graphs that can provide some kind of summary of users’ interaction with archives. For the Hub, in the most direct sense this is about use of the descriptions of archives, although, of course, we are just as interested in whether researchers go on to consult archives directly.

The pattern of use of archives and the implications of this are complex. The long tail has become a phrase that is bandied around quite a bit, and to my mind it is one of those concepts that is quite useful. It was popularised by Chris Anderson, more in relation to the commercial world, relating to selling a smaller number of items in large quantities and a large number of items in relatively small quantities, and you can read more about it in Wikipedia: Long Tail.
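A quick back-of-the-envelope sketch makes the shape of the long tail concrete. Assuming a simple Zipf-style model (the numbers are illustrative, not measured from any real archive or bookshop), where the k-th most popular item gets use proportional to 1/k:

```python
def head_share(n_items, head_fraction):
    """Share of total use accounted for by the most-used items, under a
    toy Zipf-style model where item k gets use proportional to 1/k."""
    head = int(n_items * head_fraction)
    total = sum(1.0 / k for k in range(1, n_items + 1))
    return sum(1.0 / k for k in range(1, head + 1)) / total

# In this toy model the top 20% of 1,000 items account for roughly
# three-quarters of all use, leaving a long tail of low but non-zero use.
share = head_share(1000, 0.2)
```

The point of the sketch is only that ‘low use’ and ‘no use’ are very different things: the tail is long precisely because almost everything in it gets used a little.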

If we think about books, we might assume that a smaller number of popular titles are widely used and use gradually declines until you reach a long tail of low use.  We might think that the pattern, very broadly speaking, is a bit like this:

I attended a talk at the UKSG Conference recently, where Terry Bucknell from the University of Liverpool was talking about the purchase of e-books for the University. He had some very whizzy and really quite absorbing statistics that analysed the use of packages of e-books. It seems that it is hard to predict use: whilst a new package of e-books is the most widely used for that particular year, the older packages are still significantly used, and indeed, some books that are barely used one year may get significant use in subsequent years. The patterns of use suggested that patron-driven acquisition, or selection of titles after one year of use, were not as good value as e-book packages, although you cannot accurately measure the return on investment after only one year.

Archives are kind of like this, only a whole lot more tricky to deal with.

For archives, my feeling is that the graph is more like this:

No prizes for guessing which are the vastly more used collections*. We have highly used collections for popular research activities, archives of high-profile people and archives around significant events, and it is often these that are digitised in order to protect the originals.  But it is true to say that a large proportion of archives are in the ‘long tail’ of use.

I think this can be a problem for us. Use statistics can dominate perceptions of value and influence funding, often very profoundly. Yet I think that this is completely the wrong way to look at it. Direct use does not correlate to value, not within archives.

I think there are a number of factors at work here:

  • The use of archives is intimately bound up with how they are catalogued. If you have a collection of letters, and just describe it thus, maybe with the main author (or archival ‘creator’), and covering dates, then researchers will not know that there are letters by a number of very interesting people, about a whole range of subjects of great interest for all sorts of topics. Often, archivists don’t have the time to create rich metadata (I remember the frustrations of this lack of time). Having worked in the British Architectural Library, I remember that we had great stuff for social history, history of empire, in particular the Raj in India, urban planning, environment, even the history of kitchen design or local food and diet habits. We also had a wonderful collection of photographs, and I recall the Photographs Curator showing me some really early and beautiful photographs of Central Park in New York. It’s these kinds of surprises that are the stuff of archives, but we don’t often have time to bring them out in the cataloguing process.
  • The use of a particular archive collection may be low, and yet the value gained from the insights may be very substantial. Knowledge gained as a result of research in the archives may feed into one author’s book or article, and from there it may disseminate widely. So, one use of one archive may have high value over time. If you fed this kind of benefit in as indirect use, the pattern would look very different.
  • The ‘value’ of archives may change over time. Going back to my experience at the British Architectural Library, I remember being told how the drawings of Sir Edwin Lutyens were not considered particularly valuable back in the 1950s – he wasn’t very fashionable after his death. Yet now he is recognised as a truly great architect, and his archives and drawings are highly prized.
  • The use of archives may change over time. Just because an archive has not been used for some time – maybe only a couple of researchers have accessed it in a number of years – it doesn’t mean that it won’t become much more heavily used. I think that research, just like many things, is subject to fashions to some extent, and how we choose to look back at our past changes over time. This is one of the challenges for archivists in terms of acquisitions. What is required is a long-term perspective but organisations all too often operate within short-term perspectives.
  • Some archives may never be highly used, maybe due to various difficulties interpreting them. I suppose Latin manuscripts come to mind, but also other manuscripts that are very hard to read and those pesky letters that are cross-written. Also, some things are specialised and require professional or some kind of expert knowledge in order to understand them. This does not make them less valuable. It’s easy to think of examples of great and vital works of our history that are not easy for most people to read or interpret, but that are hugely important.
  • Some archives are very fragile, and therefore use has to be limited. Digitising may be one option, but this is costly, and there are a lot of fragile archives out there.

I’m sure I could think of some more – any thoughts on this are very welcome!

So, I think that it’s important for archivists to demonstrate that whilst there may be a long tail to archives, the value of many of those archives that are not highly used can be very substantial. I realise that this is not an easy task, but we do have one invention in our favour: the Web. Not to mention the standards that we have built up over time to help us to describe our content. The long tail graph does demonstrate that the ‘long tail of use’ can add up to just as much as, or more than, the ‘high column of use’. The Web is vital in making this a reality, because researchers all over the world can discover archives that were previously extremely hard to surface. That does still leave the problem of not being able to catalogue in depth in order to help surface content…the experiments with crowd-sourcing and user-generated content may prove to be one answer. I’d like to see a study of this – have the experiments with asking researchers to help us catalogue our content proved successful if we take a broad overview? I’ve seen some feedback on individual projects, such as Old Weather:

“Old Weather (http://www.oldweather.org) is now more than 50% complete, with more than 400,000 pages transcribed and 80 ships’ logs finished. This is all thanks to the incredible effort that you have all put in. The science and history teams are constantly amazed at the work you’re all doing.” (a recent email sent out to the contributors, or ‘ship captains’).

If anyone has any thoughts or stories about demonstrating value, we’d love to hear your views.

* family history sources

A bit about Resource Discovery

The UK Archives Discovery Network (UKAD) recently advertised our upcoming Forum on the archives-nra listserv. This prompted one response asking whether ‘resource discovery’ is what we now call cataloguing and getting the catalogues online. The respondent went on to ask why we feel it necessary to change the terminology of what we do, and labelled the term resource discovery ‘gobbledegook’. My first reaction was one of surprise, as I see it as a pretty plain-talking way of describing the location and retrieval of information, but then I thought that it’s always worth considering how people react and what leads them to take a different perspective.

It made me think that even within a fairly small community, which archivists are, we can exist in very different worlds and have very different experiences and understanding. To me, ‘resource discovery’ is a given; it is not in any way an obscure term or a novel concept. But I now work in a very different environment from when I was an archivist looking after physical collections, and maybe that gives me a particular perspective. Being manager of the Archives Hub, I have found that a significant amount of time has to be dedicated to learning new things and absorbing new terminology. There seem to be learning curves all over the place, some little and some big. Learning curves around understanding how our Hub software (Cheshire) processes descriptions, Encoded Archival Description, deciding whether to move to the EAD schema, understanding namespaces, search engine optimisation, sitemaps, application programming interfaces, character encoding, stylesheets, log reports, ways to measure impact, machine-to-machine interfaces, scripts for automated data processing, linked data and the semantic web, etc. A great deal of this is about the use of technology, and figuring out how much you need to know about technology in order to use it to maximum effect. It is often a challenge, and our current Linked Data project, Locah, is very much a case in point (see the Locah blog). Of course, it is true that terminology can sometimes get in the way of understanding, and indeed, defining and having a common understanding of terms is often itself a challenge.

My expectation is that there will always be new standards, concepts and innovations to wrestle with, try to understand, integrate or exclude, accept or reject, on pretty much a daily basis. When I was the archivist at the RIBA (Royal Institute of British Architects), back in the 1990s, my world centred much more around solid realities: around storerooms, temperature and humidity, acquisitions, appraisal, cataloguing, searchrooms and the never-ending need for more space and more resources. I certainly had to learn new things, but I also had to spend far more time than I do now on routine or familiar tasks; very important, worthwhile tasks, but still largely familiar and centred around the institution that I worked for and the concepts and terminology commonly used by archivists. If someone had asked me what resource discovery meant back then, I’m not sure how I would have responded. I think I would have said that it was to do with cataloguing, and I would have recognised the importance of consistency in cataloguing. I might have mentioned our Website, but only in as far as it provided access through to our database. The issues around cross-searching were still very new and ideas around usability and accessibility were yet to develop.

Now, I think about resource discovery a great deal, because I see it as part of my job to think of how best to represent the contributors who put time and effort into creating descriptions for the Hub. To use another increasingly pervasive term, I want to make the data that we have ‘work harder’. For me, catalogues that are available within repositories are just the beginning of the process. That’s fine if you have researchers who know that they are interested in your particular collections. But we need to think much more broadly about our potential global market: all the people out there who don’t know they are interested in archives – some, even, who don’t really know what archives are. To reach them, we have to think beyond individual repositories and we have to see things from the perspective of the researcher. How can we integrate our descriptions into the ‘global information environment’ in a much more effective way? A most basic step here, for example, is to think about search engine optimisation. Exposing archival descriptions through Google and other search engines has to be one very effective way to bring in new researchers. But it is not a straightforward exercise – books are written about SEO, and experts charge for their services in helping optimise data for the Web. For the Archives Hub, we were lucky enough to be part of an exercise looking at SEO and how to improve it for our site. We are still (pretty much as I write) working on exposing our actual descriptions more effectively.
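As a small illustration of the kind of basic step involved, a repository can publish a Sitemap (the sitemaps.org protocol) listing its description pages, so that search engines know exactly what to crawl rather than having to discover pages through search forms. This is a minimal sketch in Python; the URLs are entirely hypothetical, and a real Hub sitemap would list actual description pages:

```python
# Minimal sketch: generating a Sitemap (sitemaps.org protocol) so that
# search engines can discover individual archive description pages.
# The URLs below are hypothetical examples only.
from xml.sax.saxutils import escape

description_urls = [
    "http://example.org/archiveshub/data/gb0000-collection1",
    "http://example.org/archiveshub/data/gb0000-collection2",
]

# One <url><loc>…</loc></url> entry per description page.
entries = "\n".join(
    f"  <url><loc>{escape(url)}</loc></url>" for url in description_urls
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>"
)
print(sitemap)
```

The resulting file sits at the site root and can be referenced from robots.txt, giving crawlers a complete list of description pages.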

Linked Data provides another whole world of unfamiliar terminology to get your head round. Entities, triples, URI patterns, data models, concepts and real-world things, SPARQL queries, vocabularies – the learning curve has indeed been steep. Working on outputting our data as RDF (a modelling framework for Linked Data) has made me think again about our approach to cataloguing and cataloguing standards. At the Hub, we’re always on about standards and interoperability, and it’s when you come to something like Linked Data, where there are exciting possibilities for all sorts of data connections, well beyond just the archive community, that you start to wish that archivists catalogued far more consistently. If only we had consistent ‘extent’ data, for example, we could look at developing a lovely map-based visualisation showing where there are archives on specific subjects all around the country, and have a sense of where there are more collections and where there are fewer. If only we had consistent entries for people’s names, we could do the same sort of thing there, but even with thesauri, we often have more than one name entry for the same person. I sometimes think that cataloguing is more of an art than a science, partly because it is nigh on impossible to know what the future will bring, and therefore knowing how to catalogue to make the most of as yet unknown technologies is tricky to say the least. But also, even within the environment we now have, archivists do not always fully appreciate the global and digital environment, which requires new ways of thinking about description. Which brings me back to the question of whether resource discovery is just another term for cataloguing and getting catalogues online. No, it is not. It is about the user perspective, about how researchers locate resources and how we can improve that experience.
The term has increasingly become identified with the Web, as a way of defining the fundamental elements of the Web: objects that are available and can be accessed through the Internet – in fact, any concept that has an identity expressed as a URI. Yes, cataloguing is key to archives discovery, cataloguing to recognised standards is vital, and getting catalogues online in your own particular system is great…but there is so much more to the whole subject of enabling researchers to find, understand and use archives, and to integrating archives into the global world of resources available via the Web.
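For anyone puzzled by the Linked Data terminology above, the triple model at its heart is simple to sketch. The toy example below uses plain Python tuples rather than a real RDF library, and the URIs are made up for illustration; in practice the data would be expressed as RDF and queried with SPARQL:

```python
# Each Linked Data statement is a triple: (subject, predicate, object).
# Subjects and predicates are identified by URIs; objects are URIs or
# literal values. These URIs are hypothetical, for illustration only.
triples = [
    ("http://example.org/person/lutyens", "rdf:type", "foaf:Person"),
    ("http://example.org/person/lutyens", "foaf:name", "Sir Edwin Lutyens"),
    ("http://example.org/archive/drawings", "dc:creator",
     "http://example.org/person/lutyens"),
]

# Asking "what do we know about this thing?" is just pattern matching
# over the triples -- in essence, what a SPARQL query does.
def describe(subject, triples):
    return {pred: obj for subj, pred, obj in triples if subj == subject}

print(describe("http://example.org/person/lutyens", triples))
# → {'rdf:type': 'foaf:Person', 'foaf:name': 'Sir Edwin Lutyens'}
```

Because each triple stands on its own, data from different sources that use the same URIs can simply be merged – which is exactly why consistent name entries and ‘extent’ data matter so much.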

Democracy 2.0 in the US

Democracy 2.0: A Case Study in Open Government from across the pond.

I have just listened to a presentation by David Ferriero – 10th Archivist of the US at the National Archives and Records Administration (www.archives.gov). He was talking about democracy, about being open and participatory. He contrasted the very early days of American independence, when there was a high level of secrecy in Government, with the current climate, where those who make decisions are not isolated from the citizens, and citizens’ voices can be heard. He referred to this as ‘Democracy 2.0’. Barack Obama set out his open government directive right from the off, promoting the principles of more transparency, participation and collaboration. Ferriero talked about seeking to inform, educate and maybe even entertain citizens.

The backbone of open government must be good record keeping. Records document individual rights and entitlements, record actions of government and who is responsible and accountable. They give us the history of the national experience. Only 2-3 percent of records created in conducting the public’s business are considered to be of permanent value and therefore kept in the US archives (still, obviously, a mind-bogglingly huge amount of stuff).

Ferriero emphasised the need to ensure that Federal records of historical value are in good order. But too many records are still at risk of damage or loss. A recent review of record keeping in Federal Agencies showed that 4 out of 5 agencies are at high or moderate risk of improper destruction of records. Cost-effective IT solutions are required to address this, and NARA is looking to lead in this area. An electronic records archive (ERA) is being built in partnership with the private sector to hold all the Federal Government’s electronic records, and Ferriero sees this as the priority and the most important challenge for the National Archives. He felt that new kinds of records create new challenges – that is, records created as a result of social media – and an ERA needs to be able to take care of these types of records.

Change in processes and change in culture are required to meet the new online landscape. The whole commerce of information has changed permanently, and we need to be good stewards of the new dynamic. There needs to be better engagement with employees and with the public. NARA are looking to improve their online capabilities to improve the delivery of records. They are developing their catalogue into a social catalogue that allows users to contribute, and using Web 2.0 tools to allow greater communication between staff. They are also going beyond their own website to reach users where they are, using YouTube, Twitter, blogs, etc. They intend to develop a comprehensive social media strategy (which will be well worth reading if it does emerge).

The US Government are publishing high-value datasets on data.gov, and Ferriero said that they are eager to see the response to this in terms of the innovative use of data. They are searching for ways to step up digitisation – looking at what to prioritise and how to accomplish the most at the least cost. They want to provide open government leadership to Federal Agencies, for example mediating in disputes relating to FoI. There are around 2,000 different security classification guides in the government, which makes record processing very complex. There is a big backlog of documents waiting to be declassified, some pertaining to World War Two, the Korean War and the Vietnam War, so they will be of great interest to researchers.

Ferriero also talked about the challenge of making the distinction between business records and personal records. He felt that the personal has to be there, within the archive, to help future researchers recreate the full picture of events.

There is still a problem with Government Agencies all doing their own thing. The Chief Information Officers of all agencies have a council (the CIO Council). The records managers have the Records Management Council. But at the moment it is a case of never the twain shall meet. Even within Agencies the two often have nothing to do with each other…there are now plans to address this!

This was a presentation that ticked many of the boxes of concern – the importance of addressing electronic records, new media, bringing people together to create efficiencies and engaging the citizens. But then, of course,  it’s easy to do that in words….