User Experiences of Archive Catalogues and Use of Primary Sources

On 19 June we ran a webinar on user research and user behaviour. We had three speakers – David Marshall, a UX Researcher from the University of Cambridge; Kelly Arnstein, a UX Specialist from the University of Glasgow; and Deborah Wilson, a Subject Librarian from Queen’s University Belfast.

Link to view the Zoom recording of the session (passcode: m^9xj.vt): https://jisc.zoom.us/rec/share/T1HJWEHzO5jvLEoJEEjzm2ch9DhlHKiGUQGEQSzrt-jhQ6DzFUEKvyBpWuOTa-Xv.IKKYEwWG9fT5-lup

(Main talks: 1 hour, followed by a 25-minute discussion.) Slides for each talk are linked below.

The talks were excellent and were followed by a lively discussion. They should prove useful to anyone designing a website for archive catalogues, or working with students using primary sources. Overall, there was a lot of consensus about user behaviour, which is useful in terms of sharing findings – because it is likely to be relevant to all archives. The emphasis for this session was on students and academic researchers, but we did discuss some of the challenges of meeting the needs of a diverse audience.

A few summary points that came out of one or more of the talks:

  • People may use an archive catalogue for research and also for teaching, scoping a project, marketing and other reasons.
  • Researchers want comprehensive, detailed descriptions.
  • They value the name of the creator.
  • They want an idea of the physicality of the collection and its overall size.
  • People want context and hierarchy, and like the idea of ‘leafing through’ material to see relationships.
  • There are those who want to get quickly to what they need, and those who value browsing and serendipity. This seems like a possible tension, and certainly a challenge, for interface design. It may be that the same researcher sometimes wants a quick route through and at other times wants to take time and discover.
  • Cambridge research found that some users wanted to limit their search by date initially, but there was a strong feeling that a wide search followed by filtering was generally a good option.
  • Finding everything of value was seen as key – many researchers were prepared to spend time discovering materials related to their research, and worried about missing important materials.
  • The physical object remains key to many researchers.
  • Saving searches and other forms of personalisation were seen as a good thing.
  • Researchers, especially more experienced ones, often understand that research skills are important and that archive catalogues are complex; this may contrast with library databases, where they are more inclined to want to get to things quickly.
  • Undergraduates often don’t understand the different approach needed to engage with primary sources.
  • Undergraduates often engage with archives at the point of an assignment, where they are being marked on their use of primary sources; they initially try to find sources in the same way as they would search for anything else.
  • It is really valuable to educate students on the importance of context, the broad search-and-filter approach, understanding citations, evaluating databases, etc. They often don’t really know what primary sources are and can find them off-putting.
  • Researchers can make assumptions about what a repository holds, and then be surprised to find that there is material relevant to them.
  • A bad catalogue can put researchers off, and they may choose to go further afield if another repository’s catalogue offers a better experience.
  • People often ignore tooltips. It is a challenge to provide help that people actually use.

David’s Slides: https://archiveshub.jisc.ac.uk/documents/user-research-dm.pptx

Kelly’s Slides: https://archiveshub.jisc.ac.uk/documents/user-research-ka.pptx

Deborah’s Slides: https://archiveshub.jisc.ac.uk/documents/user-research-dw.pptx

Exploring British Design: New Routes through Content

At the moment, the Archives Hub takes a largely traditional approach to the navigation and display of archive collections. The approach is predicated on hundreds of years of archival theory, expanded upon in numerous books, articles, conferences and standards. It is built upon “respect des fonds” and original order. Archival provenance tells us that it is essential to provide the context of a single item within the whole archive collection; this context is required in order to understand and interpret the item.

ISAD(G) reinforces the ‘top down’ approach. The hierarchy of an archive collection is usually visualised as a tree structure, often using folders. The connections show a top-down or bottom-up approach, linking each parent to its child(ren).

[Image: a folder structure is often used to represent archival hierarchy]

This principle of archival hierarchy makes very good sense. The importance of this sort of context is clear: one individual letter, one photograph, one drawing, can only reveal so much on its own. But being able to see that it forms part of a series, and part of a larger collection, gives it a fuller story.

However, I wonder if our strong focus on this type of context has meant that archivists have sometimes forgotten that there are other types of context, other routes through content. With the digital environment that we now have, and the tools at our disposal, we can broaden our ambitions with regard to how we display and navigate through archives, and how we think of them alongside other sources of information. This is not an ‘either/or’ scenario; we can maintain the archival context whilst enabling other ways to explore, via other interfaces and applications. This is the beauty of machine-processable data – the data remains unchanged, but there can be numerous interfaces to the data, for different audiences and different purposes.

Providing different routes into archives, showing different contexts, and enabling researchers to create their own narratives, can potentially be achieved through a focus on the ‘real things’ within an archive description; the people, organisations and places, and also the events surrounding them.

[Image: a very simplified model of entities within archive descriptions and the links between them]

This is a very simplified image, intended to convey the idea of extracting people, organisations and places from the data within archive descriptions (at all levels of description). Ideally, these entities and connections can be brought together within events, which can be built upon the principle of relationships between entities (e.g. a person was at a place at a particular time).

Exploring British Design is a project seeking to probe this kind of approach. By treating these entities as an important part of the ‘networks of things’, and by finding connections between the entities, we give researchers new routes through the content and the potential to tell new stories and make new discoveries. The idea is to explore ways to help us become more fully a part of the Web, to ensure that archives are not resources in isolation, but a part of the story.

[Image: an example of connected entities, showing archives linked to other entities]


For this project, we are focussing on a small selection of data, around British design, extracting entities from the Archives Hub data, and considering how the content within the descriptions can be opened up to help us put it into new contexts.

We are creating biographical records that can be used to include structured data around relationships, places and events. We aim to extract people from the archive descriptions in which they are ‘embedded’ so that we can treat them as entities – they can connect not only to the archive collections they created or are associated with, but also to other people, organisations, events, places and subjects. For example, Joseph Emberton designed Simpsons in Piccadilly, London, in 1936. There, we have the person, the building, the location and the time. A sketch of how that might look as structured data is shown below.
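As an illustration of the kind of structured data this implies, here is a minimal sketch using the Python rdflib library. The namespace and the property names (agent, product, place, date) are invented for the example; real data would use established vocabularies.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF, RDFS

# Hypothetical namespace and property names, purely for illustration.
EX = Namespace("http://example.org/design/")

g = Graph()
emberton = EX["person/joseph-emberton"]
simpsons = EX["building/simpsons-piccadilly"]
event = EX["event/design-of-simpsons-1936"]

# The person becomes an entity in its own right, not text embedded in a description.
g.add((emberton, RDF.type, FOAF.Person))
g.add((emberton, FOAF.name, Literal("Joseph Emberton")))
g.add((simpsons, RDFS.label, Literal("Simpsons, Piccadilly")))

# The event ties person, building, place and time together.
g.add((event, EX.agent, emberton))
g.add((event, EX.product, simpsons))
g.add((event, EX.place, Literal("Piccadilly, London")))
g.add((event, EX.date, Literal("1936")))

print(g.serialize(format="turtle"))
```

Once people, buildings and events are modelled like this, a query can start from any of them – the archive collection becomes one node among several, rather than the only way in.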

With this paradigm, the archive becomes one of the ‘nodes’ of the network, with the other entities equally to the fore, and the ability to connect them together shows how we can start to make connections between different archive collections. The idea is that a researcher could come into an archive from any type of starting point. The above diagram (created just as an example) includes everything from ‘1970s TV comedy’ through to the use of Portland stone, and it links the Brighton Design Archive, the V&A Theatre and Performance Archive and the University of the Arts London Archive.

The long-term aim is that our endeavours to open up our data will ensure that it can be connected to other data sources (that have also been made open); sources outside of our own sphere (the Archives Hub data). The traditional interface has its merits; certainly we need to continue to provide archival context and navigation through collections; but we can be more imaginative in how we think about displaying content. We don’t need to have just one interface onto our data. We need to ensure that archives are part of the bigger story, that they can be seen in all sorts of contexts, and that they are not relegated to a bit part, isolated from everything else.


ICT and the Student Experience

A HEFCE study from 2010 states that “96% of students use the internet as a source of information” (1). This makes me wonder about the 4% that don’t; it’s not an insignificant number. The same study found that “69% of students use the internet daily as part of their studies”, so 31% don’t use it on a daily basis (which I take to mean ‘very frequently’).

There have been many reports on the subject of technology and its impact on learning, teaching and education. This HEFCE/NUS study is useful because it concentrates on surveying students rather than teachers or information professionals. One of the key findings is that it is important to think about the “effective use of technology” and “not just technology for technology’s sake”. Many students still find conventional methods of teaching superior (a finding that has come up in other studies), and students prefer a choice in how they learn. However, the potential for the use of ICT is clear, and the need to engage with it is clear, so it is worrying that students believe that a significant number of staff “lack even the most rudimentary IT skills”. It is hardly surprising that the experiences of students vary considerably when they are partly dependent upon the skills and understanding of their teachers, and upon whether teachers use technology appropriately and effectively.

At the recent ELAG conference I gave a joint presentation with Lukas Koster, a colleague from the University of Amsterdam, in which we talked about (and acted out, via two short sketches) the gap between researchers’ needs and what information professionals provide. Thinking about something as seemingly obvious as the understanding and use of the term ‘archives’ is a good case in point.

[Image: a random selection of interface terminology from archives sites]

Should we ensure that students understand the different definitions of archives? The distinction between archives that are collections with a common provenance and archives that are artificial collections? The different character of archives that are datasets, generally used by social scientists? The “abuse” of the term archives for pretty much anything that is stored in any kind of longer-term way? Should users understand archival arrangement and how to drill down into collections? Should they understand ‘fonds’, ‘manuscripts’, ‘levels’, ‘parent collection’? Or should we think more about how to translate these things into everyday language and simple design, and how to work things like archival hierarchy into easy-to-use interfaces? I think we should take the opportunities that technology provides to find ways to present information in such a way that we facilitate the user experience. But if students are reporting a lack of basic ICT skills amongst teachers, you have to wonder whether this is a problem within the archive and library sector as well. Do information professionals have the ICT skills needed to tailor our services to meet the needs of the technically savvy user?

Should we be teaching information literacy to students? One of the problems with this idea is that they tend to think they are already pretty literate in terms of use of the internet. In the HEFCE report, a survey of 213 FE students found that 88% felt they were effective online researchers and the majority said they were self-taught. They would not be likely to attend training on how to use the internet. And there is a question over whether they need to be taught how to use it in the ‘right’ way, or whether information professionals should, in fact, work with the reality of how it is being used (even if it is deemed to be ‘wrong’ in some way).  Students are clear that they do want training “around how to effectively research and reference reliable online resources”, and maybe this is what we should be concentrating on (although it might be worth considering what ‘effective use of the internet’ and ‘effective research using the internet’ actually mean). Maybe this distinction highlights the problem with how to measure effective use of the internet, and how to define online or discovery skills.

A British Library survey from 2010 found that “only a small proportion [of students] …are using technology such as virtual-research environments, social bookmarking, data and text mining, wikis, blogs and RSS-feed alerts in their work.”  This is despite the fact that many respondents in the survey said they found such tools valuable. This study also showed that students turn to their peers or supervisors rather than library staff for help.

Part of the problem may be that the vast majority of users use the internet for leisure purposes as well as work or study, so the boundaries can become blurred, and they may feel that they are adept users without distinguishing between different types of use. They feel that they are ‘fine with the technology’, although I wonder if that could be because they spend hours playing World of Warcraft, or use Facebook or Twitter every day, or regularly download music and watch YouTube. Does that mean they will use technology in an effective way as part of their studies? The trouble is that if someone believes that they are adept at searching, they may not go that extra mile to reflect on what they are doing and how effective it really is. Do we need to adjust our ways of thinking to make our resources more user-friendly to people coming from this kind of ‘I know what I’m doing’ mindset, or do we have to disabuse them of this idea and re-train them (or exhort them to read help pages for example…which seems like a fruitless mission)? Certainly students have shown some concern over “surface learning” (skim reading, learning only the minimum, and not getting a broader understanding of issues), so there is some recognition of an issue here, and the tendency to take a superficial approach might be reinforced if we shy away from providing more sophisticated tools and interfaces.

The British Library report on the Information Behaviour of the Researcher of the Future reinforces the idea that there is a gulf between students’ assumptions regarding their ICT skills and the reality, which reveals a real lack of understanding. It also found a significant lack of training in discovery and use of tools for postgraduate students. Studies like this can help us think about how to design our websites, and provide tools and services to help researchers using archives. We have the challenge of making archives more accessible and easy to discover, as well as thinking about how to help students use and interpret them effectively: “The college students of the open source, open content era will distinguish themselves from their peers and competitors, not by the information they know, but by how well they convert that knowledge to wisdom, slowly and deeply internalized.” (Sheila Stearns, ‘Literacy in the University of 2025: Still a Great Thing’, in The Future of Higher Education, ed. by Gary Olson & John W. Presley (Boulder: Paradigm Publishers, 2009), pp. 98-99.)

What are the Solutions?

We should make user testing more integral to the development of our interfaces. It requires resource, but for the Archives Hub we found that even carrying out 10 one-hour interviews with students and academics helped us to understand where we were making assumptions and how we could make small modifications that would improve our site. And our annual online survey continues to provide really useful feedback which we use to adjust our interface design, navigation and terminology. We can understand more about our users, and sometimes our assumptions about them are challenged.

[Image: graph from the Archives Hub survey 2013 – ‘Why did you come to the Hub today?’]

User groups for commercial software providers can lobby to ensure that out-of-the-box solutions also meet users’ needs and take account of the latest research into users’ experiences, expectations and preferences. This may be a harder call, because vendors are not necessarily flexible and agile; they may not be willing to make radical changes unless they see a strong business case (i.e. income may be the strongest factor).

We can build a picture of our users via our statistics. We can look at how users came into the site, the landing pages, where they went from there, which pages are most/least popular, how long they spent on individual pages, etc. This can offer real insights into user behaviour. I think a few training sessions on using Google Analytics on archive sites could come in handy!
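Even without a full analytics setup, a lot can be learned from server logs. The sketch below assumes an Apache/nginx ‘combined’-style log (the filename is hypothetical) and simply counts the most-requested pages; a tool like Google Analytics does this far more thoroughly, but the principle is the same.

```python
import re
from collections import Counter

# Matches the request path and status code in a 'combined'-format log line.
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

page_hits = Counter()
with open("access.log") as log:          # hypothetical log file
    for line in log:
        m = LOG_LINE.search(line)
        if m and m.group("status") == "200":
            page_hits[m.group("path")] += 1

# The ten most-requested pages - a rough proxy for what users care about.
for path, hits in page_hits.most_common(10):
    print(f"{hits:6d}  {path}")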

We can carry out testing to find out how well sites rank on search engines, and assess the sort of experience users get when they come into a specialist site from a general search engine. What is the text a Google search shows when it finds one of your collections? What do people get to when they click on that link? Is it clear where they are and what they can do when they get to your site?
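As a rough sketch of how part of that check might be automated: the snippet below fetches a page and pulls out its title and meta description, which is broadly the material a search engine draws on for its result snippet. It uses the requests and BeautifulSoup libraries; the URL is just an example.

```python
import requests
from bs4 import BeautifulSoup

def search_snippet(url: str) -> dict:
    """Fetch a page and extract the title and meta description -
    roughly what a search engine shows as the result snippet."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", attrs={"name": "description"})
    return {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "description": meta.get("content") if meta else None,
    }

print(search_snippet("https://archiveshub.jisc.ac.uk/"))
```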

 * * *

This is the only generation where the teachers and information professionals have grown up in a pre-digital world, while the students (unless they are mature students) are digital natives. Of course, we can’t just sit back and wait a generation for the teachers and information professionals to become more digitally minded! But it is interesting to wonder whether in 25 years’ time there will be much more consensus in approaches to and uses of ICT, or whether the same issues will still be around.

Nigel Shadbolt has described the Web as “one of the most disruptive innovations we have ever witnessed” and at present we really seem to be struggling to find out how best to use it (and not use it), how and when to train people to use it and how and when to integrate it into teaching, learning and research in an effective way.

It seems to me that there are so many narratives and assessments at present – studies and reports that seem to run the gamut of positive to negative. Is technology isolating or socialising? Are social networks making learning more superficial or enabling richer discussion and analysis? Is open access democratising or income-reducing? Is the high cost of technology encouraging elitism in education? Does the fact that information is so easily accessible mean that researchers are less bothered about working to find new sources of information?  With all these types of debates there is probably no clear answer, but let us hope we are moving forward in understanding and in our appreciation of what the Web can do to both enhance and transform learning, teaching and research.

The New Scholarly Record

I was lucky enough to attend the 2012 EmTACL conference in Trondheim, and this post is based on the excellent keynote presentation by Herbert van de Sompel, which really made me think about temporal issues with the Web and how they can limit our understanding of the scholarly record.

Herbert believes that the current infrastructure for scholarly communication is not up to the job. We now have many non-traditional assets, which do not always have fixity and often have a wide range of dependencies; assets such as datasets, blogs, software, videos, slides which may form part of a scholarly resource. Everything is much more dynamic than it used to be. ‘Research objects’ often include assets that are interdependent with each other, so they need to be available all together for the object to be complete. But this is complicated by the fact that many of them are ‘in motion’ and updated over time.

This idea of dynamic resources that are in flux, constantly being updated, is very relevant for archivists, partly because we need to understand how archives are not static and fixed in time, and partly because we need to be aware of the challenges of archiving ever more complex and interconnected resources. It is useful to understand the research environment and the way technology influences outputs and influences what is possible for future research.

There are examples of innovative services that are responding to the opportunities of dynamic resources. One that Herbert mentioned was PLOS, which publishes open scholarly articles. It puts publications into Wikipedia as well as keeping the ‘static’ copy, so that the articles have a kind of second life where they continue to evolve as well as being kept as they were at the time of submission. For example, ‘Circular Permutation in Proteins’.

The idea of executable papers is starting to become established – papers that are not just to read but to interact with. These provide access to the primary data, with capabilities to re-execute algorithms and even to allow researchers to upload and use their own data. This creates complex interdependencies and poses a challenge for archiving: if something is not fixed in time, what does that mean for retaining access to it over time?

This all raises the issue of what the scholarly record actually is. Where does it start? Where does it end? We are no longer talking about a bunch of static files but a dynamic interconnected resource. In fact, there is an increasing sense that the article itself is not necessarily the key output, but rather it is the advertising for the actual scholarship.

Herbert concluded from this that it becomes very important to be able to view different points in time in the evolution of the scholarly record, and this should be done in a way that works with the Web. The Web is the platform, the infrastructure for the scholarly record. Scholarly communication then becomes native to the Web. At the heart of this is the need to use HTTP URIs.

However, where are we at the moment? The current archival infrastructure for scholarly outputs deals with things that have fixity and boundaries. It cannot deal with things in flux and with interdependencies. The Web exists in ‘now’ time; it does not have a built-in notion of time. It assumes that you want the current version of something – you cannot use a URI to get to a prior version.

[Image: slide from Herbert van de Sompel’s presentation showing the publication context on the Web]

We don’t really object to this limitation, something evidenced by the fact that we generally accept links that take us to 404 pages, as if it is just an inevitable inconvenience. Maybe many people just don’t think that there is any real interest in or requirement for ‘obsolete’ resources, and what is current is what is important on the Web.

Of course, there is the Internet Archive and other similar initiatives in Web archiving, but they are not integrated into the Web. You have to go somewhere completely different in order to search for older copies of resources.

If the research paper remains the same, but resources that are an integral part of it change over time, then we need to change archiving to reflect this. We need to think about how to reference assets over time and how to recreate older versions. Otherwise, we access the current version, but we are not getting the context that was there at the time of creation; we are getting something different.

Can we recreate a version of a scholarly record? Can we go back to a certain point in time so we can see linked assets from a paper as they were at the time of publication? At the moment we are likely to get many 404s when we try to access links associated with a publication. Herbert cited one survey on the decay of URLs in Medline, which put it at about 10% per year, especially for links to things like related databases. A simple decay check is sketched below.
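Measuring this kind of decay is straightforward to sketch: given the URLs cited by a set of papers, try each one and record the failures. A minimal version using the Python requests library (the URLs are placeholders):

```python
import requests

def check_links(urls):
    """Return the URLs that no longer resolve, with the reason."""
    dead = []
    for url in urls:
        try:
            r = requests.head(url, allow_redirects=True, timeout=10)
            if r.status_code >= 400:
                dead.append((url, str(r.status_code)))
        except requests.RequestException as exc:
            dead.append((url, type(exc).__name__))   # DNS failure, timeout, etc.
    return dead

cited = ["http://example.org/related-database", "http://example.org/dataset"]
for url, reason in check_links(cited):
    print(f"DEAD ({reason}): {url}")
```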

One solution to this is to be able to follow a URI in time – to be able to click on a URI and say ‘I want to see this as it was two years ago’. Herbert went on to talk about something he has created called Memento. Memento aims to better integrate the current and past Web. It allows you to select a day or time in the browser and effectively take the URI back in time. Currently, the team are looking at enabling people to browse past pages of Wikipedia. Memento has a fairly good success rate in retrieving old versions, although it will not work for all resources. I tried it with the Archives Hub and found it easy to take the website back to how it looked in the very early days.
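Under the hood, Memento (standardised as RFC 7089) works by datetime negotiation: the client sends an Accept-Datetime header to a ‘TimeGate’, which redirects to the archived copy nearest that date. A minimal sketch with the requests library, assuming the Internet Archive’s Wayback Machine as the TimeGate (it is Memento-compliant, though the exact endpoint here is an assumption):

```python
import requests

def as_it_was(url: str, when: str) -> str:
    """Ask a Memento TimeGate for the copy of `url` closest to `when`
    (an HTTP date string, e.g. 'Thu, 01 Jan 2004 00:00:00 GMT')."""
    timegate = "https://web.archive.org/web/" + url   # assumed TimeGate endpoint
    r = requests.get(timegate,
                     headers={"Accept-Datetime": when},
                     allow_redirects=True, timeout=30)
    return r.url   # the URI of the memento we were redirected to

print(as_it_was("https://archiveshub.jisc.ac.uk/",
                "Thu, 01 Jan 2004 00:00:00 GMT"))
```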

[Image: screenshot of the Archives Hub homepage, using Memento to take the site back in time]

One issue is that the archived copies are not always created near the time of publication. But for those that are, they are created simply as part of the normal activity of the Web, by services like the Internet Archive or the British Library, so there is no extra work involved.

Herbert outlined some of the issues with using DOIs (digital object identifiers), which provide identifiers for resources that use a resolver to ensure that the identifier can remain the same over time. This is useful if, for example, a publisher is bought out – the identifier stays the same, as the resolver redirects to the right location. However, a DOI resolver exists in the perpetual now: unlike an HTTP URI with Memento, it offers no way to travel back in time. This is maybe one illustration of the way some of the processes that we have implemented over the Web do not really fulfil our current needs, as things change and resources become more complex and dynamic.
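The mechanism is easy to see: resolving a DOI is just an HTTP redirect, and the redirect always points to wherever the resource lives now. A two-line illustration (the DOI below is hypothetical):

```python
import requests

# A DOI resolves via an HTTP redirect to the *current* location -
# there is no way to ask where it pointed two years ago.
r = requests.head("https://doi.org/10.1000/example", allow_redirects=False)
print(r.status_code, r.headers.get("Location"))
```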

With Memento, the same HTTP URI can function as the reference to temporally evolving resources. The importance of this type of functionality is becoming more recognised. There is a new experimental URI scheme, DURI, or Dated URI. The idea is that a URI, such as http://www.ntnu.no, can be dated: 1997-06-17:http://www.ntnu.no (this is an example and is not actionable now). Herbert also raised the possibility of developing websites that can deal with the TEL (telephone) protocol. The idea would be that the browser asks you whether the website can use the TEL protocol, and if it can, you get this option offered to you. You can then use this to reference a resource and use Memento to go back in time.

Herbert concluded that ‘archiving’ should not be just a one-off event, but needs to happen continually. In fact, it could happen whenever there is an interaction. Also, when new materials are taken into a repository, you could scan for links and put them into an archive, so the links don’t die. If you archive the links at the time of publication, or when materials are submitted to a repository, then you protect against losing the context of the resource. A sketch of what that might look like follows.
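One way a repository might do this today is the Internet Archive’s ‘Save Page Now’ endpoint. The sketch below requests a snapshot of each outbound link found in a deposit; the header used to report the snapshot location is an assumption based on how the service has commonly behaved, so treat this as illustrative.

```python
import requests

def archive_link(url: str):
    """Ask the Internet Archive to snapshot `url` at deposit time."""
    r = requests.get("https://web.archive.org/save/" + url, timeout=60)
    # The snapshot's location has typically been reported in a header
    # such as Content-Location (an assumption - check the live service).
    loc = r.headers.get("Content-Location")
    return "https://web.archive.org" + loc if loc else None

# Hypothetical links scanned from a newly deposited paper.
for link in ["http://example.org/supporting-dataset"]:
    print(link, "->", archive_link(link))
```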

Herbert introduced us to SiteStory, which offers transactional archiving of a web server. Usually a web archive sends out a robot, gathers the data and dumps it. With SiteStory, the web server takes an active part: every time a user requests a page, the page is also pushed into the archive, so you get a fine-grained history of the resource. Something like this could be done by publishers and service providers, with the idea that they hold onto the hits, the impact, the audience. It certainly does seem to be a growing area of interest.

Herbert’s slides are available on Slideshare.

A Web of Possibilities

“Will you browse around my website,” said the spider to the fly,
“’Tis the most attractive website that you ever did spy.”

[Image: spider, from Wellcome Images]

All of us want to provide attractive websites for our users. Of course, we’d like to think it’s not really the spider/fly kind of relationship! But we want to entice and draw people in, and often we will see our own website as our key web presence: a place for people to come to find out who we are, what we have and what we do, and to look at our wares, so to speak.

The recently released ‘Discovery’ vision is to provide UK researchers with “easy, flexible and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable.”  Does this have any implications for the institutional or small-scale website, usually designed to provide access to the archives (or descriptions of archives) held at one particular location?

Over the years that I’ve been working in archives, announcements about new websites for searching the archives of a specific institution, or the outputs of a specific project have been commonplace.  A website is one of the obvious outputs from time-bound projects, where the aim is often to catalogue, digitise or exhibit certain groups of archives held in particular repositories. These websites are often great sources of in-depth information about archives. Institutional websites are particularly useful when a researcher really wants to gain a detailed understanding of what a particular repository holds.

However, such sites can present a view that is based more around the provider of the information than the receiver. It could be argued that a researcher is less likely to want to use archives because they are held at a particular location (except for reasons of convenience), and more likely to want archives around their subject area; and the archives relevant to them will probably be held in a whole range of archives, museums and libraries (and elsewhere). By only looking at the archives held at a particular location, even if that location is a specialist repository that represents the researcher’s key subject area, the researcher may not think about what they might be missing.

Project-based websites may group together archives in ways that  benefit researchers more obviously, because they are often aggregating around a specific subject area. For example, making available the descriptions and links to digital archives around a research topic. Value may be added through rich metadata, community engagement and functionality aimed at a particular audience. Sometimes the downside here is the sustainability angle: projects necessarily have a limited life-span, and archives do not. They are ever-changing and growing and descriptions need to be updated all the time.

So, what is the answer? Is this too much of a silo-type approach, creating a large number of websites, each dedicated to a small selection of archives?

Broader aggregation seems like one obvious answer. It allows for descriptions of archives (or other resources) to be brought together so that researchers have the benefit of searching across collections, bringing together archives by subject, place, person or event, regardless of where they are held (although there is going to be some kind of limit here, even if it is at the national level).

You might say that the Archives Hub is likely to be in favour of aggregation! But it’s definitely not all pros and no cons. Aggregations may offer powerful search functionality for intellectually bringing together archives based on a researcher’s interests, but in some ways there is a greater risk around what is omitted. When searching a website that represents one repository, a researcher is more likely to understand that other relevant archives may exist elsewhere. Aggregations tend to promote themselves as comprehensive – if not explicitly then implicitly – which creates an expectation that cannot ever fully be met. They can also raise issues around measuring impact and around licensing. There is also the risk of a proliferation of aggregation services, further confusing the resource discovery landscape.

Is the ideal of broad inter-disciplinary cross-searching going to be impeded if we compete to create different aggregations? Yes, maybe it will be to some extent, but I think that it is an inevitability, and it is valid for different gateways to serve different audiences’ needs. It is important to acknowledge that researchers in different disciplines and at different levels have their own specific requirements, and we cannot fulfil all of these needs by presenting data in only one way.

One thing I think is critical here is for all archive repositories to think about the benefits of employing recognised and widely-used standards, so that they can effectively interoperate and so that the data remains relevant and sustainable over time. This is the key to ensuring that data is agile, and can meet different needs by being used in different systems and contexts.

I do wonder if maybe there is a point at which aggregations become unwieldy, politically complicated and technically challenging. That point seems to be when they start to search across countries. I am still unsure about whether Europeana can overcome this kind of problem, although I can see why many people are so keen on making it work. At present it is extremely patchy, and, for example, getting no results for texts held in Britain relating to Shakespeare is not really a good result. But then, maybe the point is that Europeana is there for those who want to use it, and it is doing ground-breaking work in its focus on European culture; the Archives Hub exists for those interested in UK archives and a more cross-disciplinary approach; Genesis exists for those interested in women’s studies; for those interested in the Co-operative movement, there is the National Co-operative Archive site; for those researching film, the British Film Institute website and archive is of enormous value.

So, is the important principle here that diversity is good because people are diverse and have diverse needs? Probably so. But at the same time, we need to remember that to get this landscape, we need to encourage data sharing and  avoid duplication of effort. Once you have created descriptions of your archive collections you should be able to put them onto your own website, contribute them to a project website, and provide them to an aggregator.

Ideally, we would be looking at one single store of descriptions, because as soon as you contribute to different systems, if they also store the data, you have version control issues. The ability to remotely search different data sources would seem to be the right solution here. However, there are substantial challenges. The Archives Hub has been designed to work in a distributed way, so that institutions can host their own data. The distributed searching does present challenges, but it certainly works pretty well. The problem is that running a server, operating system and software can actually be a challenge for institutions that do not have the requisite IT skills dedicated to the archives department.  Institutions that hold their own data have it in a great variety of formats. So, what we really need is the ability for the Archives Hub to seamlessly search CALM, AdLib, MODES, ICA AtoM, Access, Excel, Word, etc. and bring back meaningful results. Hmmm….
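As a sketch of what that integration problem looks like in code (entirely illustrative, not Archives Hub software): each system would need an adapter that translates a query into its own terms and normalises whatever comes back into a common result shape.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    repository: str
    reference: str

class CatalogueAdapter(ABC):
    """One of these would be needed for each system: CALM, AdLib, MODES, etc."""
    @abstractmethod
    def search(self, query: str) -> list[Result]: ...

class SpreadsheetAdapter(CatalogueAdapter):
    """A catalogue held as exported rows (e.g. from Excel)."""
    def __init__(self, repository: str, rows: list[dict]):
        self.repository, self.rows = repository, rows

    def search(self, query: str) -> list[Result]:
        q = query.lower()
        return [Result(r["title"], self.repository, r["ref"])
                for r in self.rows if q in r["title"].lower()]

def federated_search(query: str, adapters: list[CatalogueAdapter]) -> list[Result]:
    results: list[Result] = []
    for adapter in adapters:
        try:
            results.extend(adapter.search(query))
        except Exception:
            pass   # one source being down shouldn't sink the whole search
    return results

rows = [{"title": "Design drawings, Simpsons of Piccadilly", "ref": "EMB/1"}]
print(federated_search("simpsons", [SpreadsheetAdapter("Brighton", rows)]))
```

The hard part, of course, is not the adapter pattern itself but writing and maintaining a reliable adapter for every system and format in the wild.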

The business case for opening up data seems clear. Projects like Open Bibliographic Data have helped progress the thinking in this arena, and have raised issues and solutions around barriers such as licensing. But it seems clear that we need to understand more about the benefits of aggregation, and the different approaches to aggregation, and we need to get more buy-in for this kind of approach. Does aggregation allow users to do things that they could not do otherwise? Does it save them time? Does it promote innovation? Does it skew the landscape? Does it create problems for institutions because of the difficulties with branding and measuring impact? Furthermore, how can we actually measure these kinds of potential benefits and issues?

Websites that offer access to archives (or descriptions of archives) based on where they are located and on the body that administers them have an important role to play. But it seems to me that it is vital that these archives are also represented on a national, and even international, stage. We need to bring our collections to where the users are. We need to ensure that Google and other search engines find our descriptions. We need to put archives at the heart of research, alongside other resources.

I remember once talking about the Archives Hub to an archivist who ran a specialist repository. She said that she didn’t think it was worth contributing to the Hub because they already had their own catalogue. That is, researchers could find what they wanted via the institute’s own catalogue on their own system, available in their reading room. She didn’t seem to be aware that this could only happen if they knew that the archive was there, and that this view rested on the idea that researchers would be happy to repeat that kind of search on a number of other systems. Archives are often about a whole wealth of different subjects – we all know how often there are unexpected and exciting finds. A specialist repository for any one discipline will have archives that reach way beyond that discipline into all sorts of fascinating areas.

It seems undeniable that data is going to become more open and that we should promote flexible access through a number of discovery routes, but this throws up challenges around version control, measuring impact, brand and identity. We always have to be cognisant of funding, and widely disseminated data does not always help us with a funding case because we lose control of the statistics around use and any kind of correlation between visits to our website and bums on seats. Maybe one of the challenges is therefore around persuading top-level managers and funders to look at this whole area with a new perspective?

A bit about Resource Discovery

The UK Archives Discovery Network (UKAD) recently advertised our up and coming Forum on the archives-nra listserv. This prompted one response asking whether ‘resource discovery’ is what we now call cataloguing and getting the catalogues online. The respondent went on to ask why we feel it necessary to change the terminology of what we do, and labelled the term resource discovery as ‘gobbledegook’. My first reaction was one of surprise, as I see it as a pretty plain-talking way of describing the location and retrieval of information, but then I thought that it is always worth considering how people react and what leads them to take a different perspective.

It made me think that even within a fairly small community, which archivists are, we can exist in very different worlds and have very different experiences and understanding. To me, ‘resource discovery’ is a given; it is not in any way an obscure term or a novel concept. But I now work in a very different environment from when I was an archivist looking after physical collections, and maybe that gives me a particular perspective. As manager of the Archives Hub, I have found that a significant amount of time has to be dedicated to learning new things and absorbing new terminology. There seem to be learning curves all over the place, some little and some big: understanding how our Hub software (Cheshire) processes descriptions, Encoded Archival Description, deciding whether to move to the EAD schema, understanding namespaces, search engine optimisation, sitemaps, application programming interfaces, character encoding, stylesheets, log reports, ways to measure impact, machine-to-machine interfaces, scripts for automated data processing, linked data and the semantic web, etc. A great deal of this is about the use of technology, and figuring out how much you need to know about technology in order to use it to maximum effect. It is often a challenge, and our current Linked Data project, Locah, is very much a case in point (see the Locah blog). Of course, it is true that terminology can sometimes get in the way of understanding; indeed, defining and reaching a common understanding of terms is often itself a challenge.

My expectation is that there will always be new standards, concepts and innovations to wrestle with, try to understand, integrate or exclude, accept or reject, on pretty much a daily basis. When I was the archivist at the RIBA (Royal Institute of British Architects), back in the 1990s, my world centred much more around solid realities: around storerooms, temperature and humidity, acquisitions, appraisal, cataloguing, searchrooms and the never-ending need for more space and more resources. I certainly had to learn new things, but I also had to spend far more time than I do now on routine or familiar tasks; very important, worthwhile tasks, but still largely familiar and centred around the institution that I worked for and the concepts and terminology commonly used by archivists. If someone had asked me what resource discovery meant back then, I’m not sure how I would have responded. I think I would have said that it was to do with cataloguing, and I would have recognised the importance of consistency in cataloguing. I might have mentioned our website, but only in as far as it provided access through to our database. The issues around cross-searching were still very new, and ideas around usability and accessibility were yet to develop.

Now, I think about resource discovery a great deal, because I see it as part of my job to think of how best to represent the contributors who put time and effort into creating descriptions for the Hub. To use another increasingly pervasive term, I want to make the data that we have ‘work harder’. For me, catalogues that are available within repositories are just the beginning of the process. That is fine if you have researchers who know that they are interested in your particular collections. But we need to think much more broadly about our potential global market: all the people out there who don’t know they are interested in archives – some, even, who don’t really know what archives are. To reach them, we have to think beyond individual repositories and we have to see things from the perspective of the researcher. How can we integrate our descriptions into the ‘global information environment’ in a much more effective way? A most basic step here, for example, is to think about search engine optimisation. Exposing archival descriptions through Google and other search engines has to be one very effective way to bring in new researchers. But it is not a straightforward exercise – books are written about SEO, and experts charge for their services in helping optimise data for the Web. For the Archives Hub, we were lucky enough to be part of an exercise looking at SEO and how to improve it for our site. We are still (pretty much as I write) working on exposing our actual descriptions more effectively.
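One concrete, low-effort step in this direction is a sitemap listing every description, so that crawlers can find them all. A minimal sketch using only the Python standard library (the URL pattern and record identifiers are invented):

```python
from xml.etree.ElementTree import Element, SubElement, ElementTree

def build_sitemap(record_ids, out_path="sitemap.xml"):
    """Write a sitemap.xml with one entry per archive description."""
    urlset = Element("urlset",
                     xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for rid in record_ids:
        url = SubElement(urlset, "url")
        # Hypothetical URL pattern for a description page.
        SubElement(url, "loc").text = f"https://example.org/archives/{rid}"
        SubElement(url, "changefreq").text = "monthly"
    ElementTree(urlset).write(out_path, encoding="utf-8", xml_declaration=True)

build_sitemap(["gb123-abc", "gb456-def"])   # invented record identifiers
```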

Linked Data provides another whole world of unfamiliar terminology to get your head round. Entities, triples, URI patterns, data models, concepts and real-world things, SPARQL queries, vocabularies – the learning curve has indeed been steep. Working on outputting our data as RDF (a modelling framework for Linked Data) has made me think again about our approach to cataloguing and cataloguing standards. At the Hub, we’re always on about standards and interoperability, and it’s when you come to something like Linked Data, where there are exciting possibilities for all sorts of data connections, well beyond just the archive community, that you start to wish that archivists catalogued far more consistently. If only we had consistent ‘extent’ data, for example, we could look at developing a lovely map-based visualisation showing where there are archives on specific subjects all around the country, and have a sense of where there are more collections and where there are fewer. If only we had consistent entries for people’s names, we could do the same sort of thing there; but even with thesauri, we often have more than one name entry for the same person.

I sometimes think that cataloguing is more of an art than a science, partly because it is nigh on impossible to know what the future will bring, and therefore knowing how to catalogue to make the most of as yet unknown technologies is tricky to say the least. But also, even within the environment we now have, archivists do not always fully appreciate the global and digital environment, which requires new ways of thinking about description.

Which brings me back to the question of whether resource discovery is just another term for cataloguing and getting catalogues online. No, it is not. It is about the user perspective, about how researchers locate resources and how we can improve that experience. It has increasingly become identified with the Web, as a way to refer to the fundamental elements of the Web: objects that are available and can be accessed through the Internet – in fact, any concept that has an identity expressed as a URI. Yes, cataloguing is key to archives discovery, cataloguing to recognised standards is vital, and getting catalogues online in your own particular system is great…but there is so much more to the whole subject of enabling researchers to find, understand and use archives, and integrating archives into the global world of resources available via the Web.