In With the New: open, flexible, user-centered

The 2013 Eduserv Symposium was held in the impressive (and very much ‘keep in with the old’) surroundings of One Great George Street in Westminster, the home of the Institution of Civil Engineers.

‘In with the New’ covered new skill sets, new modes of engagement and new ways of working. With such a wide topic area, the conference took quite a broad-brush approach. Andy Powell of Eduserv introduced the day and talked about dealing with change, both change imposed on us from outside and change driven internally.

David Cotterill from the Government Digital Service gave the opening keynote, which is what I want to focus on here. He said his talk was about ‘my exciting life as a civil servant’. The audience weren’t convinced at the outset, but for those interested in open data there may have been some shift of opinion by the end!

He talked about the old consensus, which was built around long-term contracts for IT in government; contracts that were consistently awarded to a limited number of suppliers rather than to smaller and more innovative suppliers. IT was not defined as a core function, so outsourcing was considered appropriate. But in the 21st century things have changed. There is recognition that IT covers very diverse areas. For Government (and for many other organisations), it covers digital public services, mission IT systems (i.e. more niche or specialised systems for government departments), desktop, infrastructure, connectivity, etc. (the more general IT), and, within government, there are also ‘shared services’ (such as for financial systems). David talked about the need to structure mission IT systems and digital public services so that they can run on different desktops or infrastructures rather than being tied to one (as was often the case in the past).

David went on to argue that the Government really has taken up the open agenda, and showed some supporting quotes, such as: “The latest step is the publications of this report on open standards. And once again the government has got it right.” (Wall Street Journal). He argued that to have the flexibility to progress, to upgrade and to move forwards, you need open, standards-based systems. You also need to look at specific needs in specific areas rather than thinking of IT as some kind of monolithic thing.

It was surprising to hear him say that “this is a great time to be a supplier”, but he explained that many of the current deals within government come to an end over the next few years, so there is an opportunity for new suppliers and for a more diverse set-up.

What is 21st-century government about? David said it’s about things like www.gov.uk/, built using a platform approach (rather than a CMS), which allows the Government Digital Service (GDS) to build products onto it that meet user needs; products that enable the government to engage with citizens. David gave a sense of how this approach is working across UK government, with multi-disciplinary teams including developers, designers, product and service managers, policy, communications, etc.

His core message was to start with the user need. Of course, this is something that we can all agree with, although whether it always happens in reality is debatable, even if it is the intention. We need to shape things in terms of user requirements right from the start, and not bring them in once all the policy, requirements and development work is done. We should think about capturing requirements and developing alpha and then beta versions before going live. This may mean that what is initially developed is chucked out after the alpha stage, because it doesn’t meet needs, and you have to start again. I think one of the problems with this approach is that funders do not necessarily facilitate it. How easy would it be to get funding for a project where the iterative process may go on for quite some time, and there is a risk of starting again several times in order to get it right? A further difficulty with this from a funding point of view is that it is much harder to specify what you are going to end up with, because you necessarily need to keep an open mind; you’ll end up (hopefully) with what users want, but it might be different to what was envisaged and you’ll only know after the testing and refining process.

It makes me think about archival software systems, for example. Surely you should put user needs at the heart of the development of your system? Ideally you would start out by gathering user requirements for a system, maybe looking at other research done in this area. You’d end up with a specification, listing priorities for your system. Most archives can’t then build it themselves, so they would go out and look at what meets these needs. But would it be possible to test a system out with users, to see if it really does fulfil their needs, and, if it doesn’t, go back and try something else? The problem here is that if you are buying a system, it’s hard to apply an iterative approach. However, it may still be possible to move to a more user-centered approach. You should have clear evidence that the system does meet key user needs, and, in the absence of an ability to chop and change, you should ensure that the system does not tie you down and that it provides the flexibility to build and modify, so that changing priorities can be met.

It’s good to see Government leading the way. David showed previews of some services that are being developed, working towards a more transparent approach to things like transactional services, and he highlighted a government manual about building services that people want to use. There is now a ‘Standards Hub’ to promote open standards and also to encourage wider participation in solving data challenges. It is amazing to see Government code on GitHub. Somehow that really brought home to me how different things are now compared with 10-15 years ago. David, as well as other speakers at the conference, believes that open standards encourage a more efficient approach, so they become a cost-saving venture as well as encouraging public engagement and transparency.

Interoperability, data sharing and standards

I recently spoke at the CILIP MmIT group conference, where I inflicted EAD on a group of unsuspecting librarians. Not just EAD, but MARC and MODS XML and even some Linked Data. They may have said it was a bit like going back to library school, but no-one ran away.

I was talking to them about data sharing and interoperability, and asked them to look at resources described using different schemas and to think about appropriateness: how well does the data format allow you to describe the resource? How machine-readable is it? How human-readable is it? How human/machine readable does it need to be? Is the format robust? Transformable? Sustainable? Interoperable?

These are all things you need to consider when you’re deciding which format to put your data in – except, of course, we often don’t think about these things much at all. These decisions might have been effectively made for you by the community. If all of your peer institutions use a certain data format, then you’re more likely to use it too. And if you want to share your data with the community, using the same format as they do is important.

But this means that you’re relying on other people to make these decisions about the best format for your data. Those people might know the sector and the issues involved in general, but they might not know your specific circumstances or users. Their decision might have been made a long time ago, before advances in theory and technology (MARC was first developed in the 1960s, and EAD in the 1990s). The choice of format might have been based on available tools, rather than underlying principles.

The same goes for cataloguing standards. Is sticking strictly to ISAD(G) really the best way to describe your collections to meet the needs of a global audience? (This is a topic that’s up for discussion at the Descriptive Standards Roundtable at the 2013 ARA Conference.)

Of course, standards only work as standards if there’s sufficient community take-up, and a consensus on how to apply them.

XKCD on standards http://xkcd.com/927/

But progress isn’t made by blindly following rules, and ‘there’s already a standard for that’ is no reason not to think about whether there could be a better standard for it.

Standards should be developed from needs. What do people need to know? What do they need to be able to do with the data? What do we need to be able to tell them? And, if we’re looking to the future, what might they want to be able to do in the future? What do we need to do to the data now, to allow for future wants?

We can only work with what’s available, and it is important to have shared standards and points of reference. But if you don’t take time to consider these points when you’re choosing a standard, you’re not really choosing at all. You’re just perpetuating the status quo.

So take the time to think about what you’re doing with your data. Know why you’re using a particular standard, even if it’s because it’s the best of a bad bunch, or closest to what you want to do. Think about what it can and can’t do. Talk to others who are using it. Look for chances to comment on proposed revisions. The future of standards is the future of your data, and your data is valuable. Don’t let it decay.

Supporting Historians: responding to changing research practices

This post picks out some highlights from a report from Ithaka S+R, “Supporting the Changing Research Practices of Historians” by Roger C. Schonfeld and Jennifer Rutner (December 2012). It concentrates on findings that are of particular relevance to archivists and to discovery. The report is recommended reading. It is a US study, but there are clearly strong similarities with other countries.

The report finds that underlying research methods are still broadly as they were but practices have changed considerably: “Based on interviews with dozens of historians, librarians, archivists, and other support services providers, this project has found that the underlying research methods of many historians remain fairly recognizable even with the introduction of new tools and technologies, but the day to day research practices of all historians have changed fundamentally.”

It goes on to summarise the improvements that archives might make to meet changing needs, none of which are unexpected: “For archives, we recommend ongoing improvements to access through improved finding aids, digitization, and discovery tool integration, as well as expanded opportunities for archivists to help historians interpret collections, to build connections among users, and to instruct PhD students in the use of archives.”

It is very encouraging to see the positive comments about researchers’ interactions with archivists: “Having a meeting with the archivist and librarian is really fantastic, because they help you understand what is in the archive, and what you might be able to use.” It is clear from the study that archivists have a vital role to play as key collaborators and colleagues of historians, and their value is clear: “Archivists are often able to hone and direct an inquiry, bringing to light items and collections that the researcher may have been unaware of.”

The study does highlight the changing nature of interactions with archival material, as a result of the use of digital cameras in particular, which enables the analytical work to take place elsewhere. It is generally felt to be a convenient and time-saving option, enabling long-term interaction with resources outside of the reading room. This development is actually described as “the single most significant shift in research practices among historians.” It raises questions about whether the role of the archivist changes when the analytical work is displaced from the archive, as archivists may have less opportunity for intellectual engagement with researchers. The study also highlights a possible issue with digital copies, namely the separation of metadata from content, where the researcher has hundreds of images and needs to organise them constructively, and it also found that scholars are struggling to work with digitised non-textual content effectively.

Finding time for research trips was a primary challenge for many researchers. “Interviewees repeatedly emphasized that the amount of time they are able to spend in the archives shapes the nature of the interaction with the sources significantly.” Because most struggle to find time for research trips, digitised sources are hugely beneficial.

The study found that digitised finding aids help researchers to “travel more strategically”. It suggests that high-quality finding aids may become more important as researchers move more towards photographic visits to archives, rather than serendipitous visits. This connection is something I have not thought about before, and I would be very interested to hear what archivists think about this idea.

Of major relevance for a service like the Archives Hub is the conclusion about finding aids:

“The use of online finding aids greatly facilitates, and sometimes displaces, these visits. If a “good” finding aid is readily available online, this might make a scouting visit unnecessary, depending on the importance of the archive to the research project. In some cases, researchers were able to rule out a visit to an archive based on the online finding aids, and re-purpose funds and effort to tracking down other sources for the project.”

This study is a clear endorsement of our belief (which, I should say, is also backed up by our own researcher surveys) that finding aids play a role not only in identifying and prioritising sources, but also in providing enough information in themselves to make a visit unnecessary. As well as this, they may have a kind of positive negative effect: the researcher knows that materials can be ruled out. The study strongly emphasised the need for “searchable databases” and “centralized searching” and participants talked about the problem with locating each collection independently, especially across the diverse types of archive repository: “The process of identifying archives – in some cases small, local archives or international archives – can present an amazing challenge to researchers.” Clearly, comprehensive cross-searching tools are a huge boon to researchers.

In terms of discovery, Google is clearly a major tool and there was a feeling that it was the most comprehensive discovery tool, as well as being convenient and easy to use. It is often used at the start of the search process: “Generally, historians discover finding aids through Google searches and archive websites.” There is a clear demand for more descriptions online: “The general consensus among interviewees was that more online finding aids would greatly benefit their research, and that archives should continue to make efforts to make these accessible online. Continued and expanded efforts to develop finding aids more efficiently and to make them available digitally would seem to support the needs of historians for improved access.”

In terms of PhD students (and maybe others who are inexperienced researchers), the study found issues with the use of archives and other sources:

“Interviews with PhD candidates indicated that there is often little support for them in learning about new research methods or practices, either in their department or elsewhere at their institution, of which they are aware. While the subject matter treated by historians continues to diversify dramatically, new methodologies develop, and research practices change rapidly, it is clearly critically important that students have a grounding in the methods and practices of the field.” The Archives Hub has recently produced a brief Guide to Using Archives for the Inexperienced, and discussions on the archives email list showed just how important this topic is for archivists, and that there was a general consensus that PhD students need more training in research methodologies.

Summing up, the report makes six recommendations specifically for Archives:

1. More online finding aids
2. More digitisation
3. Discovery tools that promote cross-searching, crossing institutional boundaries and encompassing small and local record offices
4. Adequate resources for ensuring the expertise of the archivist continues to be available, enabling archivists to be active interpreters of the collections
5. Adapting to and facilitating the use of digital cameras and scanners in reading rooms
6. Training PhD students in the use of archives

There is a great deal more of interest and relevance in the report around searching, Google Scholar, the use of the academic library, organising and managing research, citation management and digital research methods. It is very well worth reading.

The Shape of Knowledge

In the 1870s a young man from a small town in New York decided to organise the world’s knowledge. Well, at least the world’s knowledge in book form. The now ubiquitous Dewey Decimal system divides knowledge decimally, as Dewey loved the decimal system. So, there are ten top-level classes, each with ten sub-divisions (and so on). It’s a curious arrangement. Eight of the nine major divisions for religion are given over to Christianity. Dewey relegates Buddhism right down the ranks of its hierarchy, as a ‘religion of Indian origin’. It gives an entire category over to ‘Paranormal Phenomena’, and 999 is, rather satisfyingly, ‘extraterrestrial worlds’ (under 990, ‘General history of other areas’). When computing came along, there was no room left for it in the 600s (Technology and Applied Sciences), so it went under the 000s, which were originally for ‘generalities’.
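
To make the decimal nesting concrete, here is a minimal Python sketch. The function name `dewey_chain` and the label table are hypothetical and illustrative only: most labels come from the examples above, the 900 label is added for completeness, and this is nowhere near a complete Dewey schedule. The function walks a class number up through its broader classes by keeping one more leading digit at each step.

```python
# Illustrative labels only, mostly taken from the examples in the text;
# this is not a complete or authoritative Dewey schedule.
LABELS = {
    "000": "Generalities",
    "600": "Technology and Applied Sciences",
    "900": "History and geography",
    "990": "General history of other areas",
    "999": "Extraterrestrial worlds",
}


def dewey_chain(class_number):
    """Return the broader classes above a Dewey number, broadest first.

    Each step keeps one more leading digit and pads with zeros, which is
    how ten divisions nest inside each of the ten main classes.
    """
    digits = class_number.split(".")[0]  # ignore any decimal extension
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"expected a three-digit class, got {class_number!r}")
    chain = []
    for depth in (1, 2, 3):
        broader = digits[:depth].ljust(3, "0")
        if broader not in chain:  # e.g. "600" pads to "600" at every depth
            chain.append(broader)
    return chain


for number in dewey_chain("999"):
    print(number, LABELS.get(number, "(not listed here)"))
```

Here `dewey_chain("999")` returns `['900', '990', '999']`: one number, one path. That single path is the ‘either-or’ decision Weinberger describes.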

“And there’s the weakness and the greatness of Dewey’s system. The…system lets patrons stroll through the collected works of What We Know – our collective memory palace – but the price for ordering knowledge in the physical world is having to make either-or decisions…The library’s geography of knowledge can have one shape but no other.” (Everything is Miscellaneous, David Weinberger)

The world of Dewey classification doesn’t reflect the way we see the world now because the shape of knowledge is fluid and ever-changing, and even in his own time there were many who disputed his arrangement. But it seems that for now we’re stuck with the basics of the Dewey system because the implications of changing it would be massive – libraries the world over have been physically ordered based on Dewey, and long decimal numbers have been painstakingly written on the spines of millions of books.

The Dewey system came about as a result of the need to store one book in one place – knowledge has to be ordered when it is on shelves. Archives avoid this particular trap because they are not set out on shelves for people to browse, so they do not need a set physical order. The stereotype of archives as dusty boxes on shelves in dark rooms is unflattering, but the fact that they were never shelved for browsing had one advantage: the intellectual arrangement of archives has always been via the finding aids, so the physical collections did not need to undergo the either-or of arrangement in the way that libraries did.

Dewey relies upon giving a book a subject (although there can be cross-referencing to it of course). A book is not always easy to categorise under a subject; but an archive collection may be nigh on impossible to shoe-horn into one subject heading. If it’s hard enough to decide where to put a book about something like globalisation, trade and technology, for example, then it is almost an impossible task with archives because one collection is typically about a whole range of subjects, often ostensibly unrelated. And, of course, often archives are not consciously ‘about’ a subject, in so far as the subject is not central to the reason they were created. For example, a series of correspondence held in a Manchester archive might not be created to consciously describe or explain aspects of social housing developments in Manchester, but it might provide valuable evidence nonetheless; a letter might be written by someone moving into a new housing development, giving a great insight into how people felt about the large post-war housing estates, and what sort of changes it made to their lives. But the collection wouldn’t be ‘put under housing’ because it doesn’t need to be. It would really be impossible to physically put it together with other materials about the same subject because the correspondence might cover all manner of subjects – in a sense random subjects – if the writer is essentially communicating news and stuff that affects their life.

So, what are the implications for archives cataloguing? How does ‘the geography of knowledge’ impact on archives? We haven’t got something like Dewey, we don’t have the problem of arranging physical things on shelves for people to browse. But do we still have a sense of ‘the right way’ to organise knowledge?

Well, we may not physically arrange archive collections on shelves, but we do approach each collection according to the principles that we deem to be important – provenance and original order. Maybe we’re lucky that we have the principle of original order because it gives us a sensible, rational means to order a collection of sometimes very disparate materials (or you might say the idea is that the collection is already ordered for us). If we dispensed with original order, then we could come up with all sorts of other ways to order things, but it is hard to see them making much sense. Weinberger’s book ‘Everything is Miscellaneous’ holds to the principle that in the digital age information wants to be free from all physical constraints, but I contend that original order provides a physical order that gives researchers an option – a way into the content should they choose to take it. I think ‘everything can be miscellaneous’ is more to the point. There are good reasons for imposing a physical order on an archive, but that shouldn’t mean that researchers are constrained as a result.

I think that what we need to be thinking about is enabling researchers to organise knowledge themselves – in a way that is relevant and useful for their own purposes. This potential for organisation is directly related to how we catalogue. Many people will search by subject, but when I look at the descriptions on the Archives Hub, I find many don’t have subject headings added to them. Subject headings offer significant advantages; they allow for the idea of different ways into a collection of information. They are like different pathways for researchers to take in order to get to the collection and connect it up with other collections.

When I search for ‘cooperative movements’ as a phrase on the Hub I get 40 hits. When I search for it as a subject I get 15 hits. If the system were working perfectly, I would deduce from this that there are 15 instances where ‘cooperative movement’ is a significant subject, and 25 more where it is relevant in some way – maybe it is referred to in passing, but the archive is not substantially concerned with this topic. However, it doesn’t really work like this because it is impossible to achieve that level of consistency in cataloguing. Different people catalogue differently. Some cataloguers put in more subjects, and some fewer; some take more time to think about appropriate subjects, others just add a few very quickly; some don’t put any in at all, maybe believing that a free text search is enough. The end result is that searching becomes even more of a chance thing than it needs to be. The irony for me, managing an aggregator, is that life would probably be a great deal easier if everyone catalogued in a superficial way… as long as it was consistent. As it is, you enter a subject term and you may still miss an archive of major importance. Enter a keyword (searching all the text) and you may not enter the same word(s) the cataloguer has used. There is, without doubt, an inevitable mismatch between what the cataloguer does and what the researcher needs in many cases.
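
The difference between the two kinds of search can be sketched in a few lines of Python. This is a toy model over four invented descriptions, not how the Archives Hub search is actually implemented: the keyword search matches the phrase anywhere in the text or headings, while the subject search only matches controlled headings that a cataloguer chose to add. The `Description` class and both functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    ref: str
    text: str  # title and free-text fields, as written by the cataloguer
    subjects: list = field(default_factory=list)  # controlled subject headings


# Invented descriptions; only the pattern matters here.
RECORDS = [
    Description("A", "Minutes of a branch of the Cooperative Movement", ["Cooperative movement"]),
    Description("B", "Letters that mention the cooperative movement in passing"),
    Description("C", "Papers of a local retail society", ["Cooperative movement"]),
    Description("D", "Records of the co-operative movement in Lancashire"),
]


def keyword_search(records, phrase):
    """Match the phrase anywhere in the text or the headings (case-insensitive)."""
    p = phrase.lower()
    return [r.ref for r in records
            if p in "\n".join([r.text, *r.subjects]).lower()]


def subject_search(records, heading):
    """Match only records that carry exactly this controlled heading."""
    h = heading.lower()
    return [r.ref for r in records if any(s.lower() == h for s in r.subjects)]


print(keyword_search(RECORDS, "cooperative movement"))  # ['A', 'B', 'C']
print(subject_search(RECORDS, "Cooperative movement"))  # ['A', 'C']
```

Record B turns up only through the keyword search because nobody added a heading. Record D, which is substantially about the topic, is missed by both searches because its cataloguer wrote ‘co-operative’ and added no subject.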

It is a similar situation with the title of the material, which has become a vital way into collections now that so many people use general search engines. The title is what they see in a list of Google results. It needs to do its very best to reflect the content of the archive. “Miscellany of eighteenth century poems by various authors” is pretty good: when you have something that is quite varied, it pulls it together by what it is and when it was created. “Verse miscellany” is not so good, as it gives the researcher less to go on. “Poems” is pretty vague. A researcher on the Hub can look for ‘poems’ and then narrow the search down by other means, but on Google these titles are not so useful. We try to keep the dates of creation with the title, as the two together provide a good deal more information. But a title can so often give a sense of the miscellaneous in archives, and it can be quite difficult to get round this with some of the more varied collections, which can sometimes be somewhat esoteric. Other titles just offer a personal or organisation name, which is fine when the researcher is in the reading room – they assume the name means that this is an archive about this person/organisation. Out of context, a name is just a name and could mean absolutely anything.

Of course, we have to take a pragmatic approach, and there has been plenty written about this. Cataloguing will never ever be perfect: researchers will always have to seek in order to find. But we can probably do more to make things better, and we can try to understand more about the ways that people both look for something they want to find and search for what is out there (not knowing what they want to find).

I believe that it is worth putting a small amount of extra thought into the words that are chosen when cataloguing, thinking about how each end-user will want to organise their own geography of knowledge. A bit of thought about the key significant subjects is a good approach. This will help people coming from different perspectives, and with different search strategies, to discover archive collections.

We are still a long way from connecting things up in a way that researchers would like to see. The vision of Linked Data is to do just this. It offers a way to make connections across data sets. It opens up the idea of organising knowledge so that it’s never just one thing but a completely fluid landscape. It’s not Melvil Dewey, looking at the world and giving us his version of how it should be organised; rather it is offering the chance to organise the world in an infinite number of ways. If others out there have resources on ‘The Fabian Society’ or ‘Beatrice Webb’ or ‘the co-operative movement’ they can state that their concepts are the same as mine, and therefore my archive can be linked to these other resources. This opens up data, enabling people to traverse data sets and bring resources together for their own ends. For creating Linked Data, structured concepts, like subject headings, are a great help, because they facilitate making these connections. Of course, there’s a bit more involved in Linked Data (including creating persistent URIs and actually matching up the same concepts), but the potential to link knowledge together in this large-scale way is immense.
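
Here is a rough sketch of what ‘stating that concepts are the same’ might look like, using plain Python tuples as triples. The URIs are invented (example.org) and the functions are hypothetical; the predicate names borrow the SKOS `skos:exactMatch` and Dublin Core `dcterms:subject` terms, and a real implementation would more likely use an RDF store or library. Because `skos:exactMatch` is defined as symmetric and transitive, the sketch follows the links in both directions before gathering everything indexed under any of the equivalent concepts.

```python
from collections import defaultdict, deque

# Invented URIs from two separate, hypothetical datasets.
HUB = "http://hub.example.org/"
OTHER = "http://other.example.org/"

TRIPLES = [
    # Each dataset indexes its own resource under its own concept URI.
    (HUB + "archive/webb-papers", "dcterms:subject", HUB + "concept/fabian-society"),
    (OTHER + "item/pamphlet-42", "dcterms:subject", OTHER + "topic/fabians"),
    # One statement that the two concepts are the same.
    (HUB + "concept/fabian-society", "skos:exactMatch", OTHER + "topic/fabians"),
]


def equivalent_concepts(triples, concept):
    """Return the concept plus everything reachable via skos:exactMatch links."""
    links = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:exactMatch":
            links[s].add(o)  # symmetric: record both directions
            links[o].add(s)
    seen, queue = {concept}, deque([concept])
    while queue:  # breadth-first walk gives the transitive closure
        for nxt in links[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def resources_about(triples, concept):
    """All resources indexed under the concept or any concept equivalent to it."""
    concepts = equivalent_concepts(triples, concept)
    return sorted(s for s, p, o in triples
                  if p == "dcterms:subject" and o in concepts)


print(resources_about(TRIPLES, HUB + "concept/fabian-society"))
```

Starting from either concept returns both the archive and the pamphlet, even though neither dataset refers to the other’s resource directly; the single equivalence statement does all the work.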

Another means to encourage this fluidity is to allow end-users to add tags to content, so that we generate a mass of ways into the data. We really have to seriously consider this option for archival data, because it offers such significant advantages in terms of making things more discoverable. It is moving away from the idea that there is one way of doing things. It allows for things to be organised in an infinite variety of ways. Plenty of projects are now doing this, such as the Zooniverse science projects (https://www.zooniverse.org/), the Your Paintings project and the British Library georeferencing project for maps, but I’m not sure that we are really embracing it on a day-to-day level within archive catalogues.
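
A minimal sketch of how user tags could sit alongside the catalogue without replacing it. The `TagIndex` class and the description references are hypothetical. The sketch only folds case and collapses spacing, so ‘Post-war housing’ and ‘post-war  housing’ count as the same tag, and it ranks descriptions by how many times users applied the tag.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical store for user-contributed tags, kept apart from the catalogue."""

    def __init__(self):
        self._refs = defaultdict(set)    # normalised tag -> description refs
        self._counts = defaultdict(int)  # (tag, ref) -> times the tag was applied

    @staticmethod
    def normalise(tag):
        # Fold case and collapse runs of whitespace; nothing cleverer.
        return " ".join(tag.lower().split())

    def add(self, ref, tag):
        t = self.normalise(tag)
        self._refs[t].add(ref)
        self._counts[(t, ref)] += 1

    def find(self, tag):
        """Descriptions carrying this tag, most often applied first."""
        t = self.normalise(tag)
        return sorted(self._refs.get(t, set()),
                      key=lambda ref: (-self._counts[(t, ref)], ref))


tags = TagIndex()
tags.add("housing-letters", "Post-war housing")
tags.add("housing-letters", "post-war  housing")
tags.add("estate-photos", "Post-war Housing")
print(tags.find("post-war housing"))  # ['housing-letters', 'estate-photos']
```

Keeping the tags in a separate index leaves the archivist’s description untouched, while still giving researchers another route into it.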

An archive can act like a Lego set. As archivists we present the set as it was originally built, and we aim to keep this because it is evidence of its use. But we want, somehow, to label the whole, and to label parts of the whole, in such a way that researchers can take bits of them and use them to build other constructs; the difference now from 50 years ago is that we are more aware that we should not try to second-guess the constructs that people want to make, but we should catalogue to allow for infinite patterns.

With a little help from the Interface

It is tempting to forge ahead with ambitious plans for Web interfaces that grab the attention, that look impressive and do new and whizzy things. But I largely agree with Lloyd Rutledge that we want “less emphasis on grand new interfaces” (Lloyd Rutledge, The Semantic Web – ISWC 2010, Selected Papers). I think it is important to experiment with exciting, innovative interfaces, but the priority needs to be creating interfaces that are effective for users, and that usually means a level of familiarity and supporting the idea that “users of the Web feel it acts the way they always knew it should (even though they actually couldn’t imagine it beforehand).” Maybe the key is to make new things feel familiar, so that we aren’t asking users to learn a whole new literacy all at once; instead, a new literacy will gradually emerge and evolve.

For the Archives Hub, we face similar challenges to many websites that promote and provide access to archives, although our challenges are compounded by being an aggregator and not being in control of the content of the descriptions. We are seeking to gradually modify and improve our interfaces, in the hope that we help to make the users’ discovery experiences more effective, and encourage people to engage with archives.

One of our aims is to introduce options for users that allow them to navigate around in a fairly flexible manner, meeting different levels of experience and need, but without cluttering the screen or making the navigation look complicated and off-putting. Interviews with researchers have indicated how people have a tendency to ‘click and see’, learning as they go, but expecting useful results fairly quickly, so we want to work with this principle, to use hyperlinks effectively, on the understanding that the terminology used and the general layout of the page will have an effect on user expectations.

A Separation of Parts

One of the issues when presenting an archival description is how to separate out the ‘further actions’ or ‘find out more’ from the basic content. The challenge here is compounded by the fact that researchers often believe the description is the actual content, and not just metadata, or alternatively they assume that they can always access a digital resource.

We have tried to simplify the display by introducing a Utility Bar. It is intended to bring together the further options available to the end user. The idea is to make the presentation neater, show the additional options more clearly, and also keep the main description clear and self-contained.

The user can click to find out how to access the materials, to find out where the repository is located in the UK or contact the repository by email. We are planning to make the email contact link more direct, opening an email and populating it with the email address of the repository in order to cut down on the number of stages the user has to go through (currently we link to the Archon directory of Archive services). We can also modify other aspects of the Utility Bar over time, adding functionality as required, so it is a way to make the display more extensible.
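
The planned direct email link could be as simple as a `mailto:` URI (RFC 6068). In this sketch the function `contact_link`, the address and the optional pre-filled subject line are assumptions for illustration; `urllib.parse.quote` percent-encodes the subject so that spaces and punctuation survive inside the link.

```python
from urllib.parse import quote


def contact_link(repository_email, collection_title=None):
    """Build a mailto: link that opens an email already addressed to the repository.

    The optional subject line is an illustrative extra; the plan described
    in the text only needs the address.
    """
    link = "mailto:" + quote(repository_email, safe="@")
    if collection_title:
        # safe="" percent-encodes every reserved character, including "/".
        link += "?subject=" + quote(f"Enquiry about: {collection_title}", safe="")
    return link


print(contact_link("archives@example.ac.uk", "Papers of Beatrice Webb"))
```

This prints `mailto:archives@example.ac.uk?subject=Enquiry%20about%3A%20Papers%20of%20Beatrice%20Webb`, so one click opens a pre-addressed message instead of an intermediate directory page.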

We have included links to social networking sites, although in truth we have no real evidence that these are required or used. This really was a case of ‘suck it and see’ and it will be interesting to investigate whether this functionality really is of value. We certainly have a lively following on Twitter, and indications are that our Twitter presence is valued, so we do believe that social networking sites play an important part in what we do.

We have also included the ability to view different formats. This will not be of value to most researchers, but it is intended as part of our mission to open up the data and give a sense of transparency – anyone can see the encoding behind the description and see that it is freely available. Some of our contributors may find it useful, as may developers interested in the XML behind the scenes.

The Biggest Challenge: how to present an archive description

Until recently we presented users with an initial hit list of results, which enabled them to see the title of a description and choose between a ‘summary’ presentation and a ‘full’ presentation. However, feedback indicates that users don’t know what we mean by this. Firstly, they haven’t yet seen the description, so there is nothing on which to base the choice of link to click, and secondly, what is the definition of ‘summary’ and ‘full’ anyway? Our intention was to give the user the choice of a fairly brief, one page summary description, with the key descriptive data about the archive collection, or the full, complete description, which may run to many pages. A further consideration was that we could only provide highlighting of terms on a single page, so if we only had the full description, highlighting would not be possible.

There are a number of issues here. (a) Descriptions may be exactly the same for summary and full because sometimes they are short, only including key fields, and they do not provide multi-level content; the full description will only provide more information if the cataloguer has filled in additional fields, or created a multi-level display. (b) ‘Summary’ usually means a cut-down version of something, taking key elements, but we do not do this; we simply select what we believe to be the key fields. For example, Scope and Content may actually be very long and detailed, but it would always be part of the ‘summary’ description. (c) Fields that are excluded from the summary view may be particularly important in some cases – for example, the collection may be closed for a period of time, and this would really be key information for a researcher.

With the new Utility Bar we changed ‘summary’ and ‘full’ to ‘brief’ and ‘detailed’, which we felt more accurately reflects what these options represent. For now we have kept the same principle of displaying selected fields in the ‘brief’ description, but after much discussion we have (almost) decided to change this approach. The brief description will become simply the collection-level description in its entirety; the detailed description will be the multi-level description. This gives a certain level of consistency, but there are still potential pitfalls. Two of the key issues are (a) that ‘brief’ may actually be quite long (a collection-level description can still be very long) and (b) that many descriptions are not multi-level, so there would be no difference between the two views. We will therefore look at showing the ‘Detailed Description’ link only when the description is multi-level. If we can do this we may change the terminology; but in the end there is no truly user-friendly way to succinctly distinguish a collection-level description from a multi-level one, simply because many people are not aware of what archival hierarchy really means.
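One way to decide whether to show the ‘Detailed Description’ link is to check the EAD itself. This is a minimal sketch, assuming descriptions are EAD 2002, where lower levels (series, files, items) are encoded as component elements, either unnumbered `<c>` or numbered `<c01>` to `<c12>`, inside `<dsc>`. The function names are hypothetical and this is not the Hub’s actual display code.

```python
import xml.etree.ElementTree as ET

# EAD 2002 component elements: unnumbered <c> or numbered <c01> to <c12>
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def local_name(tag: str) -> str:
    # Strip any "{namespace}" prefix so namespaced and plain EAD both work
    return tag.rsplit("}", 1)[-1]


def is_multilevel(ead_xml: str) -> bool:
    """Return True if the description contains component-level entries
    rather than being a collection-level record only."""
    root = ET.fromstring(ead_xml)
    return any(local_name(el.tag) in COMPONENT_TAGS for el in root.iter())


flat = "<ead><archdesc level='collection'><did><unittitle>Papers</unittitle></did></archdesc></ead>"
multi = "<ead><archdesc level='collection'><did/><dsc><c01 level='series'><did/></c01></dsc></archdesc></ead>"
print(is_multilevel(flat), is_multilevel(multi))
```

A display template could then render the ‘Detailed Description’ link only when `is_multilevel` returns True, so users of single-level descriptions never see two identical views.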

Archives Hub list of results

As well as introducing the Utility Bar, we changed the hit list of results to link the title of the description to the brief view. We simply show the title and the date(s) of the archive, as we feel that these are the key pieces of information that the researcher needs in order to select relevant collections to view.


Centralised Innovation

For some of the more complex changes we want to make, we first need to centralise the Archives Hub, so that all the descriptions are held by us. For some time this seemed like a retrograde step: moving from a federated system to a centralised one. But a federated system adds a whole layer of complexity: not only do you lack control over the data you are presenting, you cannot even get at some of it in order to view it, examine any issues with it, or potentially improve its consistency (of the markup in particular). In addition, there is a dependency between the central system and the local systems that form the federated model. Centralising the data will actually allow us to make it more openly available, and to continue to innovate more easily.

Multiple Gateways: Multiple Interfaces

We will continue to work to improve the Archives Hub interface and navigation, but we are well aware that increasingly people use alternative interfaces or search techniques. As Lorcan Dempsey states: “options have multiplied and the breadth of interest of the local gateway is diminished: it provides access only to a part of what I am potentially interested in.” We need to be thinking more broadly: “The challenge is not now only to improve local systems, it is to make library resources discoverable in other venues and systems, in the places where their users are having their discovery experiences.” (Lorcan Dempsey’s Weblog). This is partly why we believe that we need to concentrate on presenting the descriptions themselves more effectively: users increasingly come directly to descriptions from search engines like Google, rather than coming to the Archives Hub homepage and entering a search from there. We need to think of any page within our site as a landing page, and consider how best to help users from there to discover more about what we have to offer them.


Season’s greetings and Christmas closure

"Sunshine Annual 1938. The brightest of the year."
The Sunshine Annual was a children’s annual produced by the Co-op movement.
Image copyright © National Co-operative Archive.

The Archives Hub team wish everyone a very Merry Christmas, and a Happy New Year!

The Archives Hub office will close on 21st December and will reopen on 2nd January.

The Archives Hub service will be available over Christmas and New Year, but there will be no helpdesk support. Any queries sent over this period will be dealt with when we return.

The Hub out and about – presenting, training, and pubbing

The Hub team like to get out and about to present, teach, and chat about archives and information. It can get a bit lonely being a purely online service, with our users and contributors at the other end of an email or phone call, so we try to ensure that we take advantage of chances to meet them face-to-face.

The last week of November was a busy week for this! On the Wednesday Jane and I (Bethan) gave a presentation to the MA Library & Information students at MMU.

We’ve given similar presentations to Archive students and early-career professionals in the past, but this is the first time we’ve given one to Library students. I’m pleased to say it worked well – the students were engaged and knowledgeable about archives, and about how issues in libraries and archives cross over.

It’s always very encouraging and stimulating to meet an enthusiastic group (I’d also met them the week before to talk about professional organisations), and both Jane and I really enjoyed giving the session. We had some nice feedback from the students, too, with one person saying:

The workshop was informative as well as entertaining. Complex issues were broken down so they were easier to understand. In a short amount of time a lot of areas were covered and due to the lively presentation style we all remained engaged and interested throughout.

And another said that they wished they had more next week!

I think it’s very important for us to be involved in talking to students, trainees, and early-career professionals. It’s good for them to hear from people who are actually working with the data that they’ll be creating. If nothing else, if we educate them about the need for good, interoperable data now, we’ll get better data from them later on! It’s also great to be able to tell them about the different sorts of jobs and opportunities there are for them, and hopefully give them some ideas about ‘alternative’ careers.

The next day saw me, Jane and Lisa heading down to London for the inaugural ‘Hub in the Pub’ on the Thursday evening, before a training session on the Friday. We joined forces with a large contingent of museum folk who were ‘Drinking about Museums’, and had a very enjoyable and useful couple of hours chatting about general information, data, and cultural heritage issues. We hope to have more ‘Hub in the Pub’ events in future, so watch our mailing list and Twitter feed for details.

We made sure that the evening didn’t get too merry, so that we were on top form for our contributors’ training day the next day. These training days are designed to help current and potential contributors use our EAD Editor, and are also a great chance to get to know our contributors and chat to them about any issues they might have. We have a few places left on our next training day in Glasgow in January – do sign up if you’d like to come along, or contact us if you’d like to know more.

If you can’t get along to a training session, we have online audio tutorials and a workbook designed to give you a step-by-step guide to using the Editor – and we’re always happy to answer any questions.

An evaluation of the use of archives and the Archives Hub

This blog post is based on a report written by colleagues at Mimas* presenting the results of the evaluation of our innovative Linked Data interface, ‘Linking Lives’. The evaluation consisted of a survey and a focus group, with 10 participants including PhD and MA students studying history, politics and social sciences. We asked participants a number of questions about the Archives Hub service, in order to provide context for their thoughts on the Linking Lives interface.

This blog post concentrates on their responses relating to the use of archives, methods of searching and interpretation of results. You can read more about their responses to the Linking Lives interface on our Linking Lives blog.

Use of Archives and Primary Source Materials

We felt that it was important to establish how much archives matter to the participants in our survey and focus group. We found that “without exception, all of the respondents expressed a need for primary resources” (Evaluation report). One respondent said:

“I would not consider myself to be doing proper history if I wasn’t either reinterpreting primary sources others had written about, or looking at primary sources nobody has written about. It is generally expected for history to be based on primary sources, I think.” (Survey response)

One of the most important factors to the respondents was originality in research. Other responses included acknowledgement of how archives give structure to research, bringing out different angles and perspectives and also highlighting areas that have been neglected. Archives give substance to research and they enable researchers to distinguish their own work:

“Primary sources are very valuable for my research because they allow me to put together my own interpretation, rather than relying on published findings elsewhere.” (Survey response)

Understanding of Archives

It is often the case that people have different perceptions of what archives are, and the Linking Lives evaluation work confirmed this. Commonly there is a difference between social scientists and historians, the former concentrating on datasets (e.g. data from the Office for National Statistics) and the latter on materials created during a person’s life or through the activities of an organisation and deemed worthy of permanent preservation. The evaluation report states:

“The participants that had a similar understanding of what an archive was to the Archive Hub’s definition had a more positive experience than those who didn’t share that definition.”

This is a valuable observation for the work of the Hub in a general sense, as well as the Linking Lives interface, because it demonstrates how initial perceptions and expectations can influence attitudes towards the service. In addition, the evaluation work highlighted another common fallacy: that an archive is essentially a library. Some of the participants in the survey expected the Archives Hub to provide them with information about published sources, such as research papers.

These findings highlight one of the issues when trying to evaluate the likely value of an innovative service: researchers do not think in the same language or with the same perspectives as information professionals. I wonder if we have a tendency to present services and interfaces modelled from our own standpoint rather than from the standpoint of the researcher.

Search Techniques and Habits

“Searches were often not particularly expansive, and participants searched for specific details which were unique to their line of enquiry” (Evaluation report). Examples include titles of women’s magazines, personal names or places. If the search returned nothing, participants might then broaden it out.

Participants said they would repeatedly return to archives or websites they were familiar with, often linked to quite niche research topics. This highlights how a positive experience with a service when it is first used may have a powerful effect over the longer term.

The survey found that online research was a priority:

“Due to conflicting pressures on time and economic resources, online searching was prevalent amongst the sample. Often research starts online and the majority is done online. Visits to see archives in person, although still seen as necessary, are carefully evaluated.” (Evaluation report)

The main resources participants used were Google and Google Scholar (the most ubiquitous search engines used) as well as The National Archives, Google Books and ESDS. Specialist archives were referred to relating to specific search areas (e.g. The People’s History Museum, the Wellcome Library, the Mass Observation Archive).

Thoughts and Comments About the Archives Hub

All participants found the Hub easy to navigate and most found locating resources intuitive. As part of the survey we asked the participants to find certain resources, and almost all of them provided the right answers with seemingly no difficulty.

“It is clear. The descent of folders and references at the top are good for referencing/orientating oneself. The descriptions are good – they obviously can’t contain everything that could be useful to everyone and still be a summary. It is similar to other archive searches so it is clear.” (Survey response, PhD history student)

The social scientists that took part in the evaluation were less positive about the Archives Hub than the historians. Clearly many social science students are looking for datasets, and these are generally not represented on the Hub. There was a feeling that contemporary sources are not well represented, and these are often more important to researchers in fields like politics and sociology. But overall comments were very positive:

“…if anyone ever asked about how to search archives online I’d definitely point them to the Archives Hub”.

“Useful. It will save me making specific searches at universities.”

Archives Hub Content

It was interesting to see the sorts of searches participants made. A search for ‘spatial ideas’ by one participant did not yield useful results. This would not surprise many archivists – collections are generally not catalogued to draw out such concepts (neither Unesco nor UKAT have a subject heading for this; LCSH has ‘spatial analysis’). However, there may well be collections that cover a subject like this, if the researcher is prepared to dig deep enough and think about different approaches to searching. Another participant commented that “you can’t just look for the big themes”. This is the type of search that might benefit from us drawing together archive collections around themes, but this is always a very flawed approach. This is one reason that we have Features, which showcase archives around subjects but do not try to provide a ‘comprehensive’ view onto a subject.

This kind of feedback from researchers helps us to think about how to present the Archives Hub more effectively. Expectations are such an important part of researchers’ experiences. It is not possible to completely guard against expectations that do not match reality, but we could, for example, have a page on ‘The Archives Hub for Social Scientists’ that would at least give those who looked at it a better sense of what the Hub may or may not provide for them (whether anyone would read it is another matter!).

This survey, along with previous surveys we have carried out, emphasises the importance of a comprehensive service and a clear scope (“it wasn’t clear to me what subjects or organisations are covered”). However, given the nature of archives, it is very difficult to give this kind of information with any accuracy, as the collections represented are diverse and sometimes unexpected. In the end you cannot draw a clear line around the scope of the Archives Hub, just as you cannot draw a clear line around the subjects represented in any one archive. The Hub also changes continuously, with new descriptions added every week. Cataloguing is not a perfect art; it can draw out key people, places, subjects and events, but it cannot hope to reflect everything about a collection, and the knowledge a researcher brings with them may help to draw out information from a collection that was not explicitly provided in the description. If a researcher is prepared to spend a bit of time searching, there is always the chance that they may stumble across sources that are new to them and potentially important:

“…another student who was mainly focused on the use of the Kremlin Archives did point out that [the Archives Hub] brought up the Walls and Glasier papers, which were new to [them]”.

Even if you provide a list of subjects, what does that really mean? Archives will not cover a subject comprehensively; they were not written with that in mind but created for other purposes. In many ways that is their strength, and it is what makes them a rich and exciting resource, but it does not make it easy to describe them accurately for researchers. Just one series of correspondence may refer to thousands of subjects, some in passing, some more substantially, but archivists generally don’t have time to go through an entire series and draw out every concept.

If the Archives Hub included a description for every archive held at an HE institution across the UK, or for every specialist repository, what would that signify? It would be comprehensive in one sense, but in a sense that may not mean much to researchers. It would be interesting to ask researchers what they see as ‘comprehensive resources’ as it is hard to see how these could really exist, particularly when talking about unpublished sources.

Relevance of Search Results

The difficulties some participants had with the relevance of results come back to the problem of how to catalogue resources that often cover a myriad of subjects, maybe superficially, maybe in detail, maybe from a very biased perspective. If a researcher looks for ‘social housing manchester’ then the results they get will be accurate in a sense – the machine will do its job and find collections with these terms, and there will be weighting of different fields (e.g. the title will be highly weighted), but they still may not get the results they expect, because the collections may not explicitly be about social housing in Manchester. The researcher needs to do a bit more work to think about what might be in the collection and whether it might be relevant. However, cataloguers are at fault to some extent. We do get descriptions sent to the Hub where the subjects listed seem inadequate or do not seem to reflect the scope and content that has been provided. Sometimes a subject is listed but there is no sense of why it is included anywhere in the rest of the description. Sometimes a person is included in the index terms but is not described in the content. This does not help researchers to make sense of what they see.
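To show why a weighted search can be “accurate” and still unhelpful, here is a minimal sketch of field-weighted term matching. The field names and weights are hypothetical, chosen only to mirror the point that a title match counts for more; they are not the Archives Hub’s actual relevance settings.

```python
import re
from typing import Dict

# Hypothetical field weights: a match in the title counts for more than one in the body text
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "scopecontent": 1.0}


def score(record: Dict[str, str], query: str) -> float:
    """Weighted term-frequency score: each occurrence of a query term in a
    field adds that field's weight to the total."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"[a-z]+", record.get(field, "").lower())
        total += weight * sum(words.count(term) for term in terms)
    return total


# A hypothetical description that contains every query term but is not about social housing
club = {
    "title": "Manchester Social Club minutes",
    "scopecontent": "Includes a letter about housing",
}
print(score(club, "social housing manchester"))  # 7.0
```

The club’s minutes score well because ‘manchester’ and ‘social’ sit in the heavily weighted title, even though the collection says almost nothing about social housing in Manchester.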

I do think that there are lessons here for archivists, or those who catalogue archives. I don’t think that enough thought is given to the needs of the researcher: consider the inconsistent use of subject terms, for example, and the need for a description of the archive to draw out key concepts a little more clearly. Some archivists don’t see the need to add index terms, assuming that because technologies like Google can search by keyword, that is enough. But it isn’t enough. Researchers need more than this. They need to know what the collection is substantially about, and they need to search across other collections about similar subjects. Controlled vocabulary enables this kind of exploratory searching. There is a big difference between searching for ‘nuclear disarmament’ as a keyword, which means it might appear anywhere within the description, and searching for it as a subject – a significant topic within an archive.
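The keyword/subject distinction can be sketched with two hypothetical descriptions: one mentions nuclear disarmament only in passing, the other has been indexed under it as a subject. The record structure and function names are illustrative assumptions, not the Hub’s data model.

```python
from typing import Dict, List

# Two hypothetical descriptions: A mentions the topic in passing, B is indexed under it
records: List[Dict] = [
    {"id": "A",
     "text": "Diaries of a local councillor, with a passing reference to a nuclear disarmament march.",
     "subjects": ["Local government"]},
    {"id": "B",
     "text": "Records of a regional peace campaign group.",
     "subjects": ["Nuclear disarmament", "Peace movements"]},
]


def keyword_search(recs: List[Dict], phrase: str) -> List[str]:
    """Match the phrase anywhere in the description, index terms included."""
    phrase = phrase.lower()
    return [r["id"] for r in recs
            if phrase in (r["text"] + " " + " ".join(r["subjects"])).lower()]


def subject_search(recs: List[Dict], phrase: str) -> List[str]:
    """Match only descriptions indexed under the phrase as a subject term."""
    phrase = phrase.lower()
    return [r["id"] for r in recs
            if phrase in {s.lower() for s in r["subjects"]}]


print(keyword_search(records, "nuclear disarmament"))  # ['A', 'B']
print(subject_search(records, "nuclear disarmament"))  # ['B']
```

Only the subject search isolates the collection that is substantially about the topic, and it works only because a cataloguer assigned that index term.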


*Linking Lives Evaluation: Final Report (October 2012) by Lisa Charnock, Frank Manista, Janine Rigby and Joy Palmer

Finding and accessing archives for voluntary action history

Guest Blog by Georgina Brewis

It would not be an exaggeration to say that the history of voluntary, civic and cultural organisations has never been more popular as an academic subject in Britain. Leading historians like Brian Harrison have called attention to the importance of voluntarism as a theme in post-war British history, while there has been a wave of PhD theses dealing with topics such as the voluntary hospitals, the role of disability charities in politics, the professionalization of the voluntary sector and the formation of humanitarian networks across empire. In 2011 no fewer than three edited collections presenting the latest research on voluntary action history were published, and several further volumes appeared in 2012 or are in press. Such new research has been strengthened and sustained by the Voluntary Action History Society and particularly its active New Researchers group. Importantly, not all these studies are by historians, pointing to the importance of archival resources for students of political science, sociology, health studies and other disciplines. There is growing recognition that we cannot write British social history or social policy without looking at the considerable contributions of charities, voluntary groups, philanthropists, campaigners and volunteers.

So how do academic researchers track down the archives of the voluntary and community organisations they want to use? Any would-be researcher of charity needs to understand that those bodies with catalogued and accessible institutional archives – whether kept in-house or deposited elsewhere – represent only a very small minority of voluntary organisations. Unsurprisingly these tend to be the larger, better funded and longer-established groups such as the British Red Cross or the Children’s Society.  The voluntary sector in Britain is often likened to a pyramid: a very small number of organisations at the top with paid staff, regular income and office space resting on a much larger base of groups run entirely by volunteers, subsisting on small grants and donations. Voluntary sector archives may reflect this pattern, but there is no guarantee that even the largest charity will have made provision for preservation and conservation of its records (aside from the limited financial data required by the Charity Commission) let alone for cataloguing or access.

Researchers and students are advised to start with the National Register of Archives. Another useful database is DANGO, which identifies the locations of the papers of several thousand non-governmental organisations and was put together by a team at Birmingham University, although the end of project funding means its entries and website are no longer being updated. Searching the Archives Hub will find records of voluntary groups where these are deposited at an institution contained on its database; Hull History Centre, SOAS, Birmingham University Library and the Women’s Library have all built up specialisms in this area. Perhaps there would be a way of encouraging charities with in-house collections to make their catalogues available via the Archives Hub?

Archives Hub has helped me search for materials relating to small or short-lived student-run charities that may be contained within a students’ union archive or an individual’s private papers.  Although an organisation’s institutional archive may be lost or never have existed, its history can be reconstructed through accessing annual reports, correspondence and other papers held in many different repositories – as I have managed to do for the group International Student Service. It would be helpful for future researchers if it was possible to log this information somewhere.

It remains the case that many researchers will have to seek access to records by contacting an organisation or group founder directly, with variable results. This is likely to be increasingly the case given the growth in the number of pressure groups, charities and other voluntary bodies since the 1960s. In my experience practice ranges from organisations which ignore or refuse requests for access, with varying degrees of politeness, to those that welcome you with open arms and let you sit unsupervised with the charity’s papers, free to copy, remove, deface or pour coffee all over the institutional record. Once you’ve had success accessing the records of one organisation, it may be easier to open communications with others in a related sector. Learning how to negotiate what we might call ‘informal archives’ will be a key challenge for future researchers of voluntary action. There is a need for better advice for academics, particularly students and new researchers, on the multiple ethical considerations and practical concerns that come with using informal archives. How do you track down such records? How do you reference sources? What do you do if you’re concerned about the physical state of records or what might happen to them when the group’s founder dies? How do you reconcile your obligations as a historian with the fact that a particular organisation has trusted you to look at their materials?

It is also worth remembering that records relating to charitable activities can turn up in unexpected places, for example in the archives of private companies. The records of a charitable Trust or Foundation may well contain better sources about a particular charity than the organisation itself has preserved, although again there may be problems of access. There are good signs that this is changing not least through the positive examples of two funders involved with the new Campaign for Voluntary Sector Archives: the Barrow Cadbury Trust and the Diana, Princess of Wales Memorial Fund.

This new Campaign for Voluntary Sector Archives, which was launched at the House of Lords in October 2012, seeks to raise awareness of the importance of voluntary sector archives as strategic assets for governance, corporate identity, accountability and research. It maintains that caring for archives and records is actually an important aspect of the sector’s wider public benefit responsibility. Most significantly, the Campaign brings together academic researchers, custodians, creators of records and others in the voluntary sector to share expertise and resources. Together, we should be able to begin to address some of the issues and questions I’ve outlined above. Yet there is a long way to go before all voluntary organisations are convinced not only of the value of records to the current mission, but also of the value of making these accessible to researchers from a variety of disciplines. For more information contact info@voluntarysectorarchives.org.uk

The New Scholarly Record

I was lucky enough to attend the 2012 EmTACL conference in Trondheim, and this post is based on the excellent keynote presentation by Herbert van de Sompel, which really made me think about temporal issues with the Web and how these can limit our understanding of the scholarly record.

Herbert believes that the current infrastructure for scholarly communication is not up to the job. We now have many non-traditional assets, which do not always have fixity and often have a wide range of dependencies: assets such as datasets, blogs, software, videos and slides, which may form part of a scholarly resource. Everything is much more dynamic than it used to be. ‘Research objects’ often include assets that are interdependent, so they need to be available together for the object to be complete. But this is complicated by the fact that many of them are ‘in motion’ and updated over time.

This idea of dynamic resources that are in flux, constantly being updated, is very relevant for archivists, partly because we need to understand how archives are not static and fixed in time, and partly because we need to be aware of the challenges of archiving ever more complex and interconnected resources. It is useful to understand the research environment and the way technology influences outputs and influences what is possible for future research.

There are examples of innovative services that are responding to the opportunities of dynamic resources. One that Herbert mentioned was PLOS, which publishes open scholarly articles. It puts publications into Wikipedia as well as keeping the ‘static’ copy, so that the articles have a kind of second life where they continue to evolve as well as being kept as they were at the time of submission. For example, ‘Circular Permutation in Proteins’.

The idea of executable papers is starting to become established: papers that are not just to be read but to be interacted with. These provide access to the primary data, with the capability to re-execute algorithms and even to allow researchers to upload and use their own data. This creates a complex set of interdependencies and a challenge for archiving: if something is not fixed in time, what does retaining access to it over time actually mean?

This all raises the issue of what the scholarly record actually is. Where does it start? Where does it end? We are no longer talking about a bunch of static files but a dynamic interconnected resource. In fact, there is an increasing sense that the article itself is not necessarily the key output, but rather it is the advertising for the actual scholarship.

Herbert concluded from this that it becomes very important to be able to view different points in time in the evolution of the scholarly record, and that this should be done in a way that works with the Web. The Web is the platform, the infrastructure for the scholarly record. Scholarly communication then becomes native to the Web. At the heart of this is the need to use HTTP URIs.

However, where are we at the moment? The current archival infrastructure for scholarly outputs deals with things with fixity and boundaries. It cannot deal with things in flux and with inter-dependencies. The Web exists in ‘now’ time; it does not have a built-in notion of time. It assumes that you want the current version of something – you cannot use a URI to get to a prior version.

Slide from Herbert van de Sompel’s presentation showing the publication context on the Web

We don’t really object to this limitation, something evidenced by the fact that we generally accept links that take us to 404 pages, as if it is just an inevitable inconvenience. Maybe many people just don’t think that there is any real interest in or requirement for ‘obsolete’ resources, and what is current is what is important on the Web.

Of course, there is the Internet Archive and other similar initiatives in Web archiving, but they are not integrated into the Web. You have to go somewhere completely different in order to search for older copies of resources.

If the research paper remains the same, but resources that are an integral part of it change over time, then we need to change archiving to reflect this. We need to think about how to reference assets over time and how to recreate older versions. Otherwise, we access the current version, but we are not getting the context that was there at the time of creation; we are getting something different.

Can we recreate a version of a scholarly record? Can we go back to a certain point in time so we can see the assets linked from a paper as they were at the time of publication? At the moment we are likely to get many 404s when we try to access links associated with a publication. Herbert showed a survey of URL decay in Medline, where the rate is about 10% per year, particularly for links to things like related databases.
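To put a figure of roughly 10% a year in perspective, here is a quick back-of-the-envelope sketch. It assumes, purely for illustration, that the decay compounds at a constant 10% of the surviving links each year; the survey itself may have measured decay differently.

```python
def surviving_fraction(annual_decay: float, years: int) -> float:
    """Fraction of links still working after `years`, assuming a constant
    compound decay rate (an illustrative assumption, not the survey's model)."""
    return (1.0 - annual_decay) ** years

for years in (1, 5, 10):
    share = surviving_fraction(0.10, years)
    print(f"after {years:2d} years: {share:.0%} of links still resolve")
# after  1 years: 90% of links still resolve
# after  5 years: 59% of links still resolve
# after 10 years: 35% of links still resolve
```

Under that assumption, only about a third of the links in a ten-year-old paper would still work.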

One solution to this is to be able to follow a URI in time: to click on a URI and say ‘I want to see this as it was two years ago’. Herbert went on to talk about Memento, something he has created. Memento aims to better integrate the current and past Web. It allows you to select a date and time in the browser and effectively take the URI back in time. Currently, the team are looking at enabling people to browse past versions of Wikipedia pages. Memento has a fairly good success rate in retrieving old versions, although it will not work for all resources. I tried it with the Archives Hub and found it easy to take the website back to how it looked in the very early days.
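Under the hood, Memento (specified in RFC 7089) relies on ordinary HTTP content negotiation, but in the dimension of time: the client sends an Accept-Datetime header to a TimeGate, which redirects to the archived copy (a ‘memento’) closest to the requested time. The sketch below only builds such a request with Python's standard library and does not send it. The TimeGate address is a hypothetical placeholder; a real client would send the request, follow the redirect and read the Memento-Datetime header on the archived response.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate prefix; substitute the TimeGate of a Memento-compliant archive.
TIMEGATE = "http://timegate.example.org/timegate/"

def memento_request(original_uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request.

    The Accept-Datetime value is an HTTP-date in GMT,
    e.g. 'Tue, 17 Jun 1997 00:00:00 GMT'.
    """
    if when.tzinfo is None:
        # Treat naive datetimes as UTC so the header is unambiguous.
        when = when.replace(tzinfo=timezone.utc)
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + original_uri,
                   headers={"Accept-Datetime": accept_datetime})

req = memento_request("http://www.ntnu.no", datetime(1997, 6, 17))
print(req.full_url)
print(req.header_items())
```

The same URI is used throughout; only the requested datetime changes, which is the point of the approach.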

Using Memento to take the Archives Hub back in time.

One issue is that archived copies are not always created near the time of publication. Where they do exist, they are created simply as part of the normal activity of the Web, by services like the Internet Archive or the British Library, so no extra work is involved.

Herbert outlined some of the issues with using DOIs (Digital Object Identifiers), which use a resolver so that a resource's identifier can remain the same over time. This is useful if, for example, a publisher is bought out: the identifier stays the same because the resolver redirects to the new location. However, a DOI resolver exists in the perpetual now. It always points to where a resource is today, so it is not possible to use the DOI's HTTP URI to travel back in time. This is perhaps one illustration of the way some of the processes we have implemented on the Web do not really fulfil our current needs, as things change and resources become more complex and dynamic.
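The ‘perpetual now’ problem can be seen in a toy model of a resolver. This is not how the DOI system is actually implemented; the sketch simply assumes a table that keeps only the latest target for each identifier, so when a publisher moves, the old location is overwritten and there is nothing to travel back to. The class name, DOI and URLs are all hypothetical.

```python
class LatestOnlyResolver:
    """Toy identifier resolver that only knows where an identifier points *now*."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        # Re-registering overwrites the previous target: the history is lost.
        self._targets[identifier] = url

    def resolve(self, identifier: str) -> str:
        # Comparable to answering with a redirect to the current location.
        return self._targets[identifier]

resolver = LatestOnlyResolver()
resolver.register("10.9999/example.123", "http://old-publisher.example/article/123")
# The journal is bought out and moves to a new platform.
resolver.register("10.9999/example.123", "http://new-publisher.example/a/123")
print(resolver.resolve("10.9999/example.123"))  # http://new-publisher.example/a/123
```

A time-aware resolver would have to keep every (date, target) pair instead of a single target, which is the kind of time dimension Memento adds for HTTP URIs.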

With Memento, the same HTTP URI can function as the reference to temporally evolving resources. The importance of this type of functionality is becoming more widely recognised. There is a new experimental URI scheme, DURI, or Dated URI. The idea is that a URI, such as http://www.ntnu.no, can be dated: 1997-06-17:http://www.ntnu.no (this is an example and is not actionable now). Herbert also raised the possibility of developing Websites that can deal with the TEL (telephone) protocol. The idea would be that the browser asks you whether the Website can use the TEL protocol, and if it can, you get this option offered to you. You could then use it to reference a resource and use Memento to go back in time.
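To make the dated-URI idea concrete, the sketch below splits a string shaped like the example above (a date, a colon, then the original URI) into its two parts, and turns the date into the Accept-Datetime value a Memento client would send. It follows only the shape of that example; the actual DURI draft may define the syntax differently (for instance with a scheme prefix), so the format here is an assumption.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed shape, taken from the example above: YYYY-MM-DD:<original URI>
DATED_URI = re.compile(r"^(\d{4}-\d{2}-\d{2}):(.+)$")

def split_dated_uri(dated: str) -> tuple[str, str]:
    """Return (original_uri, accept_datetime_header) for a dated URI."""
    match = DATED_URI.match(dated)
    if match is None:
        raise ValueError(f"not a dated URI: {dated!r}")
    day = datetime.strptime(match.group(1), "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return match.group(2), format_datetime(day, usegmt=True)

print(split_dated_uri("1997-06-17:http://www.ntnu.no"))
# ('http://www.ntnu.no', 'Tue, 17 Jun 1997 00:00:00 GMT')
```

Mapping the date onto an Accept-Datetime header is one plausible way a dated URI and Memento could work together; it is shown here as an illustration, not as part of either specification.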

Herbert concluded that ‘archiving’ should not be just a one-off event, but needs to happen continually. In fact, it could happen whenever there is an interaction. Also, when new materials are taken into a repository, you could scan them for links and put the linked resources into an archive, so the links don’t die. If you archive the links at the time of publication, or when materials are submitted to a repository, then you protect against losing the context of the resource.
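The deposit-time step Herbert suggested (scan new material for links and archive them straight away) might look something like the sketch below. It uses Python's standard HTML parser to collect outbound HTTP(S) links and stamps each with the deposit time. The `archive_queue` list is a hypothetical stand-in for whatever would actually submit the URLs to a web archive; no particular archive API is assumed.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href> and <link href> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            href = dict(attrs).get("href")
            # Skip relative and in-page links; keep each URL once.
            if href and href.startswith(("http://", "https://")) and href not in self.links:
                self.links.append(href)

def links_to_archive(deposited_html: str) -> list[tuple[str, str]]:
    """Return (url, deposit_timestamp) pairs to hand to a web archive."""
    collector = LinkCollector()
    collector.feed(deposited_html)
    collector.close()
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [(url, stamp) for url in collector.links]

# Hypothetical queue standing in for a real archiving service.
archive_queue = links_to_archive(
    '<p>Data at <a href="http://data.example.org/set/42">the repository</a> '
    'and <a href="#methods">methods</a>.</p>'
)
print(archive_queue)
```

Recording the deposit timestamp alongside each URL is what later lets a reader ask for the linked resource as it was when the paper was deposited.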

Herbert introduced us to SiteStory, which offers transactional archiving of a web server. Usually a web archive sends out a robot that crawls sites, gathers pages and dumps the data. With SiteStory the web server takes an active part: every time a user requests a page, it is also pushed into the archive, so you get a fine-grained history of the resource. Something like this could be done by publishers and service providers, with the idea that they hold onto the hits, the impact and the audience. It certainly does seem to be a growing area of interest.
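The sketch below is not SiteStory's implementation; it only illustrates the transactional idea in Python. A hypothetical WSGI middleware serves each request as normal and, whenever it returns a successful page, also keeps a timestamped copy, so the archive grows with actual use rather than with crawling. The in-memory `archive` dictionary stands in for a real archive store, and storing a new version only when the content has changed is a choice made for this sketch.

```python
from datetime import datetime, timezone

class TransactionalArchive:
    """WSGI middleware that keeps a timestamped copy of pages as they are served."""

    def __init__(self, app):
        self.app = app
        # path -> list of (timestamp, body) versions, oldest first
        self.archive: dict[str, list[tuple[str, bytes]]] = {}

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            captured["status"] = status
            return start_response(status, headers, exc_info)

        result = self.app(environ, capturing_start_response)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        if captured.get("status", "").startswith("200"):
            versions = self.archive.setdefault(environ.get("PATH_INFO", "/"), [])
            # Store a new version only when the served content has changed.
            if not versions or versions[-1][1] != body:
                versions.append((datetime.now(timezone.utc).isoformat(), body))
        return [body]

page = {"text": b"Welcome to version 1"}

def site(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [page["text"]]

def fake_start_response(status, headers, exc_info=None):
    return lambda data: None  # stands in for the write() callable a server returns

archived_site = TransactionalArchive(site)
archived_site({"PATH_INFO": "/"}, fake_start_response)
page["text"] = b"Welcome to version 2"
archived_site({"PATH_INFO": "/"}, fake_start_response)
print(len(archived_site.archive["/"]))  # 2
```

Because the copy is taken at the moment of each real request, the history reflects what users actually saw, at whatever granularity the traffic provides.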

Herbert’s slides are available on Slideshare.