Online Survey Results (2011)

We would like to share some of the results of our annual online survey, which runs over a 3-4 week period. We aim for about 100 responses (though obviously more would be very welcome!), and this year we received 92. We use a pop-up invitation to fill out the survey – something we do not like to do, but we do feel that it attracts more responses than a simple link.

Context

We have a number of questions that are replicated in surveys run for Zetoc and Copac, two bibliographic JISC-funded Mimas services, and this provides a means to help us (and our funders) look at all three services together and compare patterns of use and types of user.

This year we added four questions specifically designed to help us with understanding users of the Hub and to help us plan our priorities.

We aim to keep the number of questions down to about 12 at the most, and ensure that the survey will take no longer than 10 minutes to complete. But we also want to provide the opportunity for people to spend longer and give more feedback if they wish, so we combine tick lists and radio boxes with free text comments boxes.

We take the opportunity to ask whether participants would be willing to provide more feedback for us, and if they are potentially willing, they provide their email address. This gives us the opportunity to ask them to provide more feedback, maybe by being part of a focus group.

Results of the Survey

Profile

  • The vast majority of respondents (80%) are based in the UK for their study and/or work.
  • Most respondents are in the higher education sector (60%). A substantial number are in the Government sector and also the heritage/museum sector.
  • 20% of those using the Hub are students – maybe less than we would hope, but a significant number.
  • 10% are academics – again, less than we would hope, but it may be that academics are less willing to fill in a survey.
  • 50% are archivists or other information professionals. This is a high number, but it is important to note that it includes use of the Hub on behalf of researchers, to answer their enquiries, so it could be said to represent indirect use by researchers.
  • The majority of respondents use the service once or twice a month, although usage patterns were spread over all options, from daily to less than once a month, and it is difficult to draw conclusions from this, as just one visit to the Hub website may prove invaluable for research.

graph showing value of the Hub

Use and Recommendation

  • A significant percentage – 26% – find the Hub ‘neither easy nor difficult’ to use, and 3% of the respondents found it difficult to use, indicating that we still need to work on improving usability (although note that a number of comments were positive about ease of use).
  • 73% agree their work would take longer without the Hub, which is a very positive result and shows how important it is to be able to cross-search archives in this way.
  • A huge majority – 93% – would recommend the Hub to others, which is very important for us. We aim to achieve 90% positive in this response, as we believe that recommendations are a very important means for the Hub to become more widely known.

Subject Areas

We spent a significant amount of time creating a list of subjects that would give us a good indication of disciplines in which people might use the Hub. The results were:

    • History 47
    • Library & Archive Studies 33
    • English Literature 17
    • Creative & Performing Arts 16
    • Education & Research Methods 10
    • Predominantly Interdisciplinary 9
    • Geography & Environment 5
    • Political Studies & International Affairs 5
    • Modern Languages and Linguistics 4
    • Physical Sciences 4
    • Special Collections 4
    • Architecture & Planning 3
    • Biological & Natural Sciences 3
    • Communication & Media Studies 3
    • Medicine 3
    • Theology & Philosophy 3
    • Archaeology 2
    • Engineering 2
    • Psychology & Sociology 2
    • Agriculture 1
    • Law 1
    • Mathematics 1
    • Business & Management Studies 0
  • History is, not surprisingly, the most common discipline, but literature, the arts, education and also interdisciplinary work all feature highly.
  • There is a reasonable amount of use from the subjects that might be deemed to have less call for archives, showing that we should continue to promote the Hub in these areas and that archives are used in disciplines where they do not have a high profile. It would be very valuable to explore this further.
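For anyone who wants to compare the subject figures against the 92 responses, here is a minimal sketch. It assumes the numbers above are counts of ticks and that respondents could select more than one subject (the counts add up to more than 92, which suggests this, but it is an assumption). The function name is just for illustration.

```python
# Subject counts copied from the list above.
# Assumption: each figure is a number of ticks, and multiple ticks were allowed.
TOTAL_RESPONSES = 92  # number of survey responses stated in the introduction

subject_counts = {
    "History": 47,
    "Library & Archive Studies": 33,
    "English Literature": 17,
    "Creative & Performing Arts": 16,
    "Education & Research Methods": 10,
    "Predominantly Interdisciplinary": 9,
    "Geography & Environment": 5,
    "Political Studies & International Affairs": 5,
    "Modern Languages and Linguistics": 4,
    "Physical Sciences": 4,
    "Special Collections": 4,
    "Architecture & Planning": 3,
    "Biological & Natural Sciences": 3,
    "Communication & Media Studies": 3,
    "Medicine": 3,
    "Theology & Philosophy": 3,
    "Archaeology": 2,
    "Engineering": 2,
    "Psychology & Sociology": 2,
    "Agriculture": 1,
    "Law": 1,
    "Mathematics": 1,
    "Business & Management Studies": 0,
}


def share_of_respondents(count, total=TOTAL_RESPONSES):
    """Return the percentage of respondents who ticked a subject, to one decimal place."""
    return round(100 * count / total, 1)


# Total number of ticks across all subjects (more than 92 if multiple ticks were allowed).
total_selections = sum(subject_counts.values())

if __name__ == "__main__":
    print(f"Total selections: {total_selections}")
    for subject, count in subject_counts.items():
        print(f"{subject}: {count} ({share_of_respondents(count)}% of respondents)")
```

Read this way, just over half of respondents ticked History, which fits the first point above.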

graph showing use of archival websites

  • The Hub is often used along with other archival websites, particularly The National Archives and individual record office websites, but a significant number do not use the websites listed, so we cannot assume prior knowledge of archives.
  • It would be interesting to know more about patterns of use. Do researchers try different websites, and in what order do they visit them? Do they have a sense of what the different sites offer?
  • There is still low use of the European aggregators, Europeana and APENet, although at present UK archives are not well represented on these services and arguably they do not have a high profile amongst researchers (the Hub is not yet represented on these aggregators).

Subsequent activities

  • It is interesting to note that 32% visit a record office as a result of using the Hub, but 68% do not. It would be useful to explore this further, to understand whether the use of the Hub is in itself enough for some researchers. We do know that for some people, the description holds valuable information in and of itself, but we don’t know whether the need to visit a record office, maybe some distance away, prevents use of the archives when they might be of value to the researcher.

What is of most value?

graph showing what is most valuable to researchers

  • We asked about what is important to researchers, looking at key areas for us. The results show that comprehensive coverage still tops the polls, but detailed descriptions also continue to be very important to researchers, somewhat in opposition to the idea of the ‘quick and dirty’ approach. More sophisticated questioning might draw out how useful basic descriptions are compared with no description and what sort of level of detail is acceptable.
  • Links to digital content and information on related material are important, but not as important as adding more descriptions and providing a level of detail that enables researchers to effectively assess archives.
  • Searching across other cultural heritage resources at the same time is maybe surprisingly less of a priority than content and links. It is often assumed that researchers want as much diverse information as possible in a ‘one-stop shop’ approach, but maybe the concerns about usability of the search, navigation, number of results and relevance ranking of results illustrate one of the main challenges: creating a site that holds descriptions and links to very varied content while ensuring it is easily understandable and researchers know what they are getting.
  • The regional search was not a high priority but a significant medium priority. It might be argued that not all researchers would be interested in this, but some would find it particularly useful, and many archivists would certainly find it helpful in their work.
  • We provided a free text box for participants to say what they most valued. The ability to search across descriptions, which is the most basic value proposition of the Hub, came out top, and breadth of coverage was also popular, and could be said to be part of the same selling point.
  • It was interesting to see that some respondents cited the EAD Editor as the main strength for them, showing how important it is to provide ways for archivists to create descriptions (it may be thought that other means are at their disposal, but often this is not the case).
  • Six people referred to the importance of the Hub for providing an online presence, indicating that for some record offices, the Hub is still the only way that collections are surfaced on the Web.

What would most improve the Hub?

  • We had a diversity of responses to the question about what would most improve the Hub, maybe indicating that there are no very obvious weaknesses, which is a good thing. But this does make it difficult for us to take anything constructive from the answers, because we cannot tell whether there is a real need for a change to be made. However, there were a few answers that focused on the interface design, and some of these issues should be addressed by our new ‘utility bar’ which is a means to more clearly separate the description from the other functions that users can then perform, and should be implemented in the next six months.

Conclusions

The survey did not throw up anything unexpected, so it has not materially affected our plans for development of the Hub. But it is essentially an endorsement of what we are doing, which is very positive for us. It emphasised the importance of comprehensive coverage, which is something we are prioritising, and the value of detailed descriptions, which we facilitate through the EAD Editor and our training opportunities and online documentation. Please contact us if you would like to know more.

More Product, Less Processing?

I’ve been reading a fascinating article by Mark A. Greene and Dennis Meissner, ‘More Product Less Process: Revamping Traditional Archival Processing’ (PDF). I wanted to offer a summary of the article.

image of scales

The essence of this article is that archivists spend too long processing collections (appraising, cataloguing and carrying out minor preservation). This approach is not working; the cataloguing backlog continues to increase. We are too conservative, cautious and set in our ways, and we need to think about a new approach to cataloguing that is more pragmatic and user-focussed. The article was written by archivists in the USA, but would seem to apply to archives here in the UK, where we know that the backlog is a continuing problem.

I think the article makes the argument well and with a good deal of conviction. The bottom line is that we must rethink our approach unless we are to continue to accrue backlogs and deny researchers access to hugely valuable primary source material.

However, there are arguments in support of detailed cataloguing. For digital archives it is extremely useful to provide metadata at the item level, enabling such useful resources as http://archiveshub.ac.uk/data/gb1837des-dca?page=3#id634580. With this detailed list, researchers can see digital resources described and then access them directly. It could be argued that if a collection is to be digitised, providing this sort of level of metadata is appropriate, and in general it is the more valuable and highly used collections that are digitised. But for born-digital collections, this level of detail would be totally unsustainable.

Also, I wonder if the work that volunteers do should be taken into account – they may be able to help us catalogue in more detail, whilst trained archivists continue to create the main collection or series-level descriptions. I remember a whole band of NADFAS volunteers cataloguing photographs where I used to work. Furthermore, I was speaking to an archivist recently who said that they had taken the time to weed out duplicates (something this report criticises)… and then sold them on eBay for a tidy profit, which helped them fund their very under-resourced archive (they had the rights to do this!). So, maybe there are factors to take into consideration that support a detailed approach, but I think a bold approach to examining this whole area in UK archives would be very welcome.

Some of the points made in the report:

  • Archivists spend too much time cataloguing, not necessarily doing what is necessary. We think in terms of an ideal that we have to reach, although we haven’t actually articulated what this ideal is, and really examined it.
  • We are too attached to old-fashioned ways of doing things, which worked when we had smaller collections to deal with, but are not appropriate for large 20th century collections.
  • We give a higher priority to serving the needs of our collections rather than the needs of our users.
  • We need a new set of guidelines that focus on what we absolutely need to do.
  • We need to discuss, debate and examine our approach to cataloguing, and not be defensive about our roles.
  • We tend to arrange collections down to item level. In particular, we carry out preservation activities to this level. We accept the premise that basic preservation steps necessitate an item-level approach.
  • We often remove all metal fastenings and put materials into acid-free folders. So, even if we do not describe collections down to item level (maybe we just describe at collection or series level), we go down to this level of detail in our preservation activities. Yet, with good climate control, metal fasteners should not rust, and as yet we do not have strong evidence of a detrimental effect of standard manila folders if the material is stored in a controlled environment.
  • We often weed out duplicates throughout a collection, which requires processing down to item level. Is this really worth doing?
  • The various sources of advice about the level of detail we process archives to are inconsistent. Some sources advocate description to series level, but preservation activities to item level. NARA advocates preservation in accordance with intrinsic value and anticipated use, so, for example, new folders should only be used if current ones are damaged, and metal fasteners should be removed only if ‘appropriate’ – meaning where they are causing obvious damage.
  • We seem to believe that we need to aspire to ‘a substantial, multi-layered, descriptive finding aid,’ a reflection of ‘slow, careful scholarly research’.  But in reality, maybe we should adopt a more flexible approach, taking each collection in turn on its merits. Some may justify detailed cataloguing, but many do not.
  • We should take the position that users come to do research, and that we do not have to do this for them in advance.
  • We should ‘get beyond our absurd over-cautiousness’ about providing access to unprocessed collections, and make them available unless there are good legal or preservation reasons to restrict access or the collection is of extremely high value.
  • We have very inadequate processing metrics. Attempts to quantify processing expectations have resulted in wildly differing figures. Figures given in various studies include 3, 6.9, 8, 12.7 and 10.6 hours per cubic foot. Other studies have come up with between 3 and 5.5 days per foot.
  • One major study by an archive centre revealed 15.1 hours were spent on each cubic foot, far more than the value that was placed upon what was accomplished. The study gave ‘an improved sense of the real and total costs involved’.
  • The Greene/Meissner study looked at various projects funded by NHPRC grants (National Historical Publications and Records Commission), and found an average productivity figure of 9 hours per foot, but with highs of around 67 hours per foot. It also conducted an email survey and found expectations of processing times averaged 14.8 hours, although there was a high of 250 hours!
  • Grant funding often encourages an item-level focus, rather than helping us to really tackle our substantial backlogs. There should be more of a requirement to justify meticulous processing – it should only be for exceptional collections.
  • The study recommends aiming for a processing rate of 4 hours per cubic foot for most large 20th century collections, using a series-level approach for description and preservation.
  • Studies show a lack of standardisation, not only in our definitions but also around the levels of arrangement, preservation and access that are useful and necessary. We do not have proper administrative controls over this work. We tend to argue that each of us has a unique situation that does not allow for comparison, and we do not have a common sense of acceptable policies and procedures.
  • Whilst we continue to process to item level, a substantial number do not make catalogues available through OPACs or Websites, arguably prioritising processing over user needs.
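To get a feel for what these processing rates mean in practice, here is a rough sketch that applies some of the figures quoted above to a backlog. The backlog size of 500 cubic feet is purely hypothetical, and the sketch assumes all of the rates can be read as hours per cubic foot (the summary above is not always explicit about the unit).

```python
# Rates quoted in the points above.
# Assumption: all are treated here as hours per cubic foot.
RATES_HOURS_PER_CUBIC_FOOT = {
    "recommended target": 4.0,
    "NHPRC project average": 9.0,
    "email survey average expectation": 14.8,
    "archive centre study": 15.1,
}


def estimate_hours(cubic_feet, hours_per_cubic_foot):
    """Estimate total processing hours for a backlog at a given rate."""
    return round(cubic_feet * hours_per_cubic_foot, 1)


def estimate_all(cubic_feet):
    """Estimate processing hours for the backlog at each of the quoted rates."""
    return {
        label: estimate_hours(cubic_feet, rate)
        for label, rate in RATES_HOURS_PER_CUBIC_FOOT.items()
    }


if __name__ == "__main__":
    backlog_cubic_feet = 500  # hypothetical backlog size, for illustration only
    for label, hours in estimate_all(backlog_cubic_feet).items():
        print(f"{label}: {hours} hours")
```

On these assumptions, the same backlog takes nearly four times as long at the archive centre study's rate as at the recommended target, which is the gap the report is drawing attention to.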

The report concludes that maybe we should recognise that ‘the use of archival records…is the ultimate purpose of identification and administration.’ (SAA, Planning for the Archival Profession, 1986).  Maybe we should agree that a collection is catalogued if it ‘can be used productively for research.’ And maybe we should be willing to take a different approach for each collection, making choices and setting priorities, rather than being too caught up in a ‘love of craftmanship’ that could be seen as fastidiousness that does not truly serve the user.

The question seems to be how much would be lost by putting speed of processing before careful examination of all documents in a collection. Maybe this does require defining good cataloguing? Maybe we believe that our professional standing is tied up with undertaking detailed cataloguing… more so than with tackling the ever increasing backlogs, where the papers are entirely inaccessible to researchers?

Greene and Meissner state that there should be a ‘golden minimum’ for processing, where we adequately address user needs and only go beyond this where there are demonstrable business reasons. They also believe that arrangement, description and preservation should all occur at the same level of detail, again, unless there are good reasons to deviate from this.

What do you think…?

HubbuB: November 2011

image showing celebratory 200

I don’t think we made much of a fuss about reaching 200 contributors, but we’re really pleased to say that we’re now into the 200s and new contributors are coming on board regularly, which makes the Hub even more useful to even more researchers.

We’re currently trying out a bit of a whizzy thing with the contributors’ map – go to http://archiveshub.ac.uk/contributorsmap/ and try a few clicks and you’ll see what I mean. We particularly like the jump from Aberdeen to Exeter, and are looking for archives from further afield in order to execute even bigger jumps!

Speaking of contributors, we’ve made a few changes to our contributor pages. We now have a link to browse each contributor’s descriptions, and also a link to simply show the list of collections. This link was largely introduced to help us with our quest to bring the Hub out loud and strong through Google. We’re doing pretty well on that front: we’ve found that page views have gone up radically over the last few months, and that can only be good for archives. I think the list of descriptions can really look quite impressive – I tried Aberdeen and found collections from ‘favourite tunes’ to ‘a valuation of the Shire of Aberdeen’.

We’ve been busy on our new Linking Lives project, using Linked Data to create a Web front-end, and making the data available via an open licence. We’re really pleased that the vast majority of contributors have not asked us to exclude their descriptions, and many have emailed specifically to endorse what we are doing.  This is brilliant news, and I think it shows that most archivists are actually forward-thinking and understand that technology can really benefit our domain (flattery will get you everywhere!).  We want to ensure that archives are out there in the Web of Data, and part of the innovative work that is happening now. You may have seen a few blog posts to get going on Linking Lives: http://archiveshub.ac.uk/linkinglives/. Pete’s are rather more technical than mine, and brilliantly set out some of the difficult issues. I’m trying to think about what archivists are interested in and how we think about archival context. I hope our posts on licensing convey how much we are thinking about the best way to present and attribute the content.

Lastly for this month’s HubbuB, I’ve knocked up a fairly short Feature on the latest stuff that’s happening. I’m thinking of this as an annual feature – sometimes we are so busy we kind of forget to actually make a bit of noise about what we’ve achieved. You’ll see that we’re working on some record display improvements. I really hope I can show you these soon.

Blowing the dust off Special Collections

Guest Blog Post by John Hodgson

Mimas works on exciting and innovative projects all the time and we wanted Hub blog readers to find out more about the SCARLET project, where Mimas staff, academics from the University of Manchester and the archive team at John Rylands University Library are exploring how Augmented Reality can bring resources held in special collections to life by surrounding original materials with digital online content.

The Project

Special Collections using Augmented Reality to Enhance Learning and Teaching (SCARLET)

SCARLET addresses one of the principal obstacles to the use of Special Collections in teaching and learning – the fact that students must consult rare books, manuscripts and archives within the controlled conditions of library study rooms. The material is isolated from the secondary, supporting materials and the growing mass of related digital assets. This is an alien experience for students familiar with an information-rich, connected wireless world, and is a barrier to their use of Special Collections.

The SCARLET project will provide a model that other Special Collections libraries can follow, making these resources accessible for research, teaching and learning. If you are interested in creating similar ‘apps’ and using the toolkit created by the team then please get in touch.

SCARLET Blog: http://teamscarlet.wordpress.com/

SCARLET Twitter: twitter.com/team_scarlet

The Blog Post

Blowing the dust off Special Collections

The academic year is now in full swing and JRUL Special Collections staff are busy delivering ‘close-up’ sessions and seminars for undergraduate and postgraduate students.

A close-up session typically involves a curator and an academic selecting up to a dozen items to show to a group of students. The items are generally set out on tables and everyone gathers round for a discussion. It is a real thrill for students to see Special Collections materials up close, and in some circumstances to handle the items themselves. The material might be papyri from Greco-Roman Egypt, medieval manuscripts, early printed books, eighteenth-century diaries and letters, or modern literary archives: the range of our Special Collections is vast.

Dante Seminar

Dr Guyda Armstrong shows her students a selection of early printed editions of Dante.

From our point of view, it’s really rewarding and enlightening to work alongside enthusiastic teachers such as Guyda Armstrong, Roberta Mazza and Jerome de Groot. The ideal scenario is a close partnership between the academic and the curator. Curators know the collections well, and we can discuss with students the materiality of texts, technical aspects of books and manuscripts, the context in which texts and images were originally produced, and the afterlife of objects – the often circuitous routes by which they have ended up in the Rylands Library. Academics bring to the table their incredible subject knowledge and their pedagogical expertise. Sparks can fly, especially when students challenge what they are being told!

This week I have been involved in close-up sessions for Roberta Mazza’s ‘Egypt in the Graeco-Roman World’ third-year Classics course, and Guyda Armstrong’s ‘Beyond the Text’ course on Dante, again for third-year undergraduates. Both sessions were really enjoyable, because the students engaged deeply with the material and asked lots of questions. But the sessions also reinforced my belief that Augmented Reality will allow us to do so much more. AR will make the sessions more interactive, moving towards an enquiry-based learning model, where we set students real questions to solve, through a combination of close study of the original material, and downloading metadata, images and secondary reading, to help them interrogate and interpret the material. Already Dr Guyda Armstrong’s students have had a sneak preview of the Dante app, and I’m looking forward to taking part in the first trials of the app in a real teaching session at Deansgate in a few weeks’ time.

For many years Special Collections have been seen by some as fusty and dusty. AR allows us to bring them into the age of app.

HubbuB: October 2011

Europeana and APENet

Europeana Logo

I have just come back from the Europeana Tech conference, a two-day event on various aspects of Europeana’s work and on related topics to do with data. The big theme was ‘open, open, open’, as well, of course, as the benefits of a European portal for cultural heritage. I was interested to hear about Europeana’s Linked Data output, but my understanding is that at present we cannot effectively link to their data, because they don’t provide URIs for concepts; in other words, identifiers for names, such as http://data.archiveshub.ac.uk/doc/agent/gb97/georgebernardshaw, which would allow us to say, for example, that our ‘George Bernard Shaw’ is the same as the ‘George Bernard Shaw’ represented on Europeana.
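To illustrate the kind of ‘same as’ statement this would allow, here is a minimal sketch that writes one out as an N-Triples line using the standard owl:sameAs property. The Hub URI is the one quoted above; the second URI is a made-up placeholder, since no real external identifier is given here, and the function name is hypothetical.

```python
# The standard OWL property used to state that two URIs identify the same thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URI quoted in the post above.
HUB_SHAW = "http://data.archiveshub.ac.uk/doc/agent/gb97/georgebernardshaw"

# Hypothetical placeholder: stands in for an identifier another service might publish.
EXTERNAL_SHAW = "http://example.org/agent/george-bernard-shaw"


def same_as_triple(subject_uri, object_uri):
    """Return an N-Triples statement saying the two URIs refer to the same entity."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


if __name__ == "__main__":
    print(same_as_triple(HUB_SHAW, EXTERNAL_SHAW))
```

Without a URI on the other side there is nothing to put in the object position, which is why the lack of identifiers matters.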

I am starting to think about the Hub being part of APENet and Europeana. APENet is the archival aggregator for Europe. I have been in touch with them about the possibility of contributing our data, and if the Hub were to contribute, we could probably start from next year. Europeana only provide metadata for digital content, so we could only supply descriptions where the user can link to the digital content, but this may well be worth doing, as a means to promote the collections of any Hub contributors who do link to digital materials.

If you are a contributor, or potential contributor, we would like to know what you think… we have a quick question for you at http://polldaddy.com/poll/5565396/. It simply asks if you think it’s a good idea to be part of these European initiatives. We’d love to get your views, and you only have to leave your name and a comment if you want to.

Flickr: an easy way to provide images online

You will be aware that contributors can now add images to descriptions and links to digital content of all kinds. The idea is that the digital content then forms an integral whole with the metadata, and it is also interoperable with other systems.

I’ve just seen an announcement by the University of Northampton, who have recently added materials to Flickr. I know that many contributors struggle to get server space to put their digital content online, so this is one possible option, and of course it does reach a huge number of people this way. There may be risks associated with the persistence of the URIs for the images, but then that is the case wherever you put them.

On the Hub we now have a number of images and links to content, for example: http://archiveshub.ac.uk/data/gb1089ukc-joh, http://archiveshub.ac.uk/data/gb1089ukc-bigwood, http://archiveshub.ac.uk/data/gb1089ukc-wea, http://archiveshub.ac.uk/data/gb141boda?page=7#boda.03.03.02.

Ideally, contributors would supply digital content at item level, so the metadata is directly about the image/digital content, but it is fine to provide it at any level that is appropriate.  The EAD Editor makes adding links easy (http://archiveshub.ac.uk/dao/). If you aren’t sure what to do, please do email us.

Preferred Citation

We never had the field for the preferred citation in our old template for the creation of EAD, and it has not been in the EAD Editor up till now. We were prompted to think about this after seeing the results of a survey on the use of EAD fields presented at the Society of American Archivists conference. Around 80% of archive institutions do use it. We think it’s important to advise people how to cite the archive, so we are planning to provide this in the Editor and may be able to carry out global edits to add this to contributors’ data.
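As a rough idea of what a global edit to add a preferred citation might look like, here is a minimal sketch using Python's standard XML library. EAD does have a prefercite element for this purpose, but everything else here is an assumption for illustration: the sample record, the function name, the wording of the citation and the repository name are all made up, and this is not how the EAD Editor itself works.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD record used only for this illustration.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unittitle>Example Papers</unittitle>
      <unitid>GB 0000 EX</unitid>
    </did>
  </archdesc>
</ead>"""


def add_preferred_citation(ead_xml, repository):
    """Add a <prefercite> to <archdesc> if the description does not already have one."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    if archdesc is None:
        raise ValueError("no <archdesc> element found")
    # Leave any existing preferred citation untouched.
    if archdesc.find("prefercite") is None:
        title = archdesc.findtext("did/unittitle", default="").strip()
        unitid = archdesc.findtext("did/unitid", default="").strip()
        prefercite = ET.SubElement(archdesc, "prefercite")
        para = ET.SubElement(prefercite, "p")
        # Assumed citation pattern: title, reference, repository.
        para.text = f"{title}, {unitid}, {repository}."
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_preferred_citation(SAMPLE_EAD, "Example Repository"))
```

The check for an existing prefercite means the edit could be run across many descriptions without overwriting a citation a contributor has already supplied.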

List of Contributors

Our list of contributors within the main search page has now been revised, and we hope it looks substantially more sensible, and that it is better for researchers. This process really reminded us how hard it is to come up with one order for institutions that works for everyone! We are currently working on a regional search, something that will act as an alternative way to limit searching. We hope to introduce this next year.

And finally…A very engaging Linked Data interface

This interface demonstration by Tim Sherratt shows how something driven by Linked Data can really be very effective. It also uses some of the Archives Hub vocabulary from our own Linked Data work, which is a nice indication of how people have taken notice of what we have been doing. There is a great blog post about it by Pete Johnston, Storytelling, archives and Linked Data. I agree with Pete that this sort of work is so exciting, and really shows the potential of the Linked Data Web for enabling individual and collective storytelling…something we, as archivists, really must be a part of.

Features

German advert © National Fairground Archive, University of Sheffield

The Archives Hub has been publishing collections of the month, or features, since 2001. In that time we’ve had a large variety of features on everything from ornithology to poetry to the Miners’ Strike and even Rugby League.

Our features highlight the treasures to be found in archive collections that are on the Hub. Sometimes a feature is on a specific topic or theme, bringing together resources from different repositories; at other times it highlights a specific repository.

This year we have changed the format of our features to include print resources from our sister service, Copac, and there are now links from the Copac home page to the feature.

All of our web pages include Google Analytics, and we can see that our features are popular. Our feature pages have been viewed by nearly 9000 people since 1 January 2011, and the most viewed feature this year has been Scrum, ruck and tackle: the Rugby Football League Archive at the University of Huddersfield. Having your collections featured on the Hub also increases the amount of traffic you’ll get to your descriptions through Google.

Although the Hub team has been known to write a feature or two, we much prefer it if our contributors write the features; after all, they are the experts on their collections. This year has been a bumper year for features, with features from the University of Huddersfield, Imperial War Museum, the Women’s Library and the National Fairground Archive, to name but a few. We have features scheduled now for the rest of 2011 and even have a couple of months booked up in 2012.

We like to be as flexible as possible when it comes to our features and offer to help as much or as little as the contributor wants. As a contributor, you can simply write the text of the feature and provide images, or you can suggest related collections, websites and reading lists as well. It’s entirely up to you.

Should you wish to feature on the Archives Hub, please contact archiveshub@mimas.ac.uk. We operate on a first come first served basis, so if you have an event, exhibition or project launch coming up and you would like your feature to coincide with it, let us know as early as possible.

Huddersfield Giants’ Match © Image courtesy of the Rugby Football League and The University of Huddersfield Archive and Special Collections

HubbuB: September 2011

APEnet & Europeana

You may be aware of the Archives Portal Europe – http://www.archivesportaleurope.eu. We’ve been considering whether the Hub should be part of this and I would welcome any thoughts that you have about it, as it would be your archives that would be represented. I don’t think the Website offers the best navigation or user interface at the moment, and the coverage is very patchy. But should we be supporting the principle of a European-wide archives portal, and looking to be part of it? I know they are planning on a great deal more development work, and they are interested in the Hub joining in 2012. We are generally keen here at the Hub to do all we can to promote your collections, and enable connections to be made with other materials, and whilst very ambitious, projects like APENet take this idea to a whole new level.

Similarly, we are looking at what Europeana are doing, and I will be attending the Europeana Tech conference in October (http://www.europeanaconnect.eu/europeanatech/) – a blog post will follow with some reflections on the conference and on the significance of Europeana. At present, our main aim is to stay abreast of what is happening and look at the sort of commitment being a part of it would involve.

New contributors

The more contributors the Hub has, the more valuable it becomes as a cross-searching tool for researchers, helping them to discover the great archives that are out there. Our latest contributors are Cambridge University: Sedgwick Museum of Earth Sciences, St Paul’s Cathedral, Oxford Brookes Special Collections, Victoria & Albert Museum Theatre & Performance, Islington Local History Centre, Glasgow Women’s Library, and Royal Scottish Academy of Music and Drama. We are very close now to our 200th contributor!

SNAC project for name authorities

The Social Networks in Archival Context project has been very successfully taking EAD descriptions and creating EAC-CPF authority files, working to disambiguate and pattern-match in order to create a set of name authorities that we can all use and benefit from. I recommend taking a look at their website: http://socialarchive.iath.virginia.edu/ and in particular the demonstrator: http://socialarchive.iath.virginia.edu/xtf/search. Search or click on a record and try the new RGraph demonstrator to see a prototype visualisation – it shows the sorts of new ways of looking at data that we have the opportunity to create.

The project has agreed in principle to take Hub descriptions and create authority records. I’d love to hear your thoughts on this. As yet, of course, the Hub does not display authority records, but this is something we need to work on. We will also be looking at how this fits into our new Linking Lives project, part of our Locah work (http://archiveshub.ac.uk/blog/?p=2699). I’ll try to knock up a blog post that outlines what the SNAC project is doing and how we might fit into it.

Hub Feature

This month we’re pleased to say that we have a feature about the Mary Hamilton Papers, held at John Rylands Library, The University of Manchester: “Courtier, diarist and bluestocking, her papers offer a veritable cornucopia of information on royal, aristocratic, artistic and literary circles during the late 18th and early 19th centuries.” http://archiveshub.ac.uk/features/maryhamilton/index.html

HubbuB is a monthly newsletter aimed primarily at Archives Hub contributors and archives professionals.

Locah Linking Lives: an introduction

We are very pleased to announce that the Archives Hub will be working on a new Linked Data project over the next 11 months, following on from our first phase of Locah, and called Linking Lives. We’d like to introduce this with a brief overview of the benefits of the Linked Data approach, and an outline of what the project is looking to achieve. For more in-depth discussion of Linked Data, please see our Locah project blog, or take a look at the Linked Data website guides and tutorials.

Linked Open Data Cloud

The benefits of Linked Data

The W3C currently has a draft of a report, ‘Library Linked Data’, which covers archives and museums. In this they state that:

‘Linked data is shareable, extensible, and easily re-usable… These characteristics are inherent in the linked data standards and are supported by the use of web-friendly identifiers for data and concepts.’

Shareable

One of the exciting things about Linked Data is that it is about sharing data (certainly where you have Linked Open Data). I have found that this emphasis on sharing and data integration has actually had a positive effect aside from the practical reality of sharing; it engenders a mindset of collaboration and sharing, something that is of great value not just in the pursuit of the Linked Data vision, but also, more broadly, for any kind of collaborative effort and for encouraging a supportive environment. Our previous Linked Data project, Locah, has been great for forging contacts and for putting archival data within this exciting space where people are talking about the future of the Web and the sorts of things that we might be able to do if we work together.

For the Archives Hub, our aim is to share the descriptions of archive collections as a means to raise the profile of archives, and show just how relevant archives are across numerous disciplines and for numerous purposes. In many ways, sharing the data gives us an opportunity to get away from the idea that archives are only of interest to a narrow group of people (i.e. family historians and academics purely within the History Faculty).

Extensible

The principle of allowing for future growth and development seems to me to be vital. The idea is to ensure that we can take a flexible approach, whereby enhancements can be made over time. This is vital for an exploratory area like Linked Data, where an iterative approach to development is the best way to go, and where we are looking at presenting data in new ways, looking to respond to user needs, and working with what technology offers.

Reusable

‘Reuse’ has become a real buzzword, and is seen as synonymous with efficiency and flexibility. In this context it is about using data in different contexts, for different purposes. In a Linked Data environment what this can mean is providing the means for people to combine data from different sources to create something new, something that answers a certain need. To many archivists this will be fine, but some may question the implications in terms of whether the provenance of the data is lost and what this might mean. What if information from an archive description is combined with information from Wikipedia? Does this have any implications for the idea of archive repositories being trusted, and does it mean that pieces of the information will be out of their original context and therefore in some way open to misuse or misinterpretation?

Reuse may throw up issues, but it provides a great deal more benefits than risks. Whatever the caveats, it is an inevitable consequence of open data combined with technology, so archives either join in or exclude themselves from this type of free-flow of data. Nevertheless, it is certainly worth thinking about the issues involved in providing data within different contexts, and our project will consider issues around provenance.
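To make the provenance question a little more concrete, here is a minimal sketch in Python of what combining statements from two sources might look like if each statement carries a note of where it came from. This is purely illustrative and is not how the Hub or the Linking Lives project works: the identifiers, property names and values are all invented, and real Linked Data would use URIs and an RDF toolkit rather than plain Python tuples.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """A subject-predicate-object statement plus the source it came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: which dataset supplied this statement


# Hypothetical data: every identifier and value below is invented for illustration.
hub_data = [
    Statement("person:example", "hasArchiveCollection",
              "collection:example-papers", "Archives Hub"),
]
other_data = [
    Statement("person:example", "occupation", "diarist", "Wikipedia"),
]


def merge(*datasets):
    """Combine statements from several sources, grouped by subject."""
    merged = defaultdict(list)
    for dataset in datasets:
        for statement in dataset:
            merged[statement.subject].append(statement)
    return merged


def describe(merged, subject):
    """List everything known about a subject, keeping the source visible."""
    return [
        f"{s.predicate}: {s.obj} (source: {s.source})"
        for s in merged.get(subject, [])
    ]


combined = merge(hub_data, other_data)
lines = describe(combined, "person:example")
for line in lines:
    print(line)
```

The point of the sketch is simply that combining data need not mean losing track of its origin: because the source travels with each statement, an interface can show which information comes from the archive description and which from elsewhere.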

Linking Lives

The basic idea of the Linking Lives project is to develop a new Web interface that presents useful resources relating to individual people, and potentially organisations as well.

It is about more than just looking at a name-based approach for archives. It is also about utilising external datasets in order to bring archives together with other data sources. Researchers are often interested in individual people or organisations, and will want to know what sort of resources are out there. They may not just be interested in archives. Indeed, they may not really have thought about using archives, but they may be very interested in biographical information, known connections, events during a person’s lifetime, etc. The idea is to show that various sources exist, including archives, and thus to work towards broadening the user-base of archives.

Our interface will bring different data sources together and we will link to all the archival collections relating to an individual. We have many ideas about what we can do, but with limited time and resources, we will have to prioritise, test out various options and see what works and what doesn’t and what each option requires to implement. We’ll be updating you via the blog, and we are very interested in any thoughts that you have about the work, so please do leave comments, or contact us directly.

In some ways, our approach may be an alternative to using EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons and Families, an XML standard for marking up names associated with archive collections). But maybe in essence it will complement EAC-CPF, because eventually we could use EAC authority records and create Linked Data from them. We have followed the SNAC project with interest, and we recently met up with some of the SNAC project members at the Society of American Archivists’ Conference. We hope to take advantage of some of the exciting work that they are doing to match and normalise name records.

The W3C draft report on Library Linked Data states that ‘through rich linkages with complementary data from trusted sources, libraries [and archives] can increase the value of their own data beyond the sum of its sources taken individually’. This is one of the main principles that we would like to explore. By providing an interface that is designed for researchers, we will be able to test out the benefits of the Linked Data approach in a much more realistic way.

Maybe we are at a bit of a crossroads with Linked Data. A large number of data sets have been put out as RDF/XML, and some great work has been done by people like the BBC (e.g. the Wildlife Finder), the University of Southampton and the various JISC-funded projects. We have data.gov.uk, making Government data sets much more open. But there is still a need to make a convincing argument that Linked Data really will provide concrete benefits to the end user. Talking about SPARQL endpoints, JSON, Turtle, triples that connect entities and the benefits of persistent URIs won’t convince people who are not really interested in process and principles, but just want to see the benefits for themselves.

Has there been too much emphasis on the idea that if we output Linked Data then other people can (will) build tools? The much quoted adage is ‘The best thing that will be done with your data will be done by someone else’, but is there a risk in relying on this idea? In order to get buy-in to Linked Data, we need visible, concrete examples of benefit. Yes, people can build tools, they can combine data for their own purposes, and that’s great if and when it happens; but for a community like the archives community the problem may be that this won’t happen very rapidly, or on the sort of scale that we need to prove the worth of the investment in Linked Data. Maybe we are going to have to come up with some exemplars that show the sorts of benefits that end users can get. We hope that Linking Lives will be one step along this road; not so much an exemplar as a practical addition to the Archives Hub service that researchers will immediately be able to benefit from.

Of course, there has already been very good work within the library, archive and museum communities, and readers of this blog may be interested in the Use Cases that are provided at http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

Just to finish with a quote from the W3C draft report (I’ve taken the liberty of adding ‘archives’ as I think they can readily be substituted):

“Linked Data reaches a diverse community far broader than the library/archive community; moving to library/archival Linked Data requires libraries/archives to understand and interact with the entire information community. Much of this information community has been engendered by the capabilities provided by new technologies. The library/archive community has not fully engaged with these new information communities, yet the success of Linked Data will require libraries/archives to interact with them as fully as they interact with other libraries/archives today. This will be a huge cultural change that must be addressed.”

photo of paper chain dolls
Flickr: Icono SVDs photostream, http://www.flickr.com/photos/28860201@N05/with/3674610629/

Archives Wales

I recently attended the ‘Online Development in Wales’ day organised by ARCW (Archives and Records Council Wales) to talk about the Porth Archifau (Archives Hub). I found out a good deal about what is happening in Wales at the moment and heard about plans and wishes for future developments.

In her introduction, Charlotte Hodgson from ARCW talked about the need for online catalogues with images rather than the other way around. Maybe there is too much emphasis on digitisation of images which become separated from their context. She referred to the good work of Archives Network Wales (ANW), but acknowledged that Wales is in danger of falling behind with online catalogues. There is a need to maximise opportunities, minimise duplication and effectively deploy resources.

Kim Collis from ARCW gave some background on ANW (now Archives Wales), which is a searchable database for collection-level descriptions that uses a MySQL database and a Typo3 front-end. It has stayed relatively static since it was first developed; the emphasis of individual offices may have moved to their own web presence (many were using CALM and there was something of a race to get their catalogues online). The front-end of the ANW site has not always been very user-friendly and has not provided the depth of information that it might do. However, it was developed in a standards-based way, and this stands it in good stead for future development. ‘Archives Wales’ was a bolt-on to the database, giving more information and including additional information about repositories, making a more complete and visually appealing site.

There has been some geo-tagging within ANW recently. This was seen as a good way to link in with People’s Collection Wales, enabling users to find out more information about, for example, a family that has owned an estate. Kim talked about a number of possible developments, such as a project to provide links to searchable tithe apportionment transcripts. The idea is to allow volunteers to transcribe the images.

Kim talked about the need to improve branding and identity. The site must be kept up to date to give it credibility. But there is, in a sense, competition with repository websites because many repositories want to prioritise these. I think it is worth impressing upon archivists the importance of cross-searching capability that aggregators provide, as well as the value of searching within a repository. We should not presuppose that researchers primarily want to know what is at just one individual office; they usually want to find ‘stuff’ on their topic of interest and then go down to the more detailed level of individual sources of information.

Sam Velumyl from The National Archives talked about the Discovery initiative at TNA, which provides a new information architecture that will accommodate the different systems that TNA has. The idea is that it can accommodate the integration of other systems easily, making it a more sustainable and flexible solution. They are going to be carrying out an exercise in gathering feedback on Discovery, and you’re likely to hear about that very soon. Sam said that the feedback will help TNA to decide upon their priorities. It may be that A2A will become active again, but at present this has not been decided. There were concerns in the room that it is very difficult to get TNA to provide data back out of A2A.

People’s Collection Wales, which was presented to us by three speakers, is very much geared towards user-friendly and fun engagement in the history and culture of Wales. It works on the basis of everything being an item, and it gathers items together in collections by topic, not in the way that archivists would normally understand collections, but simply by areas that will be of interest to users. It is quite an eclectic experience, designed to draw in a broad section of the community and promote learning and understanding of Welsh history. Re-purposing is a strong principle behind PCW. It integrates social media to encourage the idea of sharing the photograph or interview or whatever on Facebook or Twitter. It also has a scrapbook function so that people can gather together their own collections. It does link to the item within context, so you can link back to the website of the depositor.

PCW are going to be using an API to upload collection records from Archives Wales. I got a little confused about this, as they also spoke about manual upload. I think the automated upload will only be for certain records. They are also doing some interesting work with GIS, to enable users to do things like look at maps over time to see how a place has developed, and looking at making museum objects viewable in a 3-D way.

My plea to PCW is to make their titles clickable links where it seems as if they should be clickable. I found the site fun, with some great stuff, but it can take a while to understand what you are looking at. I went to browse the collections and many of them are untitled, and it’s not really clear what they are representing. I tried the map interface and looked for ‘castle’ near ‘barmouth’ and I was taken to a page of images of people talking about the Eisteddfod. The second time it worked better, but some of the images were not actually images, and one of them remained in place when I did another search and I couldn’t delete it from the display. I also had a few more experiences of searches hanging and the display freezing. But then other searches worked well and I started getting links from places to objects. So, it was a mixed bag for me: it seemed quite beta in terms of functionality, and it was also very slow, which I do think is a problem. It feels very experimental, with loads of good ideas, but I wonder if it would be better to concentrate on developing fewer ideas but making them more effective.

The afternoon was more focussed on solutions for getting archives online. CyMAL recently commissioned research to analyse requirements for extending online access to archive catalogues in Wales, building on ARCW, and Sarah Horton gave us a summary of some of the findings. Some of the stats were quite interesting: 11 local authority services use CALM, 1 uses the Archivists’ Toolkit and 1 uses Word. In higher education: 3 CALM, 1 Word, 1 no formal catalogue. The National Library of Wales uses the virtual library system and AC-NMW uses AdLib. The survey found that the application of authority files and data standards was variable.

For online access: 3 services provide it via CALMView, but there are barriers to this for many offices, one being IT and their concerns about security. 4 services provide access via their own systems, 2 via PDF documents. About 8,000 collections are listed on Archives Wales and 2,000 on the Hub.

9 services have backlogs of between 10% and 30%, 6 of over 30%, and more if poor quality catalogues are taken into account. Many catalogues remain in manual form only.

We had a very interesting talk on the Black Country History website. Linda Ellis talked about how important it was for the project to be sustainable right from the outset. The project was about working together to reduce costs and create a sustainable online resource. The original website used the Axiell DSCovery software, but it was not fit for purpose. The redevelopment was by Orangeleaf Systems using their CollectionsBase system and WordPress, which means it is very easy to create different front-ends. There are a number of microsites, such as one for geology, filtered by keyword, a great idea for a way to target different audiences with minimal additional effort. Partners can upload data when they like via an XML export from CALM. CollectionsBase will also take Excel, Access and manual data entry. There is an API, so the data goes on to Culture Grid and Europeana.

Altogether a very stimulating day, with a good vibe and plenty of discussion.