HubbuB: November 2011

November 4, 2011 / Jane Stevenson

I don’t think we made much of a fuss about reaching 200 contributors, but we’re really pleased to say that we’re now into the 200s and new contributors are coming on board regularly, which makes the Hub even more useful to even more researchers.

We’re currently trying out a bit of a whizzy thing with the contributors’ map – go to http://archiveshub.ac.uk/contributorsmap/ and try a few clicks and you’ll see what I mean. We particularly like the jump from Aberdeen to Exeter, and are looking for archives from further afield in order to execute even bigger jumps!

Speaking of contributors, we’ve made a few changes to our contributor pages. We now have a link to browse each contributor’s descriptions, and also a link to simply show the list of collections. This link was largely introduced to help us with our quest to bring the Hub out loud and strong through Google. We’re doing pretty well on that front: we’ve found that page views have gone up radically over the last few months, and that can only be good for archives. I think the list of descriptions can really look quite impressive – I tried Aberdeen and found collections from ‘favourite tunes’ to ‘a valuation of the Shire of Aberdeen’.

We’ve been busy on our new Linking Lives project, using Linked Data to create a Web front-end, and making the data available via an open licence. We’re really pleased that the vast majority of contributors have not asked us to exclude their descriptions, and many have emailed specifically to endorse what we are doing. This is brilliant news, and I think it shows that most archivists are actually forward-thinking and understand that technology can really benefit our domain (flattery will get you everywhere!). We want to ensure that archives are out there in the Web of Data, and part of the innovative work that is happening now. You may have seen a few blog posts to get going on Linking Lives: http://archiveshub.ac.uk/linkinglives/. Pete’s are rather more technical than mine, and brilliantly set out some of the difficult issues. I’m trying to think about what archivists are interested in and how we think about archival context. I hope our posts on licensing convey how much we are thinking about the best way to present and attribute the content.

Lastly for this month’s HubbuB, I’ve knocked up a fairly short Feature on the latest stuff that’s happening. I’m thinking of this as an annual feature – sometimes we are so busy we kind of forget to actually make a bit of noise about what we’ve achieved. You’ll see that we’re working on some record display improvements. I really hope I can show you these soon.

Blowing the dust off Special Collections

October 21, 2011 / Jane Stevenson

Guest Blog Post by John Hodgson

Mimas works on exciting and innovative projects all the time and we wanted Hub blog readers to find out more about the SCARLET project, where Mimas staff, academics from the University of Manchester and the archive team at John Rylands University Library are exploring how Augmented Reality can bring resources held in special collections to life by surrounding original materials with digital online content.

The Project

Special Collections using Augmented Reality to Enhance Learning and Teaching (SCARLET)

SCARLET addresses one of the principal obstacles to the use of Special Collections in teaching and learning – the fact that students must consult rare books, manuscripts and archives within the controlled conditions of library study rooms. The material is isolated from the secondary, supporting materials and the growing mass of related digital assets. This is an alien experience for students familiar with an information-rich, connected wireless world, and is a barrier to their use of Special Collections.

The SCARLET project will provide a model that other Special Collections libraries can follow, making these resources accessible for research, teaching and learning. If you are interested in creating similar ‘apps’ and using the toolkit created by the team then please get in touch.

SCARLET Blog: http://teamscarlet.wordpress.com/

SCARLET Twitter: twitter.com/team_scarlet

The Blog Post

Blowing the dust off Special Collections

The academic year is now in full swing and JRUL Special Collections staff are busy delivering ‘close-up’ sessions and seminars for undergraduate and postgraduate students.

A close-up session typically involves a curator and an academic selecting up to a dozen items to show to a group of students. The items are generally set out on tables and everyone gathers round for a discussion. It is a real thrill for students to see Special Collections materials up close, and in some circumstances to handle the items themselves. The material might be papyri from Greco-Roman Egypt, medieval manuscripts, early printed books, eighteenth-century diaries and letters, or modern literary archives: the range of our Special Collections is vast.

Dr Guyda Armstrong shows her students a selection of early printed editions of Dante.

From our point of view, it’s really rewarding and enlightening to work alongside enthusiastic teachers such as Guyda Armstrong, Roberta Mazza and Jerome de Groot. The ideal scenario is a close partnership between the academic and the curator. Curators know the collections well, and we can discuss with students the materiality of texts, technical aspects of books and manuscripts, the context in which texts and images were originally produced, and the afterlife of objects – the often circuitous routes by which they have ended up in the Rylands Library. Academics bring to the table their incredible subject knowledge and their pedagogical expertise. Sparks can fly, especially when students challenge what they are being told!

This week I have been involved in close-up sessions for Roberta Mazza’s ‘Egypt in the Graeco-Roman World’ third-year Classics course, and Guyda Armstrong’s ‘Beyond the Text’ course on Dante, again for third-year undergraduates. Both sessions were really enjoyable, because the students engaged deeply with the material and asked lots of questions. But the sessions also reinforced my belief that Augmented Reality will allow us to do so much more. AR will make the sessions more interactive, moving towards an enquiry-based learning model, where we set students real questions to solve, through a combination of close study of the original material, and downloading metadata, images and secondary reading, to help them interrogate and interpret the material. Already Dr Guyda Armstrong’s students have had a sneak preview of the Dante app, and I’m looking forward to taking part in the first trials of the app in a real teaching session at Deansgate in a few weeks’ time.

For many years Special Collections have been seen by some as fusty and dusty. AR allows us to bring them into the age of app.

HubbuB: October 2011

October 7, 2011 / Jane Stevenson

Europeana and APENet

I have just come back from the Europeana Tech conference, a two-day event on various aspects of Europeana’s work and on related topics to do with data. The big theme was ‘open, open, open’, as well, of course, as the benefits of a European portal for cultural heritage. I was interested to hear about Europeana’s Linked Data output, but my understanding is that at present, we cannot effectively link to their data, because they don’t provide URIs for concepts: in other words, identifiers for names, such as http://data.archiveshub.ac.uk/doc/agent/gb97/georgebernardshaw, which would allow us to say, for example, that our ‘George Bernard Shaw’ is the same as the ‘George Bernard Shaw’ represented on Europeana.
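To make the point about identifiers a little more concrete, here is a minimal sketch of what "saying that two names are the same" looks like as a Linked Data statement. It writes a single N-Triples line using the standard owl:sameAs property. The Hub URI is the one quoted above; the second URI is a made-up placeholder (example.org), standing in for the identifier that the other dataset would need to publish, and the function name is purely illustrative.

```python
# Illustrative sketch only: build one N-Triples statement asserting that two
# URIs identify the same person. This is not the Hub's actual code.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject_uri: str, object_uri: str) -> str:
    """Return one N-Triples line linking two identifiers for the same entity."""
    for uri in (subject_uri, object_uri):
        # Without an HTTP URI on both sides there is nothing to link to.
        if not uri.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP URI: {uri!r}")
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


hub_shaw = "http://data.archiveshub.ac.uk/doc/agent/gb97/georgebernardshaw"
# Hypothetical placeholder: the other dataset would have to supply a real URI.
other_shaw = "http://example.org/agent/george-bernard-shaw"

triple = same_as_triple(hub_shaw, other_shaw)
print(triple)
```

The check inside the function is the whole point: if one side has only a text label and no URI, the statement cannot be made at all.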

I am starting to think about the Hub being part of APENet and Europeana. APENet is the archival aggregator for Europe. I have been in touch with them about the possibility of contributing our data, and if the Hub were to contribute, we could probably start from next year. Europeana only provide metadata for digital content, so we could only supply descriptions where the user can link to the digital content, but this may well be worth doing, as a means to promote the collections of any Hub contributors who do link to digital materials.

If you are a contributor, or potential contributor, we would like to know what you think: we have a quick question for you at http://polldaddy.com/poll/5565396/. It simply asks if you think it’s a good idea to be part of these European initiatives. We’d love to get your views, and you only have to leave your name and a comment if you want to.

Flickr: an easy way to provide images online

You will be aware that contributors can now add images to descriptions and links to digital content of all kinds. The idea is that the digital content then forms an integral whole with the metadata, and it is also interoperable with other systems.

I’ve just seen an announcement by the University of Northampton, who have recently added materials to Flickr. I know that many contributors struggle to get server space to put their digital content online, so this is one possible option, and of course it does reach a huge number of people this way. There may be risks associated with the persistence of the URIs for the images, but then that is the case wherever you put them.

On the Hub we now have a number of images and links to content, for example: http://archiveshub.ac.uk/data/gb1089ukc-joh, http://archiveshub.ac.uk/data/gb1089ukc-bigwood, http://archiveshub.ac.uk/data/gb1089ukc-wea, http://archiveshub.ac.uk/data/gb141boda?page=7#boda.03.03.02.

Ideally, contributors would supply digital content at item level, so the metadata is directly about the image/digital content, but it is fine to provide it at any level that is appropriate. The EAD Editor makes adding links easy (http://archiveshub.ac.uk/dao/). If you aren’t sure what to do, please do email us.

Preferred Citation

We never had the field for the preferred citation in our old template for the creation of EAD, and it has not been in the EAD Editor up till now. We were prompted to think about this after seeing the results of a survey on the use of EAD fields presented at the Society of American Archivists conference. Around 80% of archive institutions do use it. We think it’s important to advise people how to cite the archive, so we are planning to provide this in the Editor and may be able to carry out global edits to add this to contributors’ data.

List of Contributors

Our list of contributors within the main search page has now been revised, and we hope it looks substantially more sensible, and that it is better for researchers. This process really reminded us how hard it is to come up with one order for institutions that works for everyone! We are currently working on a regional search, something that will act as an alternative way to limit searching. We hope to introduce this next year.

And finally…A very engaging Linked Data interface

This interface demonstration by Tim Sherratt shows how something driven by Linked Data can really be very effective. It also uses some of the Archives Hub vocabulary from our own Linked Data work, which is a nice indication of how people have taken notice of what we have been doing. There is a great blog post about it by Pete Johnston, Storytelling, archives and Linked Data. I agree with Pete that this sort of work is so exciting, and really shows the potential of the Linked Data Web for enabling individual and collective storytelling…something we, as archivists, really must be a part of.

The Quest for Single Search

September 26, 2011 / Jane Stevenson

This post is based on a report published by OCLC Research, Single Search: The Quest for the Holy Grail (Leah Prescott and Ricky Erway, 2011).

It is less than ideal when users can benefit from a single search option for resources across the internet, but within an institution they are presented with a range of search systems for different services and resources. A single search obviously allows researchers to search across the organisation’s resources; it may also give a sense of the rich resources of an organisation and may provide a motivation to build upon them.

The OCLC report is based upon discussions with nine organisations that have implemented single search. There are certainly substantial challenges, not least the resources required and the need for effective collaboration across an institution. But it is clear that single search, if it is provided effectively, will help researchers and will help to harmonize collections management.

Single search needs to simplify rather than complicate the user experience, and sometimes the challenges this poses are not addressed and a single search ends up being a frustrating or confusing experience. We know that some users find navigating archival hierarchical descriptions confusing; adding library and museum items to this increases the challenge. Different collections may be catalogued very differently and to different levels of granularity, so presenting a coherent list of results is not easy. Added to this, many institutions now have digital collections, but only a part of their resources are digitised and so there is a need to indicate clearly what is digital (what can be accessed digitally) and what requires a visit to the institution.

The OCLC report refers to single search having the ability to ‘fundamentally change how an institution identifies itself’. Maybe if the single search represents a large part of the resources of an institution this is true; it is not likely to be the case in a university, where the collections are only a small part of the university’s business. Single search may enable curators, archivists and librarians themselves to get a more coherent view of the collections. This could be a useful advantage, as we know that often curators in charge of one collection or subject area do not necessarily have a good understanding of the whole. It may encourage a more efficient and streamlined approach to collections management.

Amongst the nine institutions that formed part of the OCLC discussions, some did have a mandate to create single search, but even with this kind of directive, there is a need for senior managers to provide the resources required and ensure that it is made a priority. In addition, the issue of individual motivation is significant. I think this is a fascinating area that is sometimes overlooked: the extent to which the staff involved are motivated to work together and to achieve a vision must have a substantial impact on the outcome. What sort of role do ‘champions’ play? How important are they? Does it come down to individuals with intellectual curiosity and the willingness to learn new skills and change working habits? Is it important for the institution to foster this kind of attitude in order to ensure that innovations like single search are likely to work? One of the institutions in the OCLC report referred to the staff that had been selected to work on a single search as being selected for their ‘interest, skills and capacity to work on the program’. I have certainly come across colleagues who are frustrated by a lack of co-operation from other staff, which can significantly hamper any kind of innovative changes to metadata creation and cross-searching.

I think that attitudes are key to success in a project like this, where working practices may have to change and habits may need to be broken. It reminds me of that great YouTube video of the lone dancer who is joined by just one person – one is a crazy lone dancer, and others tend to try to ignore him/her; but once just one person joins in you have a group, and once you have two, then you’re more likely to get three, then four, and then the group builds up to the extent where those who are reluctant to join in anything a bit new or different, where they might embarrass themselves, end up joining in because not joining in becomes the exception rather than the rule. It’s a slightly different scenario but the point is similar.

The size of the institution is likely to have an impact. A small institution is often more agile, and getting buy-in may be easier, although there may be less resource to draw on. Maybe for a large organisation, trying to implement something that cuts through the departments and teams in a very horizontal way, like single search, is harder if the organisational structures remain the same. The priorities of the different departments involved may end up pulling against the project. It becomes all the more important to define the goals, get buy-in at the right levels, have clear and effective communication channels, and also find an effective way to keep the momentum and motivation going.

The OCLC report makes one observation which resonates very much with me: ‘It is important for the success of the project to have representation…from IT units, as weak motivation within the IT area of an organization has the power to paralyze such a project.’ The important thing here seems to be to ensure that the right people are included at the right stages in the project. IT should be brought in right at the outset and a real effort should be made to develop not only a common understanding but also a feeling of good will and strong motivation.

As the OCLC report states: ‘The reality of achieving an integrated access vision could mean overturning years or decades of institutional thinking, which has segmented collections management practice among the three different sectors of LAMs.’ Professionals within libraries, archives and museums have their own perspectives and values, and are often very caught up in their own long-standing practices. There may be good reason for this – often curators and archivists have had to fight over time to ensure their collections are properly looked after and catalogued. But a single search may call for a more compromised approach, and certainly it is likely to call for different thinking and finding new ways to represent the collections.

The ‘Technological Considerations’ section of the report is well worth reading, giving a short summary of some of the options. This is an area where the Archives Hub is very well aware of the pros and cons of different approaches. For an institution wanting to implement single search, there are a number of approaches: systems where you adopt batch export; systems where an API is used to pull the data in dynamically; a single system that replaces all the separate systems or multiple systems harvesting to a central repository; a federated search where each separate system is queried and results are brought back and presented to the user; a central index that is searched rather than the individual systems. All of these have pros and cons around things like flexibility, speed, currency and professional practices.
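As a rough illustration of the trade-off around currency, here is a small sketch contrasting two of these approaches: a federated search that queries each system when the user searches, and a central index that is harvested in advance. Everything in it is invented for the example (the three tiny "systems", the record titles and the function names); it is not taken from the OCLC report or from how the Hub works.

```python
from collections import defaultdict

# Hypothetical stand-ins for three separate collection systems.
SYSTEMS = {
    "archive": ["Papers of a Dante scholar", "Estate valuation records"],
    "library": ["Early printed editions of Dante"],
    "museum": ["Portrait medal of Dante"],
}


def federated_search(term, systems):
    """Query every system at search time and merge the hits."""
    term = term.lower()
    hits = []
    for name, records in systems.items():
        for title in records:
            if term in title.lower():
                hits.append((name, title))
    return hits


def build_central_index(systems):
    """Harvest every record once into a single word-to-records index."""
    index = defaultdict(list)
    for name, records in systems.items():
        for title in records:
            for word in set(title.lower().split()):
                index[word].append((name, title))
    return index


def central_search(term, index):
    """Look the term up in the pre-built index; no source system is contacted."""
    return list(index.get(term.lower(), []))


index = build_central_index(SYSTEMS)
federated = federated_search("dante", SYSTEMS)
central = central_search("dante", index)

# A record added after the harvest is seen by the federated search straight
# away, but not by the central index until the data is harvested again.
SYSTEMS["museum"].append("Dante death mask")
federated_after = federated_search("dante", SYSTEMS)
central_after = central_search("dante", index)
```

In this toy version the central index answers from a single lookup but goes stale until it is rebuilt, while the federated search is always current but has to visit every system for every query. Real systems also have to reconcile different metadata, which this sketch ignores.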

Of course, a further very important consideration will be digital assets, and the need to take a systematic approach here. Institutions may have Digital Asset Management Systems, but do these operate effectively with other collections systems? Do digital assets exist in the different collections management systems? Are there shared metadata standards for digital assets?

Metadata Considerations present a whole new raft of challenges. I think that all too often those outside of the domains – maybe the managers who want to see single search and a more integrated approach – do not appreciate the substantial differences in approach between libraries, archives and museums. It is thought that because they all have something to do with that nebulous concept of ‘cultural heritage’ that they should all play together relatively easily. But each domain has built up its own world-view over many decades; the development of standards and best practice involves a great deal of hard work. It could be argued that finding ways to present catalogues or finding aids to users in a way that is as simple and straightforward as possible is not compatible with single search. It may be that single search, while seeking to provide an integrated approach, actually creates a more complex interface as a result of trying to integrate collection-based and hierarchical archival descriptions, item-based museum artifact descriptions and largely open access and usually non-unique library collections.

One of the biggest problems is that metadata is expensive to create. Automated metadata provides one solution but it is a very partial solution, especially for unique archival and museum collections. Another challenge is that usually metadata has been created over long periods of time using a variety of systems, sometimes migrating from one system to another (often with patchy results). Metadata is messy, and yet standards lie at the heart of effective integration. But even standards are usually at different stages of evolution, and standards adopted by each of the domains do not necessarily harmonise very well.

One of the issues we have noticed on the Hub is the tendency for collections that are catalogued in great detail to overwhelm more summary descriptions. It can give the impression that those catalogued in more detail are more important. If you search on the Archives Hub relatively frequently, you are likely to come across ‘University of Liverpool Staff Papers’ because they have been very thoroughly catalogued. There may be really good stuff in there, but should this one collection seem to be so much more important than so many others? Yet detailed cataloguing is surely a good thing?

There are also issues around vocabularies, and the tendency to implement multiple vocabularies within the same community. The Hub allows for any recognised vocabulary to be used for index terms, but that does mean that personal names, for example, may be entered using NCA Rules or AACR, so you will inevitably end up with several different entries for the same name. The OCLC report refers to the need to harmonise metadata, trying to standardise terms, but I think that for us the way forward is generally to try to use the ever increasing sophistication of data processing tools to get round this problem, because we will never get 200 institutions to put things into the system in exactly the same way. Having said that, we are finding that as most of our contributors use our EAD Editor now, the descriptions are much more consistent and easier to integrate.
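To give a flavour of what "data processing tools" might mean here, the sketch below groups variant index entries for the same person under one crude matching key. The entry forms are only illustrative of the kinds of variation that different rules produce, and the function names are invented for the example; this is a deliberately naive approach, not the Hub’s method, and real name matching needs far more care.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(entry):
    """Reduce a name index entry to a crude matching key (surname and forenames)."""
    # Strip accents so that accented and unaccented spellings compare equal.
    text = unicodedata.normalize("NFKD", entry)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep only what precedes the first digit or bracket; in these illustrative
    # forms, the dates and epithets come after the name itself.
    text = re.split(r"[\d(\[]", text, maxsplit=1)[0]
    # Lowercase and drop punctuation, keeping the words in their original order.
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(words)


def group_entries(entries):
    """Group index entries that share the same matching key."""
    groups = defaultdict(list)
    for entry in entries:
        groups[name_key(entry)].append(entry)
    return dict(groups)


# Illustrative variants of the kind that different cataloguing rules produce.
entries = [
    "Shaw, George Bernard, 1856-1950",
    "Shaw, George Bernard (1856-1950) playwright",
    "SHAW, George Bernard",
    "Hamilton, Mary",
]

groups = group_entries(entries)
for key, variants in groups.items():
    print(key, "->", variants)
```

A key this crude would wrongly merge two different people who share a name, which is why dates and other context matter in real disambiguation work.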

The OCLC report ends with some advice on the user interface, and one comment that I wholeheartedly agree with is the advice to hire or consult professional designers if you possibly can. Web presence is so important and Websites are often quite poorly designed. An ideal is to carry out user testing, but having done this ourselves, we know just how much time and effort it can take, and for many archives this is really quite a barrier. Even just testing with a small handful of users is very worthwhile. It’s amazing how much you find you have taken for granted that researchers will question. It’s good to see the importance of rights management emphasised and the need to clearly define access to content. This is becoming increasingly relevant, as data is republished, shared and recombined.

Single search is an important goal, not least because ‘the challenges inherent in this information divide ultimately expect researchers to compartmentalize their interests in a similar manner, rather than encouraging more multi-disciplinary approaches that focus on the research inquiry (rather than the nature and custody of the resources).’ Our approach has tended to suit our own professional outlooks; it should be geared towards what researchers want and need.

All quotes taken from Prescott, Leah and Ricky Erway. 2011. Single Search: The Quest for the Holy Grail. Dublin, Ohio: OCLC Research.

Image from www.digital-delight.ch

HubbuB: September 2011

September 9, 2011 / Jane Stevenson

APEnet & Europeana

You may be aware of the Archives Portal Europe – http://www.archivesportaleurope.eu. We’ve been considering whether the Hub should be part of this and I would welcome any thoughts that you have about it, as it would be your archives that would be represented. I don’t think the Website offers the best navigation or user interface at the moment, and the coverage is very very patchy. But should we be supporting the principle of a European-wide archives portal, and looking to be part of it? I know they are planning on a great deal more development work, and they are interested in the Hub joining in 2012. We are generally keen here at the Hub to do all we can to promote your collections, and enable connections to be made with other materials, and whilst very ambitious, projects like APENet take this idea to a whole new level.

Similarly, we are looking at what Europeana are doing, and I will be attending the Europeana Tech conference in October (http://www.europeanaconnect.eu/europeanatech/) – a blog post will follow with some reflections on the conference and on the significance of Europeana. At present, our main aim is to stay abreast of what is happening and look at the sort of commitment being a part of it would involve.

New contributors

The more contributors the Hub has, the more valuable it becomes as a cross-searching tool for researchers, helping them to discover the great archives that are out there. Our latest contributors are Cambridge University: Sedgwick Museum of Earth Sciences, St Paul’s Cathedral, Oxford Brookes Special Collections, Victoria & Albert Museum Theatre & Performance, Islington Local History Centre, Glasgow Women’s Library, Royal Scottish Academy of Music and Drama. We are very close now to our 200th contributor!

SNAC project for name authorities

The Social Networks in Archival Context project has been very successfully taking EAD descriptions and creating EAC-CPF authority files, working to disambiguate and pattern-match in order to create a set of name authorities that we can all use and benefit from. I recommend taking a look at their website: http://socialarchive.iath.virginia.edu/ and in particular the demonstrator: http://socialarchive.iath.virginia.edu/xtf/search. Search or click on a record and try the new RGraph demonstrator to see a prototype visualisation – it shows the sorts of new ways of looking at data that we have the opportunity to create.

The project have agreed in principle to take Hub descriptions and create authority records. I’d love to hear your thoughts on this. As yet, of course, the Hub does not display authority records, but this is something we need to work on. We will also be looking at how this fits into our new Linking Lives project, part of our Locah work (http://archiveshub.ac.uk/blog/?p=2699). I’ll try to knock up a blog post that outlines what the SNAC project is doing and how we might fit into it.

Hub Feature

This month we’re pleased to say that we have a feature about the Mary Hamilton Papers, held at John Rylands Library, The University of Manchester: “Courtier, diarist and bluestocking, her papers offer a veritable cornucopia of information on royal, aristocratic, artistic and literary circles during the late 18th and early 19th centuries.” http://archiveshub.ac.uk/features/maryhamilton/index.html

HubbuB is a monthly newsletter aimed primarily at Archives Hub contributors and archives professionals.

Locah Linking Lives: an introduction

August 31, 2011 / Jane Stevenson

We are very pleased to announce that the Archives Hub will be working on a new Linked Data project over the next 11 months, following on from our first phase of Locah, and called Linking Lives. We’d like to introduce this with a brief overview of the benefits of the Linked Data approach, and an outline of what the project is looking to achieve. For more in-depth discussion of Linked Data, please see our Locah project blog, or take a look at the Linked Data website guides and tutorials.

Image: Linked Open Data Cloud

The benefits of Linked Data

The W3C currently has a draft of a report, ‘Library Linked Data’, which covers archives and museums. In this they state that:

‘Linked data is shareable, extensible, and easily re-usable… These characteristics are inherent in the linked data standards and are supported by the use of web-friendly identifiers for data and concepts.’

Shareable

One of the exciting things about Linked Data is that it is about sharing data (certainly where you have Linked Open Data). I have found that this emphasis on sharing and data integration has actually had a positive effect aside from the practical reality of sharing; it engenders a mindset of collaboration and sharing, something that is of great value not just in the pursuit of the Linked Data vision, but also, more broadly, for any kind of collaborative effort and for encouraging a supportive environment. Our previous Linked Data project, Locah, has been great for forging contacts and for putting archival data within this exciting space where people are talking about the future of the Web and the sorts of things that we might be able to do if we work together.

For the Archives Hub, our aim is to share the descriptions of archive collections as a means to raise the profile of archives, and show just how relevant archives are across numerous disciplines and for numerous purposes. In many ways, sharing the data gives us an opportunity to get away from the idea that archives are only of interest to a narrow group of people (i.e. family historians and academics purely within the History Faculty).

Extensible

The principle of allowing for future growth and development seems to me to be vital. The idea is to ensure that we can take a flexible approach, whereby enhancements can be made over time. This is vital for an exploratory area like Linked Data, where an iterative approach to development is the best way to go, and where we are looking at presenting data in new ways, looking to respond to user needs, and working with what technology offers.

Reusable

‘Reuse’ has become a real buzz word, and is seen as synonymous with efficiency and flexibility. In this context it is about using data in different contexts, for different purposes. In a Linked Data environment what this can mean is providing the means for people to combine data from different sources to create something new, something that answers a certain need. To many archivists this will be fine, but some may question the implications in terms of whether the provenance of the data is lost and what this might mean. What about if information from an archive description is combined with information from Wikipedia? Does this have any implications for the idea of archive repositories being trusted and does it mean that pieces of the information will be out of their original context and therefore in some way open to misuse or misinterpretation?

Reuse may throw up issues, but it provides a great deal more benefits than risks. Whatever the caveats, it is an inevitable consequence of open data combined with technology, so archives either join in or exclude themselves from this type of free-flow of data. Nevertheless, it is certainly worth thinking about the issues involved in providing data within different contexts, and our project will consider issues around provenance.

Linking Lives

The basic idea of the Linking Lives project is to develop a new Web interface that presents useful resources relating to individual people, and potentially organisations as well.

It is about more than just looking at a name-based approach for archives. It is also about utilising external datasets in order to bring archives together with other data sources. Researchers are often interested in individual people or organisations, and will want to know what sort of resources are out there. They may not just be interested in archives. Indeed, they may not really have thought about using archives, but they may be very interested in biographical information, known connections, events during a person’s lifetime, etc. The idea is to show that various sources exist, including archives, and thus to work towards broadening the user-base of archives.

Our interface will bring different data sources together and we will link to all the archival collections relating to an individual. We have many ideas about what we can do, but with limited time and resources, we will have to prioritise, test out various options and see what works and what doesn’t and what each option requires to implement. We’ll be updating you via the blog, and we are very interested in any thoughts that you have about the work, so please do leave comments, or contact us directly.

In some ways, our approach may be an alternative to using EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons and Families, an XML standard for marking up names associated with archive collections). But maybe in essence it will complement EAC-CPF, because eventually we could use EAC authority records and create Linked Data from them. We have followed the SNAC project with interest, and we recently met up with some of the SNAC project members at the Society of American Archivists’ Conference. We hope to take advantage of some of the exciting work that they are doing to match and normalise name records.

The W3C draft report on Library Linked Data states that ‘through rich linkages with complementary data from trusted sources, libraries [and archives] can increase the value of their own data beyond the sum of its sources taken individually’. This is one of the main principles that we would like to explore. By providing an interface that is designed for researchers, we will be able to test out the benefits of the Linked Data approach in a much more realistic way.

Maybe we are at a bit of a crossroads with Linked Data. A large number of data sets have been put out as RDF/XML, and some great work has been done by people like the BBC (e.g. the Wildlife Finder), the University of Southampton and the various JISC-funded projects. We have data.gov.uk, making Government data sets much more open. But there is still a need to make a convincing argument that Linked Data really will provide concrete benefits to the end user. Talking about SPARQL endpoints, JSON, Turtle, triples that connect entities and the benefits of persistent URIs won’t convince people who are not really interested in process and principles, but just want to see the benefits for themselves.

Has there been too much emphasis on the idea that if we output Linked Data then other people can (will) build tools? The much quoted adage is ‘The best thing that will be done with your data will be done by someone else’, but is there a risk in relying on this idea? In order to get buy-in to Linked Data, we need visible, concrete examples of benefit. Yes, people can build tools, they can combine data for their own purposes, and that’s great if and when it happens; but for a community like the archives community the problem may be that this won’t happen very rapidly, or on the sort of scale that we need to prove the worth of the investment in Linked Data. Maybe we are going to have to come up with some exemplars that show the sorts of benefits that end users can get. We hope that Linking Lives will be one step along this road; not so much an exemplar as a practical addition to the Archives Hub service that researchers will immediately be able to benefit from.

Of course, there has already been very good work within the library, archive and museum communities, and readers of this blog may be interested in the Use Cases that are provided at http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

Just to finish with a quote from the W3C draft report (I’ve taken the liberty of adding ‘archives’ as I think they can readily be substituted):

“Linked Data reaches a diverse community far broader than the library/archive community; moving to library/archival Linked Data requires libraries/archives to understand and interact with the entire information community. Much of this information community has been engendered by the capabilities provided by new technologies. The library/archive community has not fully engaged with these new information communities, yet the success of Linked Data will require libraries/archives to interact with them as fully as they interact with other libraries/archives today. This will be a huge cultural change that must be addressed.”

Flickr: Icono SVDs photostream, http://www.flickr.com/photos/28860201@N05/with/3674610629/

Archives Wales

August 18, 2011 / Jane Stevenson

I recently attended the ‘Online Development in Wales’ day organised by ARCW (Archives and Records Council Wales) to talk about the Porth Archifau (Archives Hub). I found out a good deal about what is happening in Wales at the moment and heard about plans and wishes for future developments.

In her introduction, Charlotte Hodgson from ARCW talked about the need for online catalogues with images rather than the other way around. Maybe there is too much emphasis on digitisation of images which become separated from their context. She referred to the good work of Archives Network Wales (ANW), but acknowledged that Wales is in danger of falling behind with online catalogues. There is a need to maximise opportunities, minimise duplication and effectively deploy resources.

Kim Collis from ARCW gave some background on ANW (now Archives Wales), which is a searchable database for collection-level descriptions that uses a MySQL database and a Typo3 front-end. It has stayed relatively static since it was first developed; the emphasis of individual offices may have moved to their own web presence (many were using CALM and there was something of a race to get their catalogues online). The front-end of the ANW site has not always been very user-friendly and has not provided the depth of information that it might. However, it was developed in a standards-based way, and this stands it in good stead for future development. ‘Archives Wales’ was a bolt-on to the database, giving more information and including additional information about repositories, making a more complete and visually appealing site.

There has been some geo-tagging within ANW recently. This was seen as a good way to link in with People’s Collection Wales, enabling users to find out more information about, for example, a family that has owned an estate. Kim talked about a number of possible developments, such as a project to provide links to searchable tithe apportionments transcripts. The idea is to allow volunteers to transcribe the images.

Kim talked about the need to improve branding and identity. The site must be kept up to date to give it credibility. But there is, in a sense, competition with repository websites because many repositories want to prioritise these. I think it is worth impressing upon archivists the importance of cross-searching capability that aggregators provide, as well as the value of searching within a repository. We should not presuppose that researchers primarily want to know what is at just one individual office; they usually want to find ‘stuff’ on their topic of interest and then go down to the more detailed level of individual sources of information.

Sam Velumyl from The National Archives talked about the Discovery initiative at TNA, which provides a new information architecture that will accommodate the different systems that TNA has. The idea is that it can accommodate the integration of other systems easily, making it a more sustainable and flexible solution. They are going to be carrying out an exercise in gathering feedback on Discovery, and you’re likely to hear about that very soon. Sam said that the feedback will help TNA to decide upon their priorities. It may be that A2A will become active again, but at present this has not been decided. There were concerns in the room that it is very difficult to get TNA to provide data back out of A2A.

People’s Collection Wales, which was presented to us by three speakers, is very much geared towards user-friendly and fun engagement in the history and culture of Wales. It works on the basis of everything being an item, and it gathers items together in collections by topic, not in the way that archivists would normally understand collections, but simply by areas that will be of interest to users. It is quite an eclectic experience, designed to draw in a broad section of the community and promote learning and understanding of Welsh history. Re-purposing is a strong principle behind PCW. It integrates social media to encourage the idea of sharing the photograph or interview or whatever on Facebook or Twitter. It also has a scrapbook function so that people can gather together their own collections. It does link to the item within context, so you can link back to the website of the depositor.

PCW are going to be using an API to upload collection records from Archives Wales. I got a little confused about this, as they also spoke about manual upload. I think the automated upload will only be for certain records. They are also doing some interesting work with GIS, to enable users to do things like look at maps over time to see how a place has developed, and looking at making museum objects viewable in a 3-D way.

My plea to PCW is to make their titles clickable links where it seems as if they should be clickable. I found the site fun, with some great stuff, but it can take a while to understand what you are looking at. I went to browse the collections and many of them are untitled, and it’s not really clear what they are representing. I tried the map interface and looked for ‘castle’ near ‘barmouth’ and I was taken to a page of images of people talking about the Eisteddfod. The second time it worked better, but some of the images were not actually images and one of them remained in place when I did another search and I couldn’t delete it from the display, and I had a few more experiences of searches hanging and the display freezing. But then other searches worked well and I started getting links from places to objects. So, it was a mixed bag for me, and it seemed quite beta in terms of functionality, and also it was very slow, and I do think that’s a problem. It feels very experimental, with loads of good ideas, but I wonder if it would be better to concentrate on developing fewer ideas but making them more effective.

The afternoon was more focussed on solutions for getting archives online. CyMAL recently commissioned research to analyse requirements for extending online access to archive catalogues in Wales, building on ARCW, and Sarah Horton gave us a summary of some of the findings. Some of the stats were quite interesting: 11 local authority services use CALM, 1 uses the Archivists’ Toolkit and 1 uses Word. In higher education: 3 CALM, 1 Word, 1 no formal catalogue. The National Library of Wales uses the virtual library system and AC-NMW uses AdLib. The survey found that the application of authority files and data standards was variable.

For online access: 3 services use CALMView, but there are barriers to this for many offices, one being IT departments and their concerns about security. 4 services provide access via their own systems, 2 via PDF documents. About 8,000 collections are listed on Archives Wales and 2,000 on the Hub.

9 services have backlogs of between 10% and 30%, 6 of over 30%, and more if poor quality catalogues are taken into account. Many catalogues remain in manual form only.

We had a very interesting talk on the Black Country History website. Linda Ellis talked about how important it was for the project to be sustainable right from the outset. The project was about working together to reduce costs and create a sustainable online resource. The original website used the Axiell DSCovery software, but it was not fit for purpose. The redevelopment was by Orangeleaf Systems using their CollectionsBase system and WordPress, which means it is very easy to create different front-ends. There are a number of microsites, such as one for geology, filtered by keyword, a great idea for a way to target different audiences with minimal additional effort. Partners can upload data when they like via an XML export from CALM. CollectionsBase will also take Excel, Access and manual data entry. There is an API, so the data goes on to Culture Grid and Europeana.

Altogether a very stimulating day, with a good vibe and plenty of discussion.

HubbuB: August 2011

August 4, 2011 / Jane Stevenson

We are out and about in August. Jane and Joy will be going to the Society of American Archivists’ Conference this year, speaking as part of a panel session. We will be talking about Discovery, the Archives Hub and Linked Data. We’re also very excited to be visiting the OCLC offices in Dublin, Ohio. Lisa and Bethan will be at the Archives and Records Association conference in Edinburgh, so go and say hello if you are there. Lisa is also speaking at the conference.

Our Monthly Feature is all levitating women and moustachioed men, as we take a trip into Magic and Illusion at the Fairground Archive: http://archiveshub.ac.uk/features/magic/. Some great images, and a lovely photograph of Cyril Critchlow, a wizard in his 80s, performing as ‘Wizardo, Harry Potter’s grandfather’!

We’ve recently created a page of Top Tips for Cataloguing: http://archiveshub.ac.uk/cataloguingtips/. These are some of the key areas that we believe are important for good online catalogues. We do still find that archivists don’t always think about the global online environment, so it’s worth setting out some of the most important points to bear in mind. It’s partly about thinking of the audience, browsing the Web, using Google, scanning pages for relevant content, and it’s partly about descriptions – ensuring that the title is as clear and self-explanatory as possible, thinking about how best to describe the archive in a way that is user-friendly.

We’ve been talking about ways to help get descriptions onto the Hub when they are created in Microsoft Word or Excel. We’re just exploring possibilities at the moment, but we are interested in anyone who uses, or knows anyone who uses, Microsoft Word to catalogue. Maybe smaller offices, or maybe you ask volunteers to do some of this?

We know people do use Microsoft Excel as well. We are thinking about ‘Tips for using Excel’. Would this be useful? We don’t necessarily want to give the impression that Excel is the most appropriate choice for cataloguing – it’s spreadsheet software, not really designed for complex hierarchical archives. But we do realise that for some people, the choice of what to use is limited, and we want to do our best to accommodate the realities that people are faced with.

We’ve had some interest in the idea of researchers being able to request digital copies of archives through the Hub. That is, a researcher comes across an archive they would like to see, and they would like digital copies, so they indicate this in some way. This is not yet fully thought out, but again, we’d need to know if there is a need for it. How many offices are starting to digitise on demand?

Finally, we’re covering music, dance, plants, medicine and the Middle East with our latest contributors. Check out who is recently on board on our contributors’ page:
http://archiveshub.ac.uk/contributors/

A Web of Possibilities

July 28, 2011 / Jane Stevenson

“Will you browse around my website”, said the spider to the fly,
“’Tis the most attractive website that you ever did spy”

All of us want to provide attractive websites for our users. Of course, we’d like to think it’s not really the spider/fly kind of relationship! But we want to entice and draw people in and often we will see our own website as our key web presence; a place for people to come to find out about who we are, what we have and what we do and to look at our wares, so to speak.

The recently released ‘Discovery’ vision is to provide UK researchers with “easy, flexible and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable.” Does this have any implications for the institutional or small-scale website, usually designed to provide access to the archives (or descriptions of archives) held at one particular location?

Over the years that I’ve been working in archives, announcements about new websites for searching the archives of a specific institution, or the outputs of a specific project have been commonplace. A website is one of the obvious outputs from time-bound projects, where the aim is often to catalogue, digitise or exhibit certain groups of archives held in particular repositories. These websites are often great sources of in-depth information about archives. Institutional websites are particularly useful when a researcher really wants to gain a detailed understanding of what a particular repository holds.

However, such sites can present a view that is based more around the provider of the information rather than the receiver. It could be argued that a researcher is less likely to want to use the archives because they are held at a particular location, apart from for reasons of convenience, and more likely to want archives around their subject area, and it is likely that the archives which are relevant to them will be held in a whole range of archives, museums and libraries (and elsewhere). By only looking at the archives held at a particular location, even if that location is a specialist repository that represents the researcher’s key subject area, the researcher may not think about what they might be missing.

Project-based websites may group together archives in ways that benefit researchers more obviously, because they are often aggregating around a specific subject area. For example, making available the descriptions and links to digital archives around a research topic. Value may be added through rich metadata, community engagement and functionality aimed at a particular audience. Sometimes the downside here is the sustainability angle: projects necessarily have a limited life-span, and archives do not. They are ever-changing and growing and descriptions need to be updated all the time.

So, what is the answer? Is this too much of a silo-type approach, creating a large number of websites, each dedicated to a small selection of archives?

Broader aggregation seems like one obvious answer. It allows for descriptions of archives (or other resources) to be brought together so that researchers have the benefit of searching across collections, bringing together archives by subject, place, person or event, regardless of where they are held (although there is going to be some kind of limit here, even if it is at the national level).

You might say that the Archives Hub is likely to be in favour of aggregation! But it’s definitely not all pros and no cons. Aggregations may offer a powerful search functionality for intellectually bringing together archives based on a researcher’s interests, but in some ways there is a greater risk around what is omitted. When searching a website that represents one repository, a researcher is more likely to understand that other archives may exist that are relevant to them. Aggregations tend to promote themselves as comprehensive – if not explicitly then implicitly – which creates an expectation that cannot ever fully be met. They can also raise issues around measuring impact and around licensing. There is also the risk of a proliferation of aggregation services, further confusing the resource discovery landscape.

Is the ideal of broad inter-disciplinary cross-searching going to be impeded if we compete to create different aggregations? Yes, maybe it will be to some extent, but I think that it is an inevitability, and it is valid for different gateways to service different audiences’ needs. It is important to acknowledge that researchers in different disciplines and at different levels have their own needs, their own specific requirements, and we cannot fulfill all of these needs by only presenting data in one way.

One thing I think is critical here is for all archive repositories to think about the benefits of employing recognised and widely-used standards, so that they can effectively interoperate and so that the data remains relevant and sustainable over time. This is the key to ensuring that data is agile, and can meet different needs by being used in different systems and contexts.

I do wonder if maybe there is a point at which aggregations become unwieldy, politically complicated and technically challenging. That point seems to be when they start to search across countries. I am still unsure about whether Europeana can overcome this kind of problem, although I can see why many people are so keen on making it work. But at present, it is extremely patchy, and, for example, getting no results for texts held in Britain relating to Shakespeare is not really a good result. But then, maybe the point is that Europeana is there for those that want to use it, and it is doing ground-breaking work in its focus on European culture; the Archives Hub exists for those interested in UK Archives and a more cross-disciplinary approach; Genesis exists for those interested in women’s studies; for those interested in the Co-operative movement, there is the National Co-operative Archive site; for those researching film, the British Film Institute website and archive is of enormous value.

So, is the important principle here that diversity is good because people are diverse and have diverse needs? Probably so. But at the same time, we need to remember that to get this landscape, we need to encourage data sharing and avoid duplication of effort. Once you have created descriptions of your archive collections you should be able to put them onto your own website, contribute them to a project website, and provide them to an aggregator.

Ideally, we would be looking at one single store of descriptions, because as soon as you contribute to different systems, if they also store the data, you have version control issues. The ability to remotely search different data sources would seem to be the right solution here. However, there are substantial challenges. The Archives Hub has been designed to work in a distributed way, so that institutions can host their own data. The distributed searching does present challenges, but it certainly works pretty well. The problem is that running a server, operating system and software can actually be a challenge for institutions that do not have the requisite IT skills dedicated to the archives department. Institutions that hold their own data have it in a great variety of formats. So, what we really need is the ability for the Archives Hub to seamlessly search CALM, AdLib, MODES, ICA AtoM, Access, Excel, Word, etc. and bring back meaningful results. Hmmm….

The business case for opening up data seems clear. Projects like Open Bibliographic Data have helped progress the thinking in this arena and raised issues and solutions around barriers such as licensing. But it seems clear that we need to understand more about the benefits of aggregation, and the different approaches to aggregation, and we need to get more buy-in for this kind of approach. Does aggregation allow users to do things that they could not do otherwise? Does it save them time? Does it promote innovation? Does it skew the landscape? Does it create problems for institutions because of the problems with branding and measuring impact? Furthermore, how can we actually measure these kinds of potential benefits and issues?

Websites that offer access to archives (or descriptions of archives) based on where they are located and based on the body that administers them have an important role to play. But it seems to me that it is vital that these archives are also represented on a more national, and even international stage. We need to bring our collections to where the users are. We need to ensure that Google and other search engines find our descriptions. We need to put archives at the heart of research, alongside other resources.

I remember once talking about the Archives Hub to an archivist who ran a specialist repository. She said that she didn’t think it was worth contributing to the Hub because they already had their own catalogue. That is, researchers could find what they wanted via the institute’s own catalogue on their own system, available in their reading room. She didn’t seem to be aware that this could only happen if they knew that the archive was there, and that this view rested on the idea that researchers would be happy to repeat that kind of search on a number of other systems. Archives are often about a whole wealth of different subjects – we all know how often there are unexpected and exciting finds. A specialist repository for any one discipline will have archives that reach way beyond that discipline into all sorts of fascinating areas.

It seems undeniable that data is going to become more open and that we should promote flexible access through a number of discovery routes, but this throws up challenges around version control, measuring impact, brand and identity. We always have to be cognisant of funding, and widely disseminated data does not always help us with a funding case because we lose control of the statistics around use and any kind of correlation between visits to our website and bums on seats. Maybe one of the challenges is therefore around persuading top-level managers and funders to look at this whole area with a new perspective?

HubbuB: July 2011

July 21, 2011 / Jane Stevenson

Diary of the Archives Hub, July 2011

Contributor Forum

We had a forum this month that included both Contributors’ Forum members and Steering Committee members. It was a really useful and productive morning. The write-up from this can be found on our blog: http://archiveshub.ac.uk/blog/?p=2677. For me and Joy, this kind of feedback is invaluable in helping us to plan for the future, and we are very appreciative of those who came along and participated.

Linking Lives: a Linked Data project

You will be pleased to hear that we secured funding for an enhancements project, called ‘Linking Lives’. This project aims to work with our Linked Data output from Locah to create a names-based user interface, with links to other data sources. All will become clear as I start to set this out and blog about it. We showed a mock-up of the sort of interface that we want to create to the Forum, and it was well received. We’re very excited about this project, because it really does enable us to start to think about presenting archival descriptions in a new way, and integrating them much more closely with other data sources.

Feature for July

We are pleased to say that the Victoria and Albert Museum Theatre and Performance Collections are now contributing to the Hub and this month we feature their wonderful collections along with some great images: http://archiveshub.ac.uk/features/theatreperformancecollections/

Content negotiation

You now have the ability to retrieve records as XML or text files simply by adding the requisite extension to the persistent URI, e.g.

http://archiveshub.ac.uk/data/gb029ms207.xml
http://archiveshub.ac.uk/data/gb029ms207.txt

This may not be immediately useful to your average user, but it is working towards the idea of flexible access for different uses, thinking beyond the traditional web-based interface. It certainly helps me, as I often want to check the encoding behind the descriptions!
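For anyone who wants to script this, the extension pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `record_url` and the list of allowed extensions are assumptions made for the example (the post mentions only `.xml` and `.txt`), and the sketch builds the URLs without fetching them.

```python
from urllib.parse import urlsplit, urlunsplit

# Extensions mentioned in the post; treating them as a fixed list is
# an assumption of this sketch, not a statement about the Hub itself.
KNOWN_EXTENSIONS = ("xml", "txt")


def record_url(persistent_uri: str, fmt: str) -> str:
    """Return the persistent URI with a format extension appended."""
    if fmt not in KNOWN_EXTENSIONS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(persistent_uri)
    # Drop any trailing slash so the extension attaches to the record id.
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}.{fmt}", parts.query, parts.fragment)
    )


if __name__ == "__main__":
    base = "http://archiveshub.ac.uk/data/gb029ms207"
    for fmt in KNOWN_EXTENSIONS:
        print(record_url(base, fmt))
```

The resulting URL could then be passed to any HTTP client (for example `urllib.request.urlopen`) to retrieve the XML or text version of a description.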

Browser Plugin

We now have a simple plugin to search the Archives Hub. It enables the Hub to be searched via the search box in the top right of the browser, providing another means of access to the Hub. If you go to the Hub homepage, you can see the drop-down list of search plug-ins available and you will have the opportunity to add ‘Archives Hub’. This is indicated by blue highlighting on the drop-down arrow.
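For the technically curious: this post does not describe how the plugin is built, but browser search-box plugins of this kind are commonly defined by a small OpenSearch description document. The Python sketch below generates such a document under that assumption. The search URL is a placeholder on example.org, not the Hub's real search address, and the function name `build_description` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OpenSearch 1.1 specification.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def build_description(short_name: str, description: str, template: str) -> str:
    """Build a minimal OpenSearch description document as a string."""
    # Serialise with the OpenSearch namespace as the default namespace.
    ET.register_namespace("", OPENSEARCH_NS)
    root = ET.Element(f"{{{OPENSEARCH_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}Description").text = description
    # The browser substitutes the user's query for {searchTerms}.
    ET.SubElement(
        root,
        f"{{{OPENSEARCH_NS}}}Url",
        {"type": "text/html", "template": template},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL: the real search address is not given in the post.
    print(
        build_description(
            "Archives Hub",
            "Search archive descriptions",
            "http://example.org/search?q={searchTerms}",
        )
    )
```

A browser typically discovers a document like this through a link element in the page header, which is what makes the extra search option appear in the drop-down list.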

Reference and Former Reference

We’ve had quite a bit of difficulty with how to deal with records that include both a reference, and a ‘former reference’. These are generally from CALM. We have found that for some contributors the ‘former reference’ is exactly that, but for others it is actually the reference they want to use. We therefore feel that the only option is to display both references on the Hub. If any contributor would like us to globally edit records to remove one of the references, we can do that for you. For example: http://archiveshub.ac.uk/data/gb0370pp1. We hope that this works for people. If it doesn’t, we can gather feedback and consider a different approach.