Archives Portal Europe Country Managers’ Meeting, 30 Nov 2016

This is a report of a meeting of the Archives Portal Europe Country Managers in Slovakia, 30 November 2016, with some comments and views from the UK and Archives Hub perspective.

APE Country Managers meeting, Bratislava, 30 Nov 2016

Context

The APE Foundation (APEF), created following the completion of the APEx project (an EC-funded project, running from 2012 to 2015, to maintain and develop the portal), is now taking APE forward. It has a Governing Board and working groups for standards, technical issues and PR/comms. The APEF has a coordinator, three technical/systems staff and an outreach officer. Institutions are invited to become associate members to help support the portal and its aims.

Things are going well for APEF, with a profit recorded for 2016 and growing associate membership. APEF continues to be busy developing APE, and is working to encourage cooperation and collaboration, both to keep development going and to take advantage of EU funding opportunities.

Current Development

The APEF has the support of the Ministry of Culture in the Netherlands and has a close working relationship with the Netherlands national aggregation project, the ‘DTR’, which is key to the current APE development phase. The idea is to use the framework of APE for the DTR, benefitting both parties. Cooperation with DTR involves three main areas:

•    building an API to open up the functionality of APE to third parties (and to enable the DTR to harvest the APE data from The Netherlands)
•    improving the uploading and processing of EAC-CPF
•    enabling the uploading and processing of ‘additional finding aids’

The API has been developed so that specific requests can be sent to fetch selected data. It is possible to do this for EAD (descriptions) and EAC-CPF (names). The API provides raw data as well as processed results. There have been issues around things like the relevance ranking of results, which is a substantial area of work that is being addressed.

The API raises implications in terms of the data, as the Content Provider Agreement that APE institutions sign gives control of the data to the contributors. So the API had to be implemented in a way that enables each contributor to give explicit permission for its data to be available as CC0 (fully open data). This means that if a third party uses the API to grab data, they only get data from a country that has given this permission. APEF has introduced an API key, which is a little controversial, as it could be argued that it is a barrier to complete openness, but it does enable the Foundation to monitor use, which is useful for measuring impact, for checking correct use, and for blocking those who misuse the API. This usage information is not made open, but it is stored for impact and security purposes.
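
To make the permission model concrete, here is a minimal Python sketch of the behaviour described above: requests must carry a registered key, usage is logged internally, and only records from contributors who have opted in to CC0 are returned. Everything in it (the ApiGateway class, the Record fields, the simple title match) is hypothetical and is not APE's actual API; it only illustrates the filtering logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    record_id: str
    contributor: str  # hypothetical: the institution that supplied the description
    title: str


@dataclass
class ApiGateway:
    """Hypothetical sketch: serve only CC0-permitted records to registered API keys."""
    valid_keys: set[str]
    cc0_contributors: set[str]  # contributors who have explicitly opted in to CC0 release
    blocked_keys: set[str] = field(default_factory=set)
    usage_log: list[tuple[str, str, datetime]] = field(default_factory=list)

    def search(self, api_key: str, query: str, records: list[Record]) -> list[Record]:
        # Unknown or blocked keys are refused outright.
        if api_key not in self.valid_keys or api_key in self.blocked_keys:
            raise PermissionError("unknown or blocked API key")
        # Usage is recorded internally (for impact and misuse checks), not published.
        self.usage_log.append((api_key, query, datetime.now(timezone.utc)))
        q = query.lower()
        # Only records whose contributor has opted in are ever returned.
        return [
            r for r in records
            if r.contributor in self.cc0_contributors and q in r.title.lower()
        ]
```

For example, with cc0_contributors={"Archive A"}, a search for "ship" returns only Archive A's matching records, even if Archive B holds similar material. Keeping the opt-in list separate from the records means a contributor can switch CC0 release on or off without touching its descriptions.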

There was some discussion at the meeting around open data and use of CC0. In countries such as Switzerland it is not permitted to open up data through a CC0 licence, and in fact it may be true to say that CC0 is not the appropriate licence for archival descriptions (it is not clear whether any copyright can exist in them) and a public domain licence is more appropriate. When working across European countries there are variations in approaches to open data. The situation is complicated because the application of CC0 for APE data is not explicit, so any licence that a country has attached to its data will effectively be exported with the data, and you may get a kind of licence clash. But the feeling is that, for practical purposes, if the data is available through an API, developers will expect it to be fully open and will use it with that in mind.

There has been work to look at ways to take EAC-CPF from a whole set of institutions more easily, which would be useful for the UK, where we have many EAC-CPF descriptions created by SNAC. Work to bring together more than one name description for the same person has not started, and is not scheduled for the current period of development, but the emphasis is likely to be on better connectivity between variations of a name rather than on having one description per name.

Additional finding aids offer the opportunity to add different types of information to APE. You may, for example, have a register of artists or ships' logs, or you may have started out with a set of cards with names A-Z, relating to your archive in some way. You could describe these in one EAD description and link this to the main description. In the current implementation of EAD2002 in APE this would have to go into a table in Scope & Content, and in-line tagging is not allowed to identify parts of the data. This limits how you can search by name. EAD3, however, gives the option to add more information on events and names. You can divide a name up into parts, which allows for better searching. Therefore APE is developing a new means to fetch and process EAD3 for the additional finding aids alongside EAD2002 for ‘standard’ finding aids. In conjunction with this, the interface needs to be changed to present the new names within the search.
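
To illustrate why splitting names into parts helps searching, here is a small Python sketch that reads a simplified, namespace-free EAD3-style index and finds entries by surname. The fragment, the card identifiers and the find_by_name_part helper are hypothetical; real EAD3 files use the EAD3 namespace, and APE's own profile for additional finding aids may encode indexes differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD3-style index (no namespace). Each name is split
# into <part> elements whose @localtype says which part of the name it is.
SAMPLE = """
<index>
  <indexentry>
    <persname><part localtype="surname">Jansen</part><part localtype="forename">Pieter</part></persname>
    <ref target="card-017">Card 17</ref>
  </indexentry>
  <indexentry>
    <persname><part localtype="surname">de Vries</part><part localtype="forename">Anna</part></persname>
    <ref target="card-102">Card 102</ref>
  </indexentry>
</index>
"""


def find_by_name_part(xml_text: str, localtype: str, value: str) -> list[str]:
    """Return the ref targets of index entries with a matching name part (case-insensitive)."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for entry in root.iter("indexentry"):
        for part in entry.iter("part"):
            text = (part.text or "").strip().lower()
            if part.get("localtype") == localtype and text == value.lower():
                ref = entry.find("ref")
                if ref is not None:
                    hits.append(ref.get("target"))
                break  # one match per entry is enough
    return hits
```

Here find_by_name_part(SAMPLE, "surname", "jansen") returns ["card-017"]. If the same name were held as a single string such as "Jansen, Pieter" in a Scope & Content table, a search could not tell the surname from the forename, which is the limitation described above.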

The work on additional finding aids may not be so relevant for the Archives Hub as a contributor to APE, as the Hub cannot take on ‘other finding aids’, with all the potential variations that implies. However, institutions could potentially log into APE themselves and upload these different types of descriptions.

APE and Europeana

There was quite a bit to talk about concerning APE and Europeana. The APEF is a full partner of the Europeana Digital Services Infrastructure 2 (DSI2) project (currently running 2016/2017). The project involves work on the structure for Europeana, maintaining and running data and aggregation services, improving data quality, and optimising relations with data partners. The work APE is involved with includes improving the current workflow for harvest/ingest of data, and also evaluating what has already been ingested into Europeana.

Europeana seems to have ongoing problems dealing with multi-level EAD descriptions, compounded by the limitation that it only represents digital materials. The approach is not a good fit for archives. Europeana have also introduced both a new publishing framework and different rights statements.

The new publishing framework is a four-tier approach in which you can think of Europeana either as a more basic tool for promoting your archives or as a platform for reuse. It refers to the digital materials in terms of their size in pixels, e.g. 800 pixels wide for thumbnails (adding thumbnails means using Europeana as a ‘showcase’) and 1,200 pixels wide (high quality and reusable, using Europeana as a distribution and reuse platform). The idea of trying to get ‘quality’ images seems good, but in practice I wonder if it simply raises the barrier too much.
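
As a rough illustration of how the image-size thresholds work, the Python sketch below classifies images by width using only the two figures quoted above (800 and 1,200 pixels). The function names and labels are my own; the real framework has four tiers and further criteria, such as rights statements, that are not modelled here.

```python
from collections import Counter

# Widths quoted in the report. The real framework has four tiers and other
# criteria (e.g. rights statements) that this sketch does not model.
SHOWCASE_MIN_WIDTH = 800
REUSE_MIN_WIDTH = 1200


def publication_level(width_px: int) -> str:
    """Classify an image by its width against the two quoted thresholds."""
    if width_px >= REUSE_MIN_WIDTH:
        return "distribution and reuse"
    if width_px >= SHOWCASE_MIN_WIDTH:
        return "showcase"
    return "below showcase threshold"


def summarise(widths: dict[str, int]) -> Counter:
    """Count how many images (keyed by a hypothetical image ID) fall into each level."""
    return Counter(publication_level(w) for w in widths.values())
```

Running summarise over a contributor's image widths would give a quick sense of how much of a collection clears each threshold, which is one way to gauge whether the barrier is too high.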

The new rights statements require institutions to be very clear about the rights they want to apply to digital content. The likely conclusion of all this, from the point of view of the Archives Hub, is that we cannot grapple with adding to Europeana on behalf of all of our contributors, and therefore individual contributors will have to take this on board themselves. It will be possible for contributors to log into the APE dashboard (once it has been changed to reflect Europeana's new rights statements) and engage with this, selecting the finding aids and the preferred rights statements, and ensuring that thumbnail and reusable images meet the requirements. Once the descriptions are in APE they can then be supplied to Europeana. The resulting display in Europeana should be checked to ensure that it is appropriate.

We discussed this approach and concluded that APE contributors might see Europeana as something they could use to showcase their content: in other words, think of it on our own terms, as archives, and consider how it might help us. There is no obligation to contribute, so it is a case of deciding whether it is worth representing the best visual archives through Europeana, or whether this approach takes more effort than the value we get out of it. After 10 years of working with Europeana, and not really getting proper representation of archives, the idea of finding a successful way of contributing archives is appealing, but it seems to me that the effort required is going to be significant, and I’m not sure the impact is enough to warrant it.

Europeana are working on a new way of automated and real time ingest from aggregators and content providers, but this may take another year or more to become fully operational.

Outreach and CM Reports

Towards the end of the day we had a presentation from the new PR/communications officer. Having someone to encourage, co-ordinate and develop ideas for dissemination should prove invaluable for APE. The Facebook page is full of APE activities and related news and events. You can tweet and use the hashtag #archivesportaleurope if you would like to make APE aware of anything.

We ended the day with reports from country managers, which, as always, threw up many issues, challenges, solutions, questions and answers. Plenty to set up APEF for another busy year!

Archives Portal Europe builds firm foundations

On 8th June 2016 I attended the first Country Managers’ meeting of the newly formed Foundation of the Archives Portal Europe (APEF) at the National Archives of the Netherlands (Nationaal Archief).

The Foundation has been formed on the basis of partnerships between European countries. The current Foundation partners are: Belgium, Denmark, Luxembourg, The Netherlands, Spain, Sweden, Switzerland, Estonia, France, Germany, Hungary, Italy, Latvia, Norway and Slovenia. All of these countries are members of the ‘Assembly of Associates’. Negotiations are proceeding with Bulgaria, Greece, Liechtenstein, Lithuania, Malta, Poland, Slovakia and the UK. Some countries are not yet in a position to become members, mainly due to financial and administrative issues, but the prospects currently look very positive, with a great willingness to take the Portal forward and continue the valuable networking that has been built up over the past decade. Contributing to the Portal does not require a financial contribution; the Assembly of Associates is separate from this, and the idea is that countries (National Archives or bodies with an educational/research remit) sign up to the principles of APE and the APE Foundation: to collaborate and share experiences and ideas, and to make European archives as accessible as possible.

The Governing Board of the Foundation is working with potential partners to reach agreements on a combination of financial and in-kind contributions. It’s also working on long-term strategy documents. It has established working groups for Standards and PR & Communications, and it has set up cooperation with the Dutch DTR project (Digitale Taken Rijksarchieven / Digital Processes in State Archives) and with Europeana. The cooperation with the DTR project has been a major boost, as both projects are working towards similar goals, and therefore work effort, particularly development work, can be shared.

Current tasks for the APEF:

  • Building an API to open up the functionality of the Archives Portal Europe to third parties and to implement the possibility for the content providers to switch this option on or off in the Archives Portal Europe’s back-end.
  • Improving the uploading and processing of EAC-CPF records in the Archives Portal Europe and improving the way in which records creators’ information can be searched and found via the Archives Portal Europe’s front-end and via the API.
  • Enabling the uploading/processing of “additional finding aids (indexes)” in the Archives Portal Europe and making this additional information available via the Archives Portal Europe’s front-end and the API.

This is in addition to the continuing work of getting more data into the Portal, supporting the country managers in working with repositories, and promoting the Portal to researchers interested in using a Europe-wide search and discovery tool.

APEF will be a full partner in the Europeana DSI2 project, which connects the online collections of Europe’s cultural heritage institutions. The project will start after the summer and run for 16 months. Within this project APEF will focus on helping Europeana to develop the aggregation structure and on providing quality data from the archives community to Europeana. A focus on quality will help to get archival data into Europeana in a way that works for all parties. Europeana seems to be focusing on the ‘treasures’ from the archives, and on images that ‘sell’ the archives more effectively. Whatever the rights and wrongs of this, it seems important to continue to work to expose archives through as many channels as we can, and for us in the UK, the advantages of contributing to the Archives Hub and thence seamlessly to APE and to Europeana, albeit selectively, are clear.

A substantial part of the meeting was dedicated to updates from countries, which gave us all a chance to find out what others are doing, from the building of a national archives portal in Slovakia to progress with OAI-PMH harvesting from various systems, such as ScopeArchiv, used in Switzerland and other countries. Many countries are also concerned with translations of various documents, such as the Content Provider Agreement, which is not something the UK has had to consider (although a Welsh translation would be a possibility).

We had a session looking at some of the more operational and functional tasks that need to be thought about in any complex system such as APE. We then had a general Q&A session. It was acknowledged that creating EAD from scratch is a barrier to contributing for many repositories. For the UK this is not really an issue, because we contribute Archives Hub descriptions. But of course it is an issue for the Hub: to find ways to help our contributors provide descriptions, especially if they are using a proprietary system. Our EAD Editor accounts for a large percentage of our data, and it creates the EAD without requiring users to understand more than a few formatting tags.

The Archives Hub aims to set up harvesting of our contributors’ descriptions over the next year, thus ensuring that any descriptions contributed to us will automatically be uploaded to the Archives Portal Europe. (We currently have to upload on a per-contributor basis, which is not very efficient with over 300 contributors.) We will soon be turning our attention to the selective digital content that can be provided by APE to Europeana. That will require an agreement from each institution to the Europeana open data licence. As the Hub operates on the principles of open data, to encourage maximum exposure of our descriptions and promote UK archives, that should not be a problem.
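
The report does not say which mechanism the Hub will use to harvest contributors' descriptions, but OAI-PMH (mentioned in the country reports above) is a common choice for this kind of work. The Python sketch below shows the standard ListRecords loop, which follows resumption tokens until the list is complete. The base URL and metadata prefix are placeholders, error responses such as noRecordsMatch are not handled, and the fetch parameter exists only so the function can be tried without a live endpoint.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_records(base_url: str, metadata_prefix: str,
                 fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> from an OAI-PMH ListRecords list, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        yield from root.iterfind(".//oai:ListRecords/oai:record", OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken marks the last part of the list.
        if token is None or not (token.text or "").strip():
            break
        # The protocol makes resumptionToken an exclusive argument (only verb may accompany it).
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A harvester would call list_records for each contributor's endpoint and pass the records on for processing; injecting fetch also makes the loop easy to test against saved responses.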

With thanks to Wim van Dongen, APEF country manager coordinator / technical coordinator, who provided the minutes of the Country Managers’ meeting, which are partially reproduced here.

Europeana Tech 2015: focus on the journey

Last week I attended a very full and lively Europeana Tech conference. Here are some of the main initiatives and ideas I have taken away with me:

Think in terms of improvement, not perfection

Do the best you can with what you have; incorrect data may not be as bad as we think, and maybe users’ expectations are changing, and they are increasingly willing to work with incomplete or imperfect data. Some of the speakers talked about successful crowd-sourcing: people are often happy to correct your metadata for you, and a well thought-out crowd-sourcing project can give great results.

BL Georeferencer, showing an old map overlaying part of Manchester: http://www.bl.uk/maps/georeferencingmap.html

The British Library currently have an initiative to encourage tagging of their images on Flickr Commons and they also have a crowd-sourcing geo-referencer project.

The Cooper Hewitt Museum site takes a different and more informal approach to what we might usually expect from a cultural heritage site. The homepage goes for an honest approach:

“This is a kind of living document, meaning that development is ongoing — object research is being added, bugs are being fixed, and erroneous terms are being revised. In spite of the eccentricities of raw data, you can begin exploring the collection and discovering unexpected connections among objects and designers.”

The ‘here is some stuff’ and ‘show me more stuff’ type of approach was noticeable throughout the conference, with different speakers talking about their own websites. Seb Chan from the Cooper Hewitt Museum talked about the importance of putting information out there: even if you have very little, it is better than nothing (e.g. https://collection.cooperhewitt.org/objects/18446665).

The speaker from Google, Chris Welty, is best known for his work on ontologies in the Semantic Web and on IBM’s Watson. He spoke about cognitive computing, and his message was ‘maybe it’s OK to be wrong’. Something may well still be useful, even if it is not perfectly precise. We are increasingly understanding that the Web is in a state of continuous improvement, and so we should focus on improvement, not perfection. What we want is for mistakes to decrease, and for new functionality not to break old functionality. Chris talked about the importance of having a metric, something that is believable, that you can use to measure improvement. He also spoke about what is ‘true’ and the need for a ‘ground truth’ in an environment where problems often don’t have a right or wrong answer. What is the truth about an image? If you show an image to a human and ask them to talk about it, they could talk for a long time. What are the right things to say about it? What should a machine see? To know this, or to know it better, Chris said, Google needs data: more and more and more data. He made it clear that the data is key and it will help us on the road to continuous improvement. He used the example of using Google to search for ‘paintings with flowers’. If you did this search 5 years ago you probably wouldn’t get just paintings with flowers. The search has improved, and it will continue to improve. A search for ‘paintings with tulips’ now is likely to show you just tulips. However, he gave the example of ‘paintings with flowers by French artists’, a search where you start to see errors, as the results are not all by French artists. A current problem Google are dealing with is mixed-language queries, such as ‘paintings des fleurs’, which opens a whole can of worms. But Chris’ message was that metadata matters: it is the metadata that makes this kind of searching possible.

The Success of Failure

Related to the point about improvement, the message is that being ‘wrong’ or ‘failing’ should be seen in a much more positive light. Chris Welty told us that two thirds of his work doesn’t make it into a live environment, and he has no problem with that. Of course, it’s hard not to think that Google can afford to fail rather more than many of us! But I did have an interesting conversation with colleagues, via Twitter, around the importance of senior management and funders understanding that we can learn a great deal from what is perceived as failure, and we shouldn’t feel compelled to hide it away.

Europeana Tech panel session, with four continents represented

Think in terms of Entities

We had a small group conversation where this came up, and a colleague said to me ‘but surely that’s obvious’. But as archivists we have always been very centred on documents rather than things: on the archive collection, and the archive collection description. The trend that I saw reflected at Europeana Tech continued to be towards connections, narratives and pathways; towards utilising new tools for working with data, for improving data quality and linking data, for adding geo-coordinates and describing new entities, for making images more interoperable and for contextualising information. The principle underlying this was that we should start from the real world, the real-world entities, and go from there. Various data models were explored, such as the Europeana Data Model and CIDOC CRM, and speakers explained how entities can connect and enable a richer landscape. Data models are a tricky one because they can help to focus on key entities and relationships, but they can be very complex and rather off-putting. The EDM seems to split the crowd somewhat, and there was some criticism that it is not event-based like CIDOC CRM, but the CRM is often criticised for being very complex and difficult to understand. Anyway, setting that aside, the overall message was that relationships are key, however we decide to model them.

Cataloguing will never capture everyone’s research interests

An obvious point, but I thought it was quite well conveyed in the conference. Do we catalogue with the assumption that people know what they need? What about researchers interested in how ‘sad’ is expressed throughout history, or fashions for facial hair, or a million other topics that simply don’t fit in with the sorts of keywords and subject terms we normally use. We’ll never be able to meet these needs, but putting out as much data as we can, and making it open, allows others to explore, tag and annotate and create infinite groups of resources. It can be amazing and moving, what people create: Every3Minutes.

There’s so much out there to explore…

There are so many great looking tools and initiatives worth looking at, so many places to go and experiment with open data, so many APIs enabling so much potential. I ended up with a very long list of interesting looking sites to check out. But I couldn’t help feeling that so few of us have the time or resource to actually take advantage of this busy world of technology. We heard about Europeana Labs, which has around 100 ‘hardcore’ users and 2,200 registered keys (required for API use). It is described as “a playground for remixing and using your cultural and scientific heritage. A place for inspiration, innovation and sharing.” I wondered if we would ever have the time to go and have a play. But then maybe we should shift focus away from not being able to do these things ourselves, and simply allow others to use the data, and to adopt the tools and techniques that are available – people can create all sorts of things. One example amongst many we heard about at the conference is a cultural collage: zenlan.com/collage. It comes back to what is now quite an old adage, ‘the best innovation may not be done by you’. APIs enable others to innovate, and what interests people can be a real surprise. Bill Thompson from the BBC referred to a huge interest in old listings from Radio Times, which are now available online.

The International Image Interoperability Framework

I list the IIIF because it jumped out at me as a framework that seems to be very popular: several speakers referred to it, and in very positive terms. I hadn’t heard of it before, but it seemed to be seen as a practical means to ensure that images are interoperable and can be moved around different systems.
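
Part of IIIF's appeal is that its Image API expresses every image request as a predictable URL, so any compliant viewer or server can work with it. The Python sketch below assembles a request URL in the Image API pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}; the server address and identifier are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that any "/" inside it is not read
    as a path separator. "max" is the full-size keyword in Image API 2.1 and 3.0.
    """
    return "/".join([
        base.rstrip("/"),
        quote(identifier, safe=""),
        region,
        size,
        rotation,
        f"{quality}.{fmt}",
    ])
```

For instance, iiif_image_url("https://images.example.org/iiif", "ms-123/f001", size="800,") asks for the whole image scaled to 800 pixels wide and gives https://images.example.org/iiif/ms-123%2Ff001/full/800,/0/default.jpg. Because the request is just a URL, the same image can be reused in different systems without copying files around.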

Think Little

One of my favourite thoughts from the conference, from the ever-inspirational Tim Sherratt, was that big ideas should enable little ideas. The little ideas are often what really makes the world go round. You don’t always have to think big. In fact, many sites have suffered from the tendency to try to do everything. Just because you can add tons of features to your applications, it doesn’t mean you should.

The Importance of Orientation

How would you present your collections if you didn’t have a search box? This is the question I asked myself after listening to George Oates, from Good, Form & Spectacle. She is a user interface expert, and has worked on Flickr and for the Internet Archive, amongst other things. I thought her argument about the need to help orientate users was interesting, as so often we are told that the ‘Google search box’ is the key thing, and what users expect. She talked about some of her experiments with front-end interfaces that allow users to look at things differently, such as the V&A Spelunker. She spoke in terms of landmarks and paths that users could follow. I wonder if this is easier said than done with archives, without over-curating what you have or excluding material that is less well catalogued or does not have a nice image to work with. But I certainly think it is an idea worth exploring.

View of V&A Spelunker
“The V&A Spelunker is a rough thing built by Good, Form & Spectacle to give a different view into the collection of the Victoria & Albert Museum”

Online Survey Results (2011)

We would like to share some of the results of our annual online survey, which we run over a 3-4 week period. We aim for about 100 responses (though obviously more would be very welcome!), and for this survey we got 92 responses. We use a pop-up invitation to fill out the survey; it is something we do not like to do, but we do feel that it attracts more responses than a simple link.

Context

We have a number of questions that are replicated in surveys run for Zetoc and Copac, two bibliographic JISC-funded Mimas services, and this provides a means to help us (and our funders) look at all three services together and compare patterns of use and types of user.

This year we added four questions specifically designed to help us with understanding users of the Hub and to help us plan our priorities.

We aim to keep the number of questions down to about 12 at the most, and ensure that the survey will take no longer than 10 minutes to complete. But we also want to provide the opportunity for people to spend longer and give more feedback if they wish, so we combine tick lists and radio buttons with free-text comment boxes.

We take the opportunity to ask whether participants would be willing to provide more feedback for us, and if they are potentially willing, they provide their email address. This gives us the opportunity to ask them to provide more feedback, maybe by being part of a focus group.

Results of the Survey

Profile

  • The vast majority of respondents (80%) are based in the UK for their study and/or work.
  • Most respondents are in the higher education sector (60%). A substantial number are in the Government sector and also the heritage/museum sector.
  • 20% of those using the Hub are students – maybe less than we would hope, but a significant number.
  • 10% are academics – again, less than we would hope, but it may be that academics are less willing to fill in a survey.
  • 50% are archivists or other information professionals. This is a high number, but it is important to note that it includes use of the Hub on behalf of researchers, to answer their enquiries, so it could be said to represent indirect use by researchers.
  • The majority of respondents use the service once or twice a month, although usage patterns were spread over all options, from daily to less than once a month, and it is difficult to draw conclusions from this, as just one visit to the Hub website may prove invaluable for research.

graph showing value of the Hub

Use and Recommendation

  • A significant percentage – 26% – find the Hub ‘neither easy nor difficult’ to use, and 3% of the respondents found it difficult to use, indicating that we still need to work on improving usability (although a number of comments were positive about ease of use).
  • 73% agree their work would take longer without the Hub, which is a very positive result and shows how important it is to be able to cross-search archives in this way.
  • A huge majority – 93% – would recommend the Hub to others, which is very important for us. We aim for at least 90% positive responses to this question, as we believe that recommendations are a very important means for the Hub to become more widely known.

Subject Areas

We spent a significant amount of time creating a list of subjects that would give us a good indication of the disciplines in which people might use the Hub. The results (number of responses per subject) were:

    • History 47
    • Library & Archive Studies 33
    • English Literature 17
    • Creative & Performing Arts 16
    • Education & Research Methods 10
    • Predominantly Interdisciplinary 9
    • Geography & Environment 5
    • Political Studies & International Affairs 5
    • Modern Languages and Linguistics 4
    • Physical Sciences 4
    • Special Collections 4
    • Architecture & Planning 3
    • Biological & Natural Sciences 3
    • Communication & Media Studies 3
    • Medicine 3
    • Theology & Philosophy 3
    • Archaeology 2
    • Engineering 2
    • Psychology & Sociology 2
    • Agriculture 1
    • Law 1
    • Mathematics 1
    • Business & Management Studies 0
  • History is, not surprisingly, the most common discipline, but literature, the arts, education and also interdisciplinary work all feature highly.
  • There is a reasonable amount of use from the subjects that might be deemed to have less call for archives, showing that we should continue to promote the Hub in these areas and that archives are used in disciplines where they do not have a high profile. It would be very valuable to explore this further.

graph showing use of archival websites

  • The Hub is often used along with other archival websites, particularly The National Archives and individual record office websites, but a significant number do not use the websites listed, so we cannot assume prior knowledge of archives.
  • It would be interesting to know more about patterns of use. Do researchers try different websites, and in what order do they visit them? Do they have a sense of what the different sites offer?
  • There is still low use of the European aggregators, Europeana and APENet, although at present UK archives are not well represented on these services and arguably they do not have a high profile amongst researchers (the Hub is not yet represented on these aggregators).

Subsequent activities

  • It is interesting to note that 32% visit a record office as a result of using the Hub, but 68% do not. It would be useful to explore this further, to understand whether the use of the Hub is in itself enough for some researchers. We do know that for some people, the description holds valuable information in and of itself, but we don’t know whether the need to visit a record office, maybe some distance away, prevents use of the archives when they might be of value to the researcher.

What is of most value?

graph showing what is most valuable to researchers

  • We asked about what is important to researchers, looking at key areas for us. The results show that comprehensive coverage still tops the polls, but detailed descriptions also continue to be very important to researchers, somewhat in opposition to the idea of the ‘quick and dirty’ approach. More sophisticated questioning might draw out how useful basic descriptions are compared with no description, and what level of detail is acceptable.
  • Links to digital content and information on related material are important, but not as important as adding more descriptions and providing a level of detail that enables researchers to effectively assess archives.
  • Searching across other cultural heritage resources at the same time is, perhaps surprisingly, less of a priority than content and links. It is often assumed that researchers want as much diverse information as possible in a ‘one-stop shop’, but the problems with things like search usability, navigation, number of results and relevance ranking illustrate one of the main challenges: creating a site that holds descriptions and links to very varied content while ensuring that it is easily understandable and that researchers know what they are getting.
  • The regional search was not a high priority but a significant medium priority, and it might be argued that not all researchers would be interested in this, but some would find it particularly useful, and many archivists would certainly find it helpful in their work
  • We provided a free text box for participants to say what they most valued. The ability to search across descriptions, which is the most basic value proposition of the Hub, came out top, and breadth of coverage was also popular, and could be said to be part of the same selling point.
  • It was interesting to see that some respondents cited the EAD Editor as the main strength for them, showing how important it is to provide ways for archivists to create descriptions (it may be thought that other means are at their disposal, but often this is not the case).
  • Six people referred to the importance of the Hub for providing an online presence, indicating that for some record offices, the Hub is still the only way that collections are surfaced on the Web.

What would most improve the Hub?

  • We had a diversity of responses to the question about what would most improve the Hub, maybe indicating that there are no very obvious weaknesses, which is a good thing. But this does make it difficult for us to take anything constructive from the answers, because we cannot tell whether there is a real need for a change to be made. However, there were a few answers that focused on the interface design, and some of these issues should be addressed by our new ‘utility bar’ which is a means to more clearly separate the description from the other functions that users can then perform, and should be implemented in the next six months.

Conclusions

The survey did not throw up anything unexpected, so it has not materially affected our plans for development of the Hub. But it is essentially an endorsement of what we are doing, which is very positive for us. It emphasised the importance of comprehensive coverage, which is something we are prioritising, and the value of detailed descriptions, which we facilitate through the EAD Editor and our training opportunities and online documentation. Please contact us if you would like to know more.

HubbuB: October 2011

Europeana and APENet

I have just come back from the Europeana Tech conference, a two-day event on various aspects of Europeana’s work and on related topics to do with data. The big theme was ‘open, open, open’, as well, of course, as the benefits of a European portal for cultural heritage. I was interested to hear about Europeana’s Linked Data output, but my understanding is that at present we cannot effectively link to their data, because they don’t provide URIs for concepts: in other words, identifiers for names such as http://data.archiveshub.ac.uk/doc/agent/gb97/georgebernardshaw, which would allow us to say, for example, that our ‘George Bernard Shaw’ is the same as the ‘George Bernard Shaw’ represented on Europeana.
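To make the problem concrete: in Linked Data, saying that two descriptions refer to the same person is usually done with an `owl:sameAs` statement between two URIs, and that only works if both sides publish a URI. The following is a minimal Python sketch, not part of the Hub’s actual Linked Data output, that writes such a statement as a single N-Triples line. The Hub URI is the one quoted above; the Europeana URI is hypothetical, standing in for the identifier that was not available, and `same_as_triple` is an illustrative helper name.

```python
# Minimal sketch: emit an owl:sameAs link as one N-Triples statement.
# The Europeana-side URI below is HYPOTHETICAL: it stands in for a concept
# URI that Europeana did not provide at the time of writing.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri: str, external_uri: str) -> str:
    """Return an N-Triples line asserting that two URIs identify the same thing."""
    for uri in (local_uri, external_uri):
        # Linking needs real HTTP URIs on BOTH sides; a plain name string won't do.
        if not uri.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP URI: {uri!r}")
    return f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


hub_shaw = "http://data.archiveshub.ac.uk/doc/agent/gb97/georgebernardshaw"
# Hypothetical identifier, for illustration only.
europeana_shaw = "http://example.org/europeana/agent/george-bernard-shaw"

print(same_as_triple(hub_shaw, europeana_shaw))
```

Without a URI on the Europeana side there is simply nothing to put in the object position of the statement, which is why the linking cannot happen yet.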

I am starting to think about the Hub being part of APENet and Europeana. APENet is the archival aggregator for Europe. I have been in touch with them about the possibility of contributing our data, and if the Hub were to contribute, we could probably start from next year. Europeana only provide metadata for digital content, so we could only supply descriptions where the user can link to the digital content, but this may well be worth doing, as a means to promote the collections of any Hub contributors who do link to digital materials.

If you are a contributor, or potential contributor, we would like to know what you think…. we have a quick question for you at http://polldaddy.com/poll/5565396/. It simply asks if you think it's a good idea to be part of these European initiatives. We’d love to get your views, and you only have to leave your name and a comment if you want to.

Flickr: an easy way to provide images online

You will be aware that contributors can now add images to descriptions and links to digital content of all kinds. The idea is that the digital content then forms an integral whole with the metadata, and it is also interoperable with other systems.

I’ve just seen an announcement by the University of Northampton, who have recently added materials to Flickr. I know that many contributors struggle to get server space to put their digital content online, so this is one possible option, and of course it does reach a huge number of people this way. There may be risks associated with the persistence of the URIs for the images, but then that is the case wherever you put them.

On the Hub we now have a number of images and links to content, for example: http://archiveshub.ac.uk/data/gb1089ukc-joh, http://archiveshub.ac.uk/data/gb1089ukc-bigwood, http://archiveshub.ac.uk/data/gb1089ukc-wea, http://archiveshub.ac.uk/data/gb141boda?page=7#boda.03.03.02.

Ideally, contributors would supply digital content at item level, so the metadata is directly about the image/digital content, but it is fine to provide it at any level that is appropriate. The EAD Editor makes adding links easy (http://archiveshub.ac.uk/dao/). If you aren’t sure what to do, please do email us.

Preferred Citation

We never had the field for the preferred citation in our old template for the creation of EAD, and it has not been in the EAD Editor up till now. We were prompted to think about this after seeing the results of a survey on the use of EAD fields presented at the Society of American Archivists conference: around 80% of archive institutions do use it. We think it’s important to advise people how to cite the archive, so we are planning to provide this in the Editor and may be able to carry out global edits to add this to contributors’ data.
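For illustration, a global edit of this kind might look like the minimal Python sketch below. It assumes un-namespaced EAD 2002 files, where `<prefercite>` (containing a `<p>`) is allowed directly inside `<archdesc>`; the function name `add_prefercite`, the sample record and the citation text are hypothetical, and this is not the EAD Editor’s or the Hub’s actual process. It leaves any existing citation untouched, so running it twice does no harm.

```python
import xml.etree.ElementTree as ET


def add_prefercite(ead_xml: str, citation: str) -> str:
    """Add <prefercite><p>...</p></prefercite> to <archdesc> if it is missing.

    Assumes un-namespaced EAD 2002; an existing <prefercite> is left as-is.
    """
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    if archdesc is None:
        raise ValueError("no <archdesc> element found")
    if archdesc.find("prefercite") is None:
        prefercite = ET.SubElement(archdesc, "prefercite")
        ET.SubElement(prefercite, "p").text = citation
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record and citation, for illustration only.
sample = (
    '<ead><eadheader/><archdesc level="fonds">'
    "<did><unittitle>Example Papers</unittitle></did>"
    "</archdesc></ead>"
)
print(add_prefercite(sample, "Example Papers, Example Record Office"))
```

Checking for an existing element before adding one is what makes a bulk run safe across data where some contributors have already supplied a citation.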

List of Contributors

Our list of contributors within the main search page has now been revised, and we hope it looks substantially more sensible, and that it is better for researchers. This process really reminded us how hard it is to come up with one order for institutions that works for everyone! We are currently working on a regional search, which will act as an alternative way to limit searching. We hope to introduce this next year.

And finally…A very engaging Linked Data interface

This interface demonstration by Tim Sherratt shows how something driven by Linked Data can really be very effective. It also uses some of the Archives Hub vocabulary from our own Linked Data work, which is a nice indication of how people have taken notice of what we have been doing. There is a great blog post about it by Pete Johnston, Storytelling, archives and Linked Data. I agree with Pete that this sort of work is so exciting, and really shows the potential of the Linked Data Web for enabling individual and collective storytelling…something we, as archivists, really must be a part of.

A Web of Possibilities

“Will you browse around my website?” said the spider to the fly,
“’Tis the most attractive website that you ever did spy.”

All of us want to provide attractive websites for our users. Of course, we’d like to think it's not really the spider/fly kind of relationship! But we want to entice and draw people in and often we will see our own website as our key web presence; a place for people to come to to find out about who we are, what we have and what we do and to look at our wares, so to speak.

The recently released ‘Discovery’ vision is to provide UK researchers with “easy, flexible and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable.”  Does this have any implications for the institutional or small-scale website, usually designed to provide access to the archives (or descriptions of archives) held at one particular location?

Over the years that I’ve been working in archives, announcements about new websites for searching the archives of a specific institution, or the outputs of a specific project, have been commonplace. A website is one of the obvious outputs from time-bound projects, where the aim is often to catalogue, digitise or exhibit certain groups of archives held in particular repositories. These websites are often great sources of in-depth information about archives. Institutional websites are particularly useful when a researcher really wants to gain a detailed understanding of what a particular repository holds.

However, such sites can present a view that is based more around the provider of the information than the receiver. It could be argued that, convenience aside, a researcher is less likely to want to use archives because they are held at a particular location, and more likely to want archives relating to their subject area, which are likely to be held across a whole range of archives, museums and libraries (and elsewhere). By only looking at the archives held at a particular location, even if that location is a specialist repository that represents the researcher’s key subject area, the researcher may not think about what they might be missing.

Project-based websites may group together archives in ways that benefit researchers more obviously, because they are often aggregating around a specific subject area. For example, making available the descriptions and links to digital archives around a research topic. Value may be added through rich metadata, community engagement and functionality aimed at a particular audience. Sometimes the downside here is the sustainability angle: projects necessarily have a limited life-span, and archives do not. They are ever-changing and growing and descriptions need to be updated all the time.

So, what is the answer? Is this too much of a silo-type approach, creating a large number of websites, each dedicated to a small selection of archives?

Broader aggregation seems like one obvious answer. It allows for descriptions of archives (or other resources) to be brought together so that researchers have the benefit of searching across collections, bringing together archives by subject, place, person or event, regardless of where they are held (although there is going to be some kind of limit here, even if it is at the national level).

You might say that the Archives Hub is likely to be in favour of aggregation! But it’s definitely not all pros and no cons. Aggregations may offer powerful search functionality for intellectually bringing together archives based on a researcher’s interests, but in some ways there is a greater risk around what is omitted. When searching a website that represents one repository, a researcher is more likely to understand that other archives may exist that are relevant to them. Aggregations tend to promote themselves as comprehensive – if not explicitly then implicitly – which creates expectations that can never fully be met. They can also raise issues around measuring impact and licensing. There is also the risk of a proliferation of aggregation services, further confusing the resource discovery landscape.

Is the ideal of broad inter-disciplinary cross-searching going to be impeded if we compete to create different aggregations? Yes, maybe it will to some extent, but I think that this is inevitable, and it is valid for different gateways to serve different audiences’ needs. It is important to acknowledge that researchers in different disciplines and at different levels have their own specific requirements, and we cannot fulfil all of these needs by presenting data in only one way.

One thing I think is critical here is for all archive repositories to think about the benefits of employing recognised and widely-used standards, so that they can effectively interoperate and so that the data remains relevant and sustainable over time. This is the key to ensuring that data is agile, and can meet different needs by being used in different systems and contexts.

I do wonder if there is a point at which aggregations become unwieldy, politically complicated and technically challenging. That point seems to be when they start to search across countries. I am still unsure about whether Europeana can overcome this kind of problem, although I can see why many people are so keen on making it work. But at present it is extremely patchy, and, for example, getting no results for texts held in Britain relating to Shakespeare is not really a good result. But then, maybe the point is that Europeana is there for those who want to use it, and it is doing ground-breaking work in its focus on European culture; the Archives Hub exists for those interested in UK archives and a more cross-disciplinary approach; Genesis exists for those interested in women’s studies; for those interested in the Co-operative movement, there is the National Co-operative Archive site; for those researching film, the British Film Institute website and archive is of enormous value.

So, is the important principle here that diversity is good because people are diverse and have diverse needs? Probably so. But at the same time, we need to remember that to achieve this landscape, we need to encourage data sharing and avoid duplication of effort. Once you have created descriptions of your archive collections you should be able to put them onto your own website, contribute them to a project website, and provide them to an aggregator.

Ideally, we would be looking at one single store of descriptions, because as soon as you contribute to different systems, if they also store the data, you have version control issues. The ability to remotely search different data sources would seem to be the right solution here. However, there are substantial challenges. The Archives Hub has been designed to work in a distributed way, so that institutions can host their own data. The distributed searching does present challenges, but it certainly works pretty well. The problem is that running a server, operating system and software can actually be a challenge for institutions that do not have the requisite IT skills dedicated to the archives department. Institutions that hold their own data have it in a great variety of formats. So, what we really need is the ability for the Archives Hub to seamlessly search CALM, AdLib, MODES, ICA AtoM, Access, Excel, Word, etc. and bring back meaningful results. Hmmm….
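The ‘search everything, whatever the format’ idea can be sketched as an adapter pattern: one small function per source that turns that source’s own format into a common result shape, and a coordinator that sends the same query to every adapter and merges the hits. The Python below is a hypothetical illustration under those assumptions only; it is not how the Archives Hub’s distributed search works, the adapter names and toy data are invented, and real adapters for systems such as CALM or AdLib would be considerably more involved.

```python
import csv
import io
from dataclasses import dataclass
from functools import partial
from typing import Callable


@dataclass(frozen=True)
class Hit:
    """Common result shape that every (hypothetical) adapter must produce."""
    repository: str
    title: str


def search_csv(export: str, repository: str, query: str) -> list[Hit]:
    """Hypothetical adapter for a CSV export (e.g. saved from a spreadsheet) with a 'title' column."""
    rows = csv.DictReader(io.StringIO(export))
    return [Hit(repository, row["title"]) for row in rows
            if query.lower() in row["title"].lower()]


def search_list(titles: list[str], repository: str, query: str) -> list[Hit]:
    """Hypothetical adapter for a source already available as a list of titles."""
    return [Hit(repository, t) for t in titles if query.lower() in t.lower()]


def federated_search(query: str, sources: list[Callable[[str], list[Hit]]]) -> list[Hit]:
    """Send the same query to every source and merge the normalised hits."""
    hits: list[Hit] = []
    for search in sources:
        hits.extend(search(query))
    # Naive merge: sort by title. Real relevance ranking is the hard part.
    return sorted(hits, key=lambda h: (h.title.lower(), h.repository))


# Toy sources, invented for illustration.
sources = [
    partial(search_csv, "title\nShaw correspondence\nEstate maps\n", "Repository A"),
    partial(search_list, ["Papers of G. B. Shaw", "Parish registers"], "Repository B"),
]
for hit in federated_search("shaw", sources):
    print(hit.repository, "|", hit.title)
```

Each new format only needs a new adapter; the difficult part, as the post suggests, is producing meaningful results once the hits come back, which this sketch sidesteps by simply sorting by title.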

The business case for opening up data seems clear. Projects like Open Bibliographic Data have helped to progress thinking in this arena and raised issues and solutions around barriers such as licensing. But it seems clear that we need to understand more about the benefits of aggregation, and the different approaches to aggregation, and we need to get more buy-in for this kind of approach. Does aggregation allow users to do things that they could not do otherwise? Does it save them time? Does it promote innovation? Does it skew the landscape? Does it create problems for institutions because of the problems with branding and measuring impact? Furthermore, how can we actually measure these kinds of potential benefits and issues?

Websites that offer access to archives (or descriptions of archives) based on where they are located and on the body that administers them have an important role to play. But it seems to me that it is vital that these archives are also represented on a more national, and even international, stage. We need to bring our collections to where the users are. We need to ensure that Google and other search engines find our descriptions. We need to put archives at the heart of research, alongside other resources.

I remember once talking about the Archives Hub to an archivist who ran a specialist repository. She said that she didn’t think it was worth contributing to the Hub because they already had their own catalogue. That is, researchers could find what they wanted via the institute’s own catalogue on their own system, available in their reading room. She didn’t seem to be aware that this could only happen if they knew that the archive was there, and that this view rested on the idea that researchers would be happy to repeat that kind of search on a number of other systems. Archives are often about a whole wealth of different subjects – we all know how often there are unexpected and exciting finds. A specialist repository for any one discipline will have archives that reach way beyond that discipline into all sorts of fascinating areas.

It seems undeniable that data is going to become more open and that we should promote flexible access through a number of discovery routes, but this throws up challenges around version control, measuring impact, brand and identity. We always have to be cognisant of funding, and widely disseminated data does not always help us with a funding case because we lose control of the statistics around use and any kind of correlation between visits to our website and bums on seats. Maybe one of the challenges is therefore around persuading top-level managers and funders to look at this whole area with a new perspective?