Is Linked Data an Appropriate Technology for Implementing an Archive’s Catalogue?

Here at the Archives Hub we’ve not been so focussed on Linked Data (LD) in recent years, as we’ve mainly been working on developing and embedding our new system and workflows. However, we have remained interested in what’s going on and are still looking at making Linked Data available in a sustainable way. We did a substantial amount of work a number of years back on the LOCAH project, from which we provided a subset of archival linked data at data.archiveshub.ac.uk. Our next step this time round is likely to be embedding schema.org markup within the Hub descriptions. We’ve been closely involved in the W3C Schema Architypes Group activities, with Archives Hub URIs forming the basis of the group’s proposals to extend the “Schema.org schema for the improved representation of digital and physical archives and their contents”.
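To give a rough idea of what embedded schema.org markup could look like, here is a minimal Python sketch that builds a JSON-LD description and wraps it in a script element for an HTML page. The ArchiveComponent and ArchiveOrganization types and the holdingArchive property come from schema.org's archive vocabulary; the URI, names, date and helper functions are all hypothetical, and this is not how the Hub actually generates its pages.

```python
import json


def archive_component_jsonld(uri, name, holder_name, date_created):
    """Build a schema.org ArchiveComponent description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "ArchiveComponent",
        "@id": uri,
        "name": name,
        "dateCreated": date_created,
        "holdingArchive": {
            "@type": "ArchiveOrganization",
            "name": holder_name,
        },
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dict in the script element used to embed it in HTML."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical values for illustration only.
example = archive_component_jsonld(
    uri="https://example.org/archives/record-1",
    name="Papers of an example designer",
    holder_name="Example University Archive",
    date_created="1946",
)
print(as_script_tag(example))
```

The resulting script element would sit in the head or body of a description page, leaving the visible HTML untouched.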

We are also aiming to reconnect more closely with the LODLAM community generally, and to this end I attended a TNA ‘Big Ideas’ session ‘Is Linked Data an appropriate technology for implementing an archive’s catalogue?’ given by Jean-Luc Cochard of the Swiss Federal Archives. I took a few notes which I thought it might be useful to share here.

Why we looked at Linked Data?

This was initially inspired by the Stanford LD 2011 workshop and the 2014 Open data.swiss initiative. In 2014 they built their first ‘aLOD’ prototype – http://alod.ch/

The Swiss have many archive silos. They transformed the content of some of these systems to LD and were then able to merge it. They created basic LD views, with Jean-Luc noting that the LD is less structured than the data in the main archival systems. One example is http://data.ge.alod.ch/id/archivalresource/adl-j-125

They also developed a new interface, http://alod.ch/search/, with which they were aiming for an innovative approach to presenting the data, such as providing a histogram of dates. It’s currently just a prototype interface running off SPARQL, with only 16,000 entries so far.
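As a rough illustration of the date histogram idea, the following Python sketch buckets record years into decades and renders simple text bars. It only shows the binning step that would follow a query for record dates; the years are made up, and the function names are my own rather than anything from the aLOD prototype.

```python
from collections import Counter


def decade_histogram(years):
    """Count records per decade, returned as {decade_start: count} in date order."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))


def render(histogram, width=40):
    """Render the histogram as text bars scaled to the largest bucket."""
    if not histogram:
        return ""
    peak = max(histogram.values())
    lines = []
    for decade, count in histogram.items():
        bar = "#" * max(1, round(count / peak * width))
        lines.append(f"{decade}s {bar} {count}")
    return "\n".join(lines)


# Illustrative years only; not taken from the aLOD data.
sample_years = [1848, 1852, 1855, 1901, 1903, 1907, 1909, 1946]
hist = decade_histogram(sample_years)
print(render(hist))
```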

They are also currently implementing a new archival information system (AIS) and are considering LD technology for it, but may go with a more conventional database approach. The new system has to work with the overall technical architecture.

Linked data maturity?

Jean-Luc noted that they expect born-digital material to expand by a factor of ten over the next three years, though 90% of the archive is currently analogue. The system needs to cope with 50M – 1.5B triples. They have implemented Stardog triple stores 5.0.5 and 5.2. The larger configuration is a 1 TB RAM, 56 CPU and 8 TB disk machine.

As part of performance testing they have tried loading the system with up to 10 billion triples and running various insert, delete and query functions. The larger config machine allowed 50M triple inserts in 5 min, while 100M plus triples took 20 min to insert. With the update function things were found to be quite stable. They then combined querying with triple insertions at the same time, and this highlighted some issues with slow insertions on a smaller machine. They also tried full text indexing with the larger config machine. Results here were very variable, with some very slow response times for insertions, which turned out to be caused by a bug in the system.
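To put those insert figures in context, here is a small back-of-the-envelope calculation in Python. It assumes the quoted timings are representative and that load time scales linearly with the number of triples, which the figures themselves suggest is optimistic; the 100M rate is an upper estimate, since the talk quoted "100M plus" triples.

```python
def triples_per_second(triples, minutes):
    """Average insert rate implied by loading `triples` in `minutes`."""
    return triples / (minutes * 60)


def hours_to_load(triples, rate_per_second):
    """Hours needed to load `triples` at a constant rate (linear scaling assumed)."""
    return triples / rate_per_second / 3600


rate_50m = triples_per_second(50_000_000, 5)     # about 166,667 triples/s
rate_100m = triples_per_second(100_000_000, 20)  # about 83,333 triples/s

# Time to reach the 1.5B triples mentioned as the upper requirement.
best_case = hours_to_load(1_500_000_000, rate_50m)    # about 2.5 hours
worse_case = hours_to_load(1_500_000_000, rate_100m)  # about 5 hours

print(f"{rate_50m:,.0f} vs {rate_100m:,.0f} triples/s")
print(f"1.5B triples: {best_case:.1f} to {worse_case:.1f} hours")
```

The rate roughly halves between the two runs, so a real load of 1.5B triples could take longer than either estimate.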

Is Linked Data adequate for the task?

A key weakness of their current archival system is that you can only assign records to one provenance/person. Also, their current system can’t connect records to other databases, so they have the usual silo problem. Linked data can solve some of these problems. As part of the project they looked at various specs and standards:

BIBFRAME v2.0, 2016.
Europeana EDM, released 2014.
EGAD activities: RiC-CM -> RiC-O, based on OWL (Records in Contexts).
A local initiative: the Matterhorn RDF Model. Matterhorn uses existing technologies: RDA, BPMN, DC, PREMIS. There is a first draft available.

They also looked at relevant EU R&D projects: ‘PRELIDA’, on the preservation of LD, and ‘Diachron’, on managing the evolution and preservation of LD.

Jean-Luc noted that the versatility of LD is appealing for several reasons:

  • It can be used at both the data and metadata levels.
  • It brings together multiple data models.
  • It allows data model evolution.
  • They believe it is adequate for publishing an archive catalogue on the web.
  • It can be used in a closed environment.
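As a simple illustration of how LD gets round the one-provenance limit and the silo problem mentioned earlier, here is a Python sketch using plain tuples as stand-ins for RDF triples. The `ex:` identifiers and the `ex:hasProvenance` predicate are hypothetical and not taken from any of the models listed above.

```python
PROVENANCE = "ex:hasProvenance"  # hypothetical predicate

# Two silos describing the same record, each knowing one provenance.
silo_a = {
    ("ex:record-1", PROVENANCE, "ex:agent-a"),
    ("ex:record-1", "ex:title", "Minutes"),
}
silo_b = {
    ("ex:record-1", PROVENANCE, "ex:agent-b"),
    ("ex:agent-b", "ex:name", "Example Office"),
}

# Merging graphs is a set union: shared identifiers link the data up.
merged = silo_a | silo_b


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(merged, "ex:record-1", PROVENANCE))
```

Because a triple is just a statement, nothing stops a record from having two provenance statements, and nothing special is needed to combine the two sources.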

Jean-Luc mentioned a dilemma they have between RDF-based triple stores and graph databases. Graph databases tend to be proprietary solutions, but have some advantages. They tend to use ACID transactions, intended to guarantee validity even in the event of errors, power failures, etc., whereas the team are not sure how reliably triple stores support ACID.

Their next step is expert discussion of a common approach, with a common RDF model. Further investigation is needed regarding triple store weaknesses.

Exploring British Design at the Europeana AGM 2015

I’m just back from another enjoyable and useful Europeana Network Association event where I gave a four minute ‘Ignite Talk’ on our recently completed ‘Exploring British Design’ project that Pete and Jane worked on. As it was such a short talk, I wanted to make sure I got the timing right, so I actually wrote the talk out. I think it gives quite a good summary of the project, as well as mentioning our connection with Europeana, so I thought it would be worth posting it here along with a link to the slides:

“Hello, my name is Adrian Stevenson and I’m a Senior Technical Coordinator working for Jisc in the UK.

[Introduction slide]

Today I want to briefly outline a one year project we’ve recently completed called ‘Exploring British Design’ which was funded by the Arts and Humanities Research Council.

The technical work and front-end interface for Exploring British Design was developed by the Archives Hub based in the UK. The Hub aggregates archival descriptions from about 280 institutions in the UK, from the very large such as the British Library to the very small such as Shakespeare’s Globe Theatre, making these archives searchable through our website and APIs, and findable on Google. For some institutions, the Archives Hub provides their only web presence, so it’s an important service for the archives sector in the UK.

For ‘Exploring British Design’ we collaborated with one of our enthusiastic contributors, the Brighton Design Archive, based at the University of Brighton. We used the ‘Britain Can Make It’ exhibition from 1946 as a focal point because the Archive has rich collections relating to this exhibition.

So what’s the connection with Europeana? The Archives Hub is in the process of contributing data to the Archives Portal Europe. The plan is that the portal data will be available through Europeana at some point in the future.

[Home page slide]

So let’s have a look. This is the home page of the website. You can see that we take people, i.e. the designers and architects, their organisations, and the events they were involved with, such as the exhibition, as the starting points, i.e. not the archive records as such.

What’s unique about this project is that we’re going beyond the record as being about one person, one organisation and having one focus. The reality is that archives are about the connections between all sorts of people, places, and events, such as exhibitions, and much of this information is effectively ‘locked in’ the archival records. This is what we’re trying to draw out.

The idea is that anything can be a primary focus: people, organisations, places, events or archive collections. Some of you may recognise this as an idea relating to linked data, and indeed this is loosely the approach we took for the under-the-hood implementation. We also looked at an archival name authority standard called EAC-CPF to help with this.

[Designer slide]

You see here how we’ve tried to emphasise the relationship types, such as ‘friend of’, ‘collaborates with’, ‘colleague of’ and so on. Researchers are most interested in people, events, etc., not in archives per se.

[Exhibition slide]

This is a view of the exhibition page, focussing in on it as an event in its own right with a location, related people, etc. This sort of information hasn’t historically been captured all that usefully in archival descriptions.

[Visualisation slide]

We included visualisations, but these actually fall far short of the complexity of the relationships. It’s quite hard to get these to work effectively, but they give a sense of the relationships between architect Jane Drew and Le Corbusier, or even Croydon High School for Girls.

So hopefully you can get a sense of how we’ve tried to present researchers with more flexible routes through the connections we created, helping to surface relationships between people, organisations and events that were effectively hidden in the more traditional document-based way of presenting information.”

There was an excellent reception in the evening at the Rijksmuseum where we were lucky enough to get a private view of the ‘Gallery of Honour’. It was a great opportunity to get a picture by Rembrandt’s ‘Night Watch’, so we made the most of it. Thanks again to Europeana!

Adrian Stevenson and others in front of Rembrandt’s ‘Night Watch’ at the Rijksmuseum, Amsterdam.