Linking Cultural Heritage Data

Last week I attended a meeting at the British Museum to talk with some museum folk about ways forward with Linked Data. It was a follow-up to a meeting I organised on Archives and Linked Data, and it was held under the auspices of the CIDOC Documentation Standards Working Group. The group consisted of me, Richard Light (Museum consultant), Rory McIllroy (ULCC), Jeremy Ottevanger (Imperial War Museum), Jonathan Whitson Cloud (British Museum) and Julia Stribblehill (British Museum); Pete Johnston (who worked on the Locah project and now works on the Linking Lives Linked Data project) was able to join us briefly.

It proved to be a very pleasant day, with lots of really useful discussion. We spent some time simply talking about the differences – and similarities – in our perspectives and in our data. One of our aims was to start to create more links and dialogue between our sectors, and in this regard I think that the day was undoubtedly successful.

To start with, our conversation ranged around various issues that our domains deal with. We talked a bit about definitions, and how important they are within a museum context. For example, if you think about a collection of coins, defining types is key, and agreeing what those types are and what they should be called could be a very significant job in itself. We were thinking about this in the context of providing authoritative identifiers for these types, so that different data sources can use the same terms. Effectively identifying entities such as names and places is vital for museums, libraries and archives, of course, and then within the archive community we could also provide authoritative identifiers for things like levels of description. Working together to provide authoritative and persistent URIs for these kinds of things could be really useful for our communities.
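To make the idea of shared identifiers a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the example.org URIs, the coin type names, the local spellings and the record layout are all made up, and no such authority exists in this form. It simply shows how two sources that label a coin type differently could be recognised as talking about the same thing once both point to one agreed URI.

```python
# Hypothetical shared authority: one persistent URI per agreed coin type.
# example.org is a placeholder domain, not a real service.
AUTHORITY = {
    "denarius": "http://example.org/cointype/denarius",
    "sestertius": "http://example.org/cointype/sestertius",
}

# Hypothetical local spellings, each mapped to the agreed term.
LOCAL_TO_AGREED = {
    "Denarius": "denarius",
    "denarii": "denarius",
    "Sestertius": "sestertius",
}


def type_uri(local_label):
    """Return the shared URI for a local label, or None if no term is agreed."""
    agreed = LOCAL_TO_AGREED.get(local_label.strip())
    return AUTHORITY.get(agreed) if agreed else None


def same_type(record_a, record_b):
    """True only when both records resolve to the same shared URI."""
    uri_a = type_uri(record_a["type"])
    uri_b = type_uri(record_b["type"])
    return uri_a is not None and uri_a == uri_b


# Two invented records from two invented institutions.
museum_a_record = {"id": "A1", "type": "Denarius"}
museum_b_record = {"id": "B7", "type": "denarii"}
```

The hard part, as we discussed, is not the lookup but agreeing the terms in the first place; a label that nobody has agreed simply returns nothing here.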

We talked about the value of promoting ‘storytelling’ and the limitations that may inhibit a more event-based approach. DBpedia (Wikipedia as Linked Data) may be at the centre of the Linked Data Cloud, but it may not be so useful in this context because it cannot chart data over time. For example, it can give you the population of Berlin, but it cannot give you the changing population over time. We agreed that it is important to have an emphasis on this kind of timeline approach.
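As a rough illustration of what a timeline approach means for the data, the Python sketch below stores each value as a dated observation rather than as a single current fact. The place, the URI and the figures are invented placeholders (not real population data), and the `Observation` structure is just one assumed way of modelling this, not anything we agreed at the meeting.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    subject: str  # URI of the thing being described
    prop: str     # what is being measured
    value: int
    year: int     # when the value applied


# Placeholder URI and invented figures for a made-up town.
TOWN = "http://example.org/place/exampletown"
OBSERVATIONS = [
    Observation(TOWN, "population", 1000, 1900),
    Observation(TOWN, "population", 2500, 1950),
    Observation(TOWN, "population", 4000, 2000),
]


def value_in(observations, subject, prop, year):
    """Return the most recent recorded value at or before `year`, or None."""
    dated = sorted(
        (o.year, o.value)
        for o in observations
        if o.subject == subject and o.prop == prop
    )
    years = [y for y, _ in dated]
    index = bisect_right(years, year)
    return dated[index - 1][1] if index else None
```

With a single undated value you can only ask "what is the population?"; with dated observations you can ask what it was in a given year, which is the kind of question an event-based description needs to answer.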

We spent a little while looking at the British Museum’s departmental database, which includes some archives, but treats them more as objects (although the series they form a part of is provided, this contextual information is not at the fore – there is not a series description as such). The proposal is to find a way to join this system up with the central archive, maybe through the use of Linked Data.

We touched upon the whole issue of what a ‘collection’ is within the museum context, which is often more about single objects, and reflected on the challenge of how to define a collection, because even something like a cup and saucer could be seen as a collection…or is it one object?…or is a full tea set a collection?

For archivists, quite detailed biographical information is often part of the description of a collection. We do this in order to place the collection within a context. These biographical histories often add significant value to our descriptions, and sometimes the information in them may be taken from the archive collection, so new information may be revealed. Museums don’t tend to provide this kind of detail, and are more likely to reference other sources for the researcher to use to find out about individuals or organisations. In fact, referencing external sources is something archives are doing more frequently, and Linked Data will encourage this kind of approach, and may save us time in duplicating effort creating new biographical entries for the same person. (There is also the move towards creating separate name authorities, but this also brings with it big challenges around sharing data and using the same authorities).

We moved on to talk about Linked Data more specifically, and thought a bit about whether the emphasis should be on discovery or the quality and utility of what you get when you are presented with the results. We generally felt that discovery was key because Linked Data is primarily about linking things together in order to make new discoveries and take new directions in research.

[Image: book showing design patterns. Credit: Wellcome Library, London]

One of the main aims of the day was to discuss the idea of the use of design patterns to help the cultural heritage community create and use Linked Data. If we could do things in common where possible, it would be easier to query different graphs and get reasonably predictable information back. Richard has written up some thoughts about design patterns from a museum perspective and there is a very useful Linked Data Patterns book by Leigh Dodds and Ian Davis. We felt this could form a template for the sort of thing that we want to do. We were well aware that this work would really benefit from a cross-domain approach involving museums, archives, libraries and galleries, and this is what we hope to achieve.

We spoke briefly about the value of something like OpenCalais and wondered whether a cultural heritage version of this kind of extraction tool would be useful. If it was more tailored for our own sectors, it may be more useful in creating authorities, so that we can refer to things in a common way, as a persistent URL would be provided for the people, subjects, concepts, that we need to describe. We considered the scenario that people may go back to writing free text and then intelligent tools will extract concepts for them.

We concluded that it would be worth setting up a Wiki to encourage the community to get involved in exploring the idea of Linked Data Patterns. We thought it would be a good idea to ask people to tell us what they want to know – we need the real life questions and then we can think about how our data can join up to answer those questions. Just a short set of typical real-life questions would enable us to look at ways to link up data that fit a need, because a key question is whether existing practices are a good fit for what researchers really want to know.

Arrive in Wonder, Leave in Wisdom!

Roll Up Roll Up for Open Culture!

[Image: Open Culture banner]

I arrived at the Open Culture conference just in time to grab a cup of tea and dash along to hear Malcolm Howitt’s talk on Axiell. He focussed on Axiell Arena software, a new content management option. It provides for a more interactive experience, complete with tag cloud and the ability to add comments. It looked pretty good, very much in line with where things are going in terms of these kinds of websites. However, from our point of view as an aggregator what we are keen to see is an API to the data to enable others to engage with it more flexibly, something that has yet to happen on CALM. Maybe this raises the whole issue of the challenge of open data to commercial suppliers – it does rather appear to threaten their business model, and I can see that this would be of concern to them.

The second presentation I saw was from Deep Visuals on ViziQuest, ‘a new way to explore digital collections’. They use natural language processing to extract the concepts from the text, so the system uses existing metadata in order to enable semantic browsing. The idea is to provide a different kind of search experience, where the user can meander through a collection of images. You can flip over an image to find metadata about it, which is quite neat.

Deep Visuals have worked with the Scott Polar Research Institute, one of the Hub contributors, and there are some wonderful images of expeditions. For some images, the archivist has recorded some audio and there are also some film clips – I saw a great clip on board a ship bound for the Arctic. Currently the software is only available for users within the institute, but it may be made available through the website. You can see a small demo here: http://www.deepvisuals.com/Demo/. In addition, ViziQuest have taken some expedition diaries and recorded some audio with actors.

The morning was rounded off with a talk about Culture Grid. The importance of Culture Grid being part of national and international initiatives was emphasised, and there was reference to RDTF (now UKDiscovery) and the whole HE agenda, which was good to hear.

Currently Culture Grid contains about 1.65 million item records, mostly referring to images. There are also about 10,000 collection records and 8,000 institution records. We were told that ‘Culture Grid site and search is not a destination in itself.’ This slightly surprised me, as I did think that this was one of its purposes, albeit only one and maybe not the primary one.

I was impressed by the way Culture Grid is positioning itself as a means to facilitate the use of data by others. Culture Grid has APIs and we were told that a growing range of users do take advantage of this. They are also getting very involved in developer days as a means to encourage innovation. I think this is something archives should engage with, otherwise we will get left behind in the innovative exploration of how to make the most of our data.

Whilst I am very much in agreement with the aims of opening up data, I am not entirely convinced by the Culture Grid website. It does appear to prioritise digital materials – it works much better where there are images. The links back to resources often don’t work. I did a search for ‘victorian theatre’ and first of all the default search was ‘images only’, excluding ‘collections’ and non-image-based materials. Then, two of the first four links to resources I clicked on got an internal server error. I found at least six links that didn’t work on the first two pages of results. Obviously this is not Culture Grid’s fault, but it is certainly a problem. I also wonder about how intuitive it is, with resource links going to so many different types of websites, and at so many different levels of granularity. Quite often you don’t go straight to the resource: one of the links I clicked on from an item went to the Coventry Council homepage, another went to the ‘how do I?’ page of the University of Hull. I asked about the broken links and didn’t feel that the reply was entirely convincing – I think it should be addressed more comprehensively.

I think if the Hub were to contribute descriptions to Culture Grid, one of my main concerns would be around updating descriptions. I’m also not sure about the need to create additional metadata. I can’t quite get the reasoning behind the Culture Grid metadata, and the way that the link on the title goes to the ‘resource’ (the website of the contributor), but the ‘view details’ link goes to the Culture Grid metadata, which generally provides a cut-down version of the description.

The afternoon was dedicated to Spectrum, something I know only a little about other than that it is widely used as a framework by museums in their collections care. Spectrum is, we were told, used in about 7,000 institutions across Europe. Nick Poole, the CEO of the Collections Trust, emphasised that Spectrum should be a collaborative venture, so everyone needs to engage in it.  Yet maybe it has become so embedded that people don’t think about it enough.  The new Spectrum 4 is seen as providing an opportunity to re-engage the community.

There was an interesting take on Spectrum by the first speaker as a means to actually put people off starting museums…but he was making the important point that a standard can show people what is involved – and that it is a non-trivial task to look after museum collections. I got the impression that Spectrum has been a way to get curators on board with the idea of standards and pulling together to work more professionally and consistently.

Alex Dawson spoke about the latest edition of Spectrum in her capacity as one of the co-editors. Spectrum is a consensus about collections management procedures, about consistency, accountability and a common vocabulary. It is not supposed to be prescriptive; it is the ‘what’ more than the ‘how’. It has 21 procedures describing collections management activities, of which 8 are considered primary. We were told that the link to accreditation was very important in the history of Spectrum, and other milestones have included the introduction of rights management procedures, establishing a clear link between procedures and policy, and greater recognition of the importance of the knowledge held within museums (through Spectrum Knowledge).

There has been an acknowledgement that Spectrum started to become more cumbersome and information could get buried within this very large entity; it was also starting to get out of date in certain areas. I can see how Spectrum 4.0 is an improvement on this because it contains clear flow diagrams that bring out the processes much more obviously and show related procedures. It also separates out the procedural and information requirements. The advisory content has been stripped out (and put into online Spectrum Advice) in order to concentrate on procedural steps through flow diagrams.

The consultation on Spectrum 4 was opened up via a wiki: http://standards.collectionslink.org.uk/index.php/Collections_Link_Standards_wiki

The main day of the conference included some really great talks. Bill Thompson from the BBC was one highlight. He talked about ‘A Killer App for Culture’, starting with musings on the meaning of ‘culture’. He talked about digital minds in this generation, which may change the answers that we come up with and may change the meaning of words. Shifting word sense can present us with challenges when we are in the business of data and information. He made the point convincingly that the world is NOT digital, as we often state; it is reassuringly still organic. But digital DATA is everywhere. It is an age in which we experience a digital culture, and maybe the ways that we do this are actually having an effect on the way that we think. Bill cited the book ‘Proust and the Squid’ by Maryanne Wolf, which I would also thoroughly recommend. Wolf looks at the way that learning to read impacts on the ways that we think.

Matthew Cock from the British Museum and Andrew Caspari from the BBC presented on A History of the World in 100 Objects. We were told how this initiative gradually increased in scale until it was enjoyed by millions of people across the world. It was a very collaborative venture between the BBC and British Museum. There were over 2.5 million visits to the site, often around 40,000 in a week when the programme was not on air. It was interesting to hear that the mobile presence was seen as secondary at the time, but probably should have been prioritised more. ‘Permanent availability portable and for free’ was absolutely key, said Andrew Caspari.

It was an initiative that really brought museums together – maybe not surprising with such a high profile initiative. The project was about sharing and a different kind of partnership defined by mutual benefit, and most importantly, it was about closing the gap between public engagement and collection research. It obviously really touched people’s imaginations and they felt a sense of being part of something. It does seem like a very successful combination of good fun, entertainment and learning. However, we were told that there were issues. Maybe the digital capacity of museums was overestimated and longer lead-in times were required than the BBC provided. Also, the upload to the site needed to be simpler.

Cock and Caspari referred to the way the idea spread, with things like ‘A history of the world in 100 sheds’. Should you be worried that this might trivialize the process, or should you be pleased that it caught on, stirred imaginations and controversy and debate?

David Fleming of National Museums Liverpool followed with an equally absorbing talk about museums and human rights. He said museums should be more aware that they are constructs of the society they are in. They should mirror society. They should give up on the idea of being neutral and engage in issues.  He is involved in the International Slavery Museum in Liverpool, and this is a campaigning museum. Should others follow suit? It makes museums an active part of society – both historical and contemporary. Fleming felt that a visit to the museum should stir people and make them want to get involved.

He gave a number of examples of museums where human rights are at the heart of the matter, including:

District Six in South Africa: http://www.districtsix.co.za – very much a campaigning museum that does not talk about collections so much as stories and lives, using emotion to engage people.

The Tuol Sleng Museum of Genocide Victims in Cambodia, a building that was once Pol Pot’s secret prison. The photographs on this site are hugely affecting and harrowing. Just seemingly ordinary portrait shots of prisoners, but with an extraordinary power to them.

The Lithuanian Museum of Genocide Victims. This is a museum where visitors can get a very realistic experience of what it was like to live under the Soviet regime. Apparently this experience, using actors as Soviet guards, has led to some visitors passing out, but the older generation are passionate to ensure that their children understand what it was like at this time.

We moved on to a panel session on Hacking in Arts & Culture, which was of particular interest to me. Linda Ellis from Black Country Museums gave a very positive assessment of how the experience of a hack day had been for them. She referred to the value of nurturing new relationships with developers, and took us through some of the ideas that were created. You can read a bit more about this and about putting on a hack day on Dan Slee’s blog: https://danslee.wordpress.com/tag/black-country-museums/

What we need now is a Culture Hack day that focuses on archival data – this may be more challenging because the focus is text not images, but it could give us some great new perspectives on our data. According to Rachel Coldicutt, a digital consultant, we need beanbags, beer, pizza, good spirit and maybe a few prizes to hand out….. Doesn’t seem too hard. ….oh, and some developers of course :-)

Some final thoughts around a project at the New Art Gallery Walsall: Neil Lebeter told us that the idea was to make the voice of the artist key, in this case Bob and Roberta Smith. The project centred around the Jacob Epstein archive and found ways to bring the archive alive through art – you can see some interesting video clips about this process on YouTube: http://www.youtube.com/user/newartgallerywalsall.

Open Culture was billed as a conference meeting the needs of museums, libraries and archives, but I do think it was essentially a museums conference with a nod to archives and maybe a slight nod to libraries. This is not to criticise the conference, which was very well presented, and there really were some great speakers, but maybe it points to the challenges of bringing together the three domains? In the end, they are different domains with different needs and interests as well as areas of mutual interest. Clearly there is overlap, and there absolutely should be collaboration, but maybe there should also be an acknowledgement that we are different communities, and we have some differing requirements and perspectives.