Thoughts on the Heritage PIDs Project

I attended the final Zoom session for the Heritage Persistent Identifiers Project this week.

PIDs, or persistent identifiers, can be incredibly useful within the heritage sector. The PID project was looking at the use of PIDs across collections. They were aiming to increase uptake of PIDs, so that they serve as foundational infrastructure for drawing collections together.

The project ran two surveys, with responses mainly from the UK but a number from other countries. The first survey received 66 responses and the second 47. Both surveys showed that most institutions have pockets of awareness of PIDs, although the number of people with no awareness decreased slightly over time.

The main barriers according to the surveys are lack of resources and technical issues. It is also clear that decision makers need to be more appreciative of the benefits of PIDs.

Survey respondents found the project case studies particularly useful, along with the PID demonstrator that showed how collections can be linked through PIDs. The case studies included the National Gallery – interestingly they are using the CIIM, as we are, so their PIDs were created as a component of the CIIM.

One thing that struck me as I was listening is that PIDs apply to all sorts of things – documents, objects, collections, publications, people, organisations, places. I think that this can make it difficult to grasp the context when people are talking about PIDs in general. I found myself getting a bit lost in the conversation because it is such a large landscape, and I am someone who has a reasonable knowledge of this area.

Within the Archives Hub we have persistent identification of descriptions at all levels, so each unit of description has a PID. For example, https://archiveshub.jisc.ac.uk/data/gb275-davies uses the country code GB, the repository code 275 and the reference ‘davies’. These are URIs, which gives more utility, as they can be referenced on the Web as well as in publications. We had very long discussions about the make-up of these identifiers. We did consider having completely opaque identifiers, but we felt there was some advantage in having user-friendly URIs, especially for things like analytics – if you see that ‘gb275-davies’ has had 53 views then you may know what that means, whereas if ‘27530981’ has had 53 views, you have to go and dereference it to find out what that actually is. However, references can change over time, so if you use them in persistent identifiers you have a problem when the reference changes.
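To make the structure of these identifiers concrete, here is a minimal sketch of how one could be put together and taken apart again. It only reflects the single example given above; the function names, the lower-casing of the parts and the parsing pattern are assumptions made for illustration, not a description of how the Archives Hub actually generates its identifiers.

```python
import re

# Base path taken from the example URI in the text above.
BASE_URL = "https://archiveshub.jisc.ac.uk/data/"


def build_identifier(country_code: str, repository_code: str, reference: str) -> str:
    """Join country code, repository code and reference into one slug.

    Lower-casing and trimming are assumptions for this sketch; real
    references may need more careful normalisation.
    """
    return f"{country_code.strip().lower()}{repository_code.strip()}-{reference.strip().lower()}"


def build_uri(country_code: str, repository_code: str, reference: str) -> str:
    """Return the full URI for a unit of description."""
    return BASE_URL + build_identifier(country_code, repository_code, reference)


def parse_identifier(identifier: str) -> dict:
    """Split a slug back into its parts (assumes a two-letter country code)."""
    match = re.fullmatch(r"([a-z]{2})(\d+)-(.+)", identifier)
    if match is None:
        raise ValueError(f"Unrecognised identifier: {identifier!r}")
    return {
        "country_code": match.group(1).upper(),
        "repository_code": match.group(2),
        "reference": match.group(3),
    }


print(build_uri("GB", "275", "davies"))
print(parse_identifier("gb275-davies"))
```

The sketch also shows the trade-off: because the reference is baked into the identifier, a change to the reference would produce a different URI unless the old one is kept and redirected.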

Granularity is a question that needs to be addressed when thinking about PIDs for archives. Should every item have a DOI (digital object identifier), for example? Should the DOI be assigned to the collection? Not all collections are described to item level, so in many cases this might be a moot point. So far I don’t think we’ve received archive descriptions that include DOIs so I don’t think it is going to be top of the agenda for archives any time soon. It may not be something that we, as an aggregator, necessarily get involved with anyway. If a contributor to the Hub includes a DOI, then we can display that, and maybe that is our work done. I’m not sure that it has a role in linking aggregated data to other datasets.

ARKs were mentioned in the session. We haven’t yet considered using these within our system. We’ve only had 2 contributors out of 350 who have included them, so we are not sure that it is worth us working with them at this stage. This is one of the problems with adopting PIDs – uptake and scale. ORCIDs were also referenced. An ORCID is for researchers – eventually their papers may come to the archive, so ORCID IDs may become more relevant in time. It is important for ORCID to work with Wikidata and other PIDs to enable linking. Bionomia was mentioned as a project that already works with ORCID and Wikidata.

Overall my impression listening to the presentations was of a very mixed landscape, and that is something that makes it harder to figure out how to start working with PIDs – there is no one clear way forward. In the case studies presented there was quite a bit of emphasis on internal use cases, and that can limit the external benefits, but there was also a range of approaches. This doesn’t help anyone starting out and hoping for a clear way forward.

The Archives Hub has done work on identifying personal and organisational names and we are going to be blogging more about the outcome of that work when we implement changes to our user interface over the next few months. But it is worth saying that if you want to implement PIDs for names, you have to look at the names you have and how identifiable they really are. It has been extremely difficult for us to do this work, and we cannot possibly achieve 100% identification because of the very variable state of the names that we have in the data.

PIDs need to know what they are identifying, and being clear about what that is may in itself be a big challenge. If you assign a PID to a person, an organisation, or any entity, you want to be confident that it is right. ORCIDs are for current researchers, and if you set yourself up with an ORCID, you are going to know that it identifies you (one would hope). But if we have seven ‘Elizabeth Roberts’ referred to on the Archives Hub, referenced in a range of archives, we may find it very difficult to know if they are the same person. Assigning identification to historical records is a massive detective challenge.

We have been looking to match our names to VIAF or Wikidata, so that we can benefit from these widely used PIDs. But to do that we need to find a way to create matches and set levels of confidence for matches. Increasingly, I am wondering if Wikidata is more promising than VIAF due to the ability to add to the database. For archives, where many names are not published individuals, this might prove to be a good way forward.
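As a rough illustration of what “levels of confidence” could look like in practice, the sketch below scores a name against a handful of candidate labels and sorts the result into bands. It is not the matching process we use: the candidate identifiers are placeholders, the thresholds are arbitrary assumptions, and plain string similarity (here Python’s standard `difflib`) stands in for whatever evidence a real matcher would draw on.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, drop commas and collapse whitespace (an assumed normalisation)."""
    return " ".join(name.lower().replace(",", " ").split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def confidence_band(score: float, high: float = 0.95, medium: float = 0.80) -> str:
    """Map a score to a band; the threshold values are illustrative assumptions."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


def best_match(name: str, candidates: dict) -> tuple:
    """Return (identifier, score, band) for the closest candidate label."""
    scored = [(similarity(name, label), ident) for ident, label in candidates.items()]
    score, ident = max(scored)
    return ident, score, confidence_band(score)


# Placeholder identifiers, not real VIAF or Wikidata IDs.
candidates = {
    "example-1": "Elizabeth Roberts",
    "example-2": "Edward Robertson",
}
print(best_match("Elizabeth Roberts", candidates))
```

A score like this can only ever be a starting point. Seven people called Elizabeth Roberts would all match each other perfectly on name alone, so dates, places and the context of the archive would need to feed into the confidence level too.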

The PID project came up with a number of recommendations. Many of these were about generally promoting PIDs and integrating them into workflows. Quite a few of the recommendations look like they need significant funding. One that I think is very pertinent is working with system suppliers. It needs to be straightforward to integrate PIDs when a collection is being catalogued.

The recommendations tended to just refer to PIDs and not specific PIDs and I’m not sure whether this is helpful as it is such a broad context. Maybe it is more useful to be more specific about whether you are looking at PIDs for collections/artefacts or for researchers, for all names or for topics. For example, if you recommend looking at cost analysis, is this for any and all PIDs that might be implemented across all of the cultural heritage sector? The project has found that it is not possible to be prescriptive and narrow things down, but I still feel that talking about certain kinds of identifiers rather than PIDs in general might help to give more context to the conversation.

There are many persistent identifier systems. If we all use different identifiers then we aren’t really getting towards the kind of interconnectivity that we are after. We could do with adopting a common approach – even just a common approach within the archives domain would be useful – but that requires resource and that requires funding. Having said that, it is not essential to use exactly the same PIDs. For example, if one organisation adopts VIAF IDs for their names and another adopts Wikidata Q codes, then that is not really a problem in that VIAF and Wikidata link to each other. But adopting a system that is not widely used (and not linked up to other systems) is not really going to be very helpful.
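The link between VIAF and Wikidata can be seen in Wikidata itself, which records VIAF IDs under property P214. As a small sketch, the function below builds a SPARQL query asking which Wikidata item carries a given VIAF ID. The function name is hypothetical, the ID used is a placeholder rather than a real record, and the sketch only builds the query text; sending it to a query service is left out.

```python
def viaf_to_wikidata_query(viaf_id: str) -> str:
    """Build a SPARQL query that looks up a Wikidata item by its VIAF ID (P214)."""
    # Checking for digits only keeps arbitrary text out of the query string.
    if not (viaf_id.isascii() and viaf_id.isdigit()):
        raise ValueError("VIAF IDs are expected to be numeric")
    return f'SELECT ?item WHERE {{ ?item wdt:P214 "{viaf_id}" . }}'


# "12345" is a placeholder, not a real VIAF record.
print(viaf_to_wikidata_query("12345"))
```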

In the end, we need a very clear sense of the benefits that PIDs will bring us. As an aggregator it is very difficult to add PIDs to data that we receive. Archives should ideally add PIDs as they create descriptions. If VIAF IDs or Wikidata Q codes, or Geonames identifiers for place names, were added during cataloguing, that could potentially be of great benefit. But this raises a big issue – we need archival management systems to make it really easy to add PIDs, and at present many of them don’t do this. Our own cataloguing tool does provide a look-up and this has proved to be really successful. It makes adding identifiers easier than not adding them – and that is what you want to achieve.

Launch of Towards a National Collection discovery projects

£14.5m awarded to transform online exploration of UK’s culture and heritage collections through harnessing innovative AI

The Arts and Humanities Research Council (AHRC) has awarded £14.5m to the research and development of emerging technologies, including machine learning and citizen-led archiving, in order to connect the UK’s cultural artefacts and historical archives in new and transformative ways.

Image by Colin McDowall, courtesy of Towards a National Collection. (Young woman winding bobbins on wheel in the loom shop, 1898 Blanket factory, Witney, Oxfordshire © Historic England Archive CC73_00946 | Indian laundry couple with the man ironing clothes. Attributed to a painter from Tanjore (Thanjavur), ca. 1840. Gouache drawing. 32247i © Wellcome Collection | Sir Hans Sloane (1660–1753) Stephen Slaughter (1697–1765) (attributed to) © The Trustees of the Natural History Museum, London | A starboard bow view of the three-masted barque Glenbervie (1866) with crowds of people, on the rocks at Lowland Point. G14146. © National Maritime Museum, Greenwich, London, Gibson’s of Scilly Shipwreck Collection | Artwork by Peter Morphew illustrating the repositories of the University of Glasgow Archives and Special Collections.)

The Archives Hub is pleased to announce that we will be a project partner in one of five major projects being launched today. The projects form the largest investment of Towards a National Collection, a five-year research programme. Today’s launch reveals the first insights into how thousands of disparate collections could be explored by public audiences and academic researchers in the future.

The five ‘Discovery Projects’ will harness the potential of new technology to dissolve barriers between collections – opening up public access and facilitating research across a range of sources and stories held in different physical locations. One of the central aims is to empower and diversify audiences by involving them in the research and creating new ways for them to access and interact with collections. In addition to innovative online access, the projects will generate artist commissions, community fellowships, computer simulations, and travelling exhibitions. The projects are:

● The Congruence Engine: Digital Tools for New Collections-Based Industrial Histories

● Our Heritage, Our Stories: Linking and searching community-generated digital content to develop the people’s national collection

● Transforming Collections: Reimagining Art, Nation and Heritage

● The Sloane Lab: Looking back to build future shared collections

● Unpath’d Waters: Marine and Maritime Collections in the UK

The investigation is the largest of its kind to be undertaken to date, anywhere in the world. It extends across the UK, involving 15 universities and 63 heritage collections and institutions of different scales, with over 120 individual researchers and collaborators.

Together, the Discovery Projects represent a vital step in the UK’s ambition to maintain leadership in cross-disciplinary research, both between different humanities disciplines and between the humanities and other fields. Towards a National Collection will set a global standard for other countries building their own collections, enhancing collaboration between the UK’s renowned heritage and national collections worldwide.

Archives Hub and the Transforming Collections: Reimagining Art, Nation and Heritage project

Donald Locke 1972-4, Trophies of Empire © Estate of Donald Locke Courtesy of Tate | Claudette Johnson, Figure in Blue, 2018. © Claudette Johnson. Image Credit: Arts Council Collection, Southbank Centre | Iniva_Rivington Place: Photograph by Carlos Jimenez, 2018 | Rachel Jones, lick your teeth, they so clutch, 2021. Arts Council Collection, Southbank Centre, London © the artist. Image courtesy of the artist and Thaddaeus Ropac, London.

The Archives Hub at Jisc will be working with fellow project partners:

susan pui san lok, 2021: Courtesy the artist
  • Tate
  • Arts Council Collection
  • Art Fund
  • Art UK
  • Birmingham Museums Trust
  • British Council Collection
  • Contemporary Art Society
  • Glasgow Museums
  • Iniva (Institute of International Visual Art)
  • Manchester Art Gallery
  • Middlesbrough Institute of Modern Art
  • National Museums Liverpool
  • Van Abbemuseum (NL)
  • Wellcome Collection

The Principal Investigator for the Transforming Collections: Reimagining Art, Nation and Heritage project is Professor susan pui san lok, University of the Arts London.

More than twenty years after Stuart Hall posed the question, ‘Whose heritage?’, Hall’s call for the critical transformation and reimagining of heritage and nation remains as urgent as ever. This project is driven by the provocation that a national collection cannot be imagined without addressing structural inequalities in the arts, engaging debates around contested heritage, and revealing contentious histories imbued in objects.

An arrangement of different castes including snake charmer, brick-layer, basket-maker, potter and wives. Gouache drawing. 28438i © Wellcome Collection.

Transforming Collections aims to enable cross-search of collections, surface patterns of bias, uncover hidden connections, and open up new interpretative frames and ‘potential histories’ (Azoulay, 2019) of art, nation and heritage. It will combine critical art historical and museological research with participatory machine learning design, and embed creative activations of interactive machine learning in the form of artist commissions.

Untitled 1986 1987.21, Manchester Art Gallery © Keith Piper.

Among the aims of this project are to surface suppressed histories, amplify marginalized voices, and re-evaluate artists and artworks ignored or side-lined by dominant narratives; and to begin to imagine a distributed yet connected evolving ‘national collection’ that builds on and enriches existing knowledge, with multiple and multivocal narratives.

The role of the Archives Hub will centre around:

  • Disseminating project aims, developments and outcomes to our contributors, through our communication channels and our cataloguing workshops, to encourage a wide range of archives to engage with these issues.
  • Working with the Creative Computing Institute, at the University of the Arts London, to integrate the Machine Learning (ML) processing into the Archives Hub data processing workflows, so that it can benefit over 350 institutions, including public art institutions.
  • Providing expertise from over 20 years of running an archival aggregator and working with a whole range of UK archive repositories, particularly around sustainability and the challenges of working with archival metadata.

Glasgow Women’s Library, Museum of the Year finalist, 2018. Art Map 2019. © Marc Atkins / Art Fund 2018
Mick Grierson, Exploring the Daphne Oram Collection using 3D visualisation and machine learning (screenshot). 2012. Mick Grierson, Parag Mital, London © the artist.