Digital Content on Archives Hub

As part of the Archives Hub Labs ‘Images and Machine Learning’ project we are currently exploring the challenges around implementing IIIF image services for archival collections, and also for Archives Hub more specifically as an aggregator of archival descriptions. This work is motivated by our desire to encourage the inclusion of more digital content on Archives Hub, and to improve our users’ experience of that content, in terms of both display and associated functionality.

Before we start to report on our progress with IIIF, we thought it would be useful to capture some of our current ideas and objectives with regard to the presentation of digital content on Archives Hub. This will help us to assess at later stages of the project how well IIIF supports those objectives, since it can be easy to get caught up in the excitement of experimenting with new technologies and lose sight of one’s starting point. It will also help our audience to understand how we’re aiming to develop the Hub, and how the Labs project supports those aims.

The poet Edward Thomas, ‘Wearing hat, c.1904’.

Why more digital content?

  • We know it’s what our users want
  • Crucial part of modern research and engagement with collections, especially after Covid
  • Another route into archives for researchers
  • Contributes to making archives more accessible
  • Will enable us to create new experiences and entry points within Archives Hub
  • To support contributing archives which can’t host or display content themselves

The Current Situation

At the moment our contributors can include digital content in their descriptions on Archives Hub. They add links to their descriptions prior to publication, and they can do this at any level, e.g. ‘item’ level for images of individually catalogued objects, or maybe ‘fonds’ or ‘collection’ level for a selection of sample images. If the links are to image files, these are displayed on the Hub as part of the description. If the links are to video or audio files, or documents, we just display a link.
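
As a rough illustration of the display rule described above, the decision could be sketched as follows. This is a minimal sketch under stated assumptions, not the Hub's actual code: the function name, the list of image extensions and the example URLs are all invented for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of extensions treated as images; the Hub's real list is not
# documented in this post.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def display_mode(link: str) -> str:
    """Return 'embed' for links to image files and 'link' for everything else."""
    # Look only at the path, so query strings such as "?download=1" are ignored.
    path = urlparse(link).path
    suffix = PurePosixPath(path).suffix.lower()
    return "embed" if suffix in IMAGE_EXTENSIONS else "link"


if __name__ == "__main__":
    # Hypothetical contributor links.
    for url in (
        "https://archive.example.org/images/poster.JPG",
        "https://archive.example.org/media/interview.mp3",
        "https://archive.example.org/docs/finding-aid.pdf",
    ):
        print(url, "->", display_mode(url))
```

A rule like this depends entirely on the link itself, which is one reason a dead or moved link leaves nothing to display.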

There are a few disadvantages to this setup: adding individual links to descriptions can be a labour-intensive process; links often go dead because content is moved, leading to disappointment for researchers; and it means contributing archives need to be able to host content themselves, which isn’t always possible.

From Glasgow School of Art Archives: Art, Design and Architecture Collection

Where images are included in descriptions, these are embedded in the page as part of the description itself. If there are multiple images they are arranged to best fit the size of the screen, which means their order isn’t preserved.

If a user clicks on an image it is opened in a pop out viewer, which has a zoom button, and arrows for browsing if there is more than one image.

The embedded image and the viewer are both quite small, so there is also a button to view the image in fullscreen.

The viewer and the fullscreen option both obscure all or part of the description itself, and there is no descriptive information included around the image other than a caption, if one has been provided.

As you can see the current interface is functional, but not ideal. Listed below are some of the key things we would like to look at and improve going forwards. The list is not intended to be exhaustive, but even so it’s pretty long, and we’re aware that we might not be able to fix everything, and certainly not in one go.

Documenting our aims, though, is an important part of steering our innovations work, even if those aims end up evolving as part of the exploration process.

Display and Viewing Experience

❐ The viewer needs updating so that users can play audio and video files in situ on the Hub, just as they can view images at the moment. It would be great if they could also read documents (PDF, Word etc).

❐ Large or high-resolution image files should load more quickly into the viewer.

❐ The viewer should also include tools for interacting with content, e.g. for images: zoom, rotate, greyscale, adjust brightness/contrast etc; for audio-visual files: play, pause, rewind, modify speed etc.

❐ When opened, any content viewer should expand to a more usable size than the current one.

❐ Should the viewer also support the display of descriptive information around the content, so that if the archive description itself is obscured, the user still has context for what they’re looking at? Any viewer should definitely clearly display rights and licensing information alongside content.

Search and Navigation

❐ The Archives Hub search interface should offer users the option to filter by the type of digital content included in their search results (e.g. image, video, PDF etc).

❐ The search interface should also highlight the presence of digital content in search results more prominently, and maybe even include a preview?

❐ When viewing the top level of a multi-level description, users should be able to identify easily which levels include digital content.

❐ Users should also be able to jump to the digital content within a multi-level description quickly – possibly being able to browse through the digital content separately from the description itself?

❐ Users should be able to begin with digital content as a route into the material on Archives Hub, rather than only being able to search the text descriptions as their starting point.

Contributor Experience

❐ The Archives Hub could offer some form of hosting service, to support archives, improve availability of digital content on the Hub, and allow for the development of workflows around managing content.

❐ We could develop a user-friendly method for linking content to descriptions, to make including and updating digital content easy and time-efficient.

❐ Any workflows or interfaces for managing digital content should be straightforward and accessible for non-technical staff.

❐ If contributors wish to publish or curate their digital content on the Archives Hub, the service could give them access to innovative but sustainable tools, which drive engagement by highlighting their collections.

❐ If possible, any resources created should be re-usable within an archive’s own sites or resources – making the most of both the material and the time invested.

❐ We could offer options for contributors to curate content in creative and inventive ways which aren’t tied to cataloguing alone, and which offer alternative ways of experiencing archival material for users.

Future Possibilities

❐ It would be exciting for users to be able to ‘collect’, customise or interact with content in more direct ways. Some examples might include:
– Creating their own collections of content
– Creating annotations or notes
– Publicly tagging or commenting on content

❐ Develop the experience for users with things like: automated tagging of images for better search; OCR to make the text within images searchable; using the tagging or classification of content to provide links to information and resources elsewhere.

Image credits

Edward Thomas: Papers of Edward Thomas (GB 1239 424/8/1/1/10), Cardiff University Archives / Prifysgol Caerdydd.

Images and Machine Learning Project

Under our new Labs umbrella, we have started a new project, ‘Images and Machine Learning’, which has three distinct but related strands.

The three themes of the project: the DAO Store, IIIF and Machine Learning

We will be working on these themes with ten participants, who already contribute to the Archives Hub, and who have expressed an interest in one or more of these strands: Cardiff University, Bangor University, Brighton Design Archives at the University of Brighton, Queen’s University Belfast, the University of Hull, the Borthwick Institute for Archives at the University of York, the Geological Society, the Paul Mellon Centre, Lambeth Palace (Church of England) and Lloyds Bank.

This project is not about pre-selecting participants or content that meet any kind of criteria. The point is to work with a whole variety of descriptions and images, and not in any sense to ‘cherry pick’ descriptions or images in order to make our lives easier. We want a realistic sense of what is required to implement digital storage and IIIF display, and we want to see how machine learning tools work with a range of content. Some of the participants will be able to dedicate more time to the project, others will have very little time; some will have technical experience, others won’t. A successful implementation that runs beyond our project and into service will need to fit in with our contributors’ needs and limitations. A project that asks people for amounts of time they cannot sustain long-term is problematic, because it is unlikely to turn into a workable service.

DAO Store

Over the years we have been asked a number of times about hosting content for our contributors. Whilst there are already options available for hosting, there are issues of cost, technical support, fitness for purpose, trust and security for archives that are not necessarily easily met.

Jisc can potentially provide a digital object store that is relatively inexpensive, integrated with the current Archives Hub tools and interfaces, and designed specifically to meet our own contributors’ requirements. In order to explore this proposal, we are going to invest some resource into modifying our current administrative interface, the CIIM, to enable the ingest of digital content.

We spent some time looking at the feasibility of integrating an archival digital object store with the current Jisc Preservation Service. However, for various reasons this did not prove to be a practical solution. One of the main issues is the particular nature of archives as hierarchical multi-level collections. Archival metadata has its own particular requirements. The CIIM is already set up to work with EAD descriptions and by using the CIIM we have full control over the metadata so that we can design it to meet the needs of archives. It also allows us to more easily think about enabling IIIF (see below).

The idea is that contributors use the CIIM to upload content and attach metadata. They can then organise and search their content, and publish it, in order to give it web address URIs that can be added to their archival descriptions – both in the Archives Hub and elsewhere.

It should be noted that this store is not designed to be a preservation solution. As noted above, Jisc already provides a preservation service, and there are many other services available. This is a store for access and use, and for providing IIIF enabled content.

The metadata fields have not yet been finalised, but we have a working proposal and some thoughts about each field.

Title: mandatory? individual vs batch?
Dates: preferably structured; options for approximate and not dated.
Licence: possibly a URI; option to add institution’s rights statement.
Resource type: controlled list; values to be determined with participants; could upload a thesaurus; could try ML to identify type.
Keywords: free text.
Tagging: enable digital objects to be grouped, e.g. by topic, or e.g. ‘to do’ to indicate work is required.
Status: unpublished/published; may refer to IIIF enabled.
URL: unique URI of image (at individual level).

Proposed fields for the Digital Object Store

We need to think about the workflow and user interface. The images would be uploaded and not published by default, so that they would only be available to the DAO Store user at that point. On publication, they would be available at a designated URL. Would we then give the option to re-size? Would we set a maximum size? How would this fit in with IIIF and the preference for images of a higher resolution? We will certainly need to think about how to handle low resolution images.
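
To make the proposed fields and the upload-then-publish workflow a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the fields are not finalised, and the class name, the `identifier` attribute and the base address are hypothetical rather than part of the CIIM.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical base address for published objects (not a real service URL).
BASE_URL = "https://dao-store.example.org/objects"


@dataclass
class DigitalObject:
    """One record in the store, using the proposed (not finalised) fields."""

    identifier: str                      # assumed internal ID, not in the field list
    title: str
    dates: Optional[str] = None
    licence: Optional[str] = None        # possibly a URI
    resource_type: Optional[str] = None  # would come from a controlled list
    keywords: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    status: str = "unpublished"          # uploads are not published by default
    url: Optional[str] = None            # only assigned on publication

    def publish(self) -> str:
        """Mark the object as published and assign its unique URL."""
        self.status = "published"
        self.url = f"{BASE_URL}/{self.identifier}"
        return self.url


if __name__ == "__main__":
    obj = DigitalObject(identifier="img-0001", title="Sample photograph", tags=["to do"])
    print(obj.status, obj.url)   # still private to the store user at this point
    print(obj.publish())
```

Keeping the URL empty until publication mirrors the idea that an image is only visible to the store user until it is published.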

International Image Interoperability Framework

IIIF is a framework that enables images to be viewed in any IIIF viewer. Typically, they can be sequenced, such as for a book, and they are zoomable to a very high resolution. At the heart of IIIF is the principle that organisations expose images over the web in a way that allows researchers to use images from anywhere, using any platform that speaks IIIF. This means a researcher can group images for their own research purposes, and very easily compare them. IIIF promotes the idea of fully open digital content, and works best with high resolution images.

There are a number of demos here: https://matienzo.org/iiif-archives-demo/

And here is a demo provided by Project Mirador: http://projectmirador.org/demo/

An example from the University of Cambridge: https://cudl.lib.cam.ac.uk/view/MS-RGO-00014-00051/358

And one from the University of Manchester: https://www.digitalcollections.manchester.ac.uk/collections/ruskin/1

There are very good reasons for the Archives Hub to get involved in IIIF, but there are challenges in being an aggregator that individual institutions don’t face, or at least not to the same degree. We won’t know what digital content we will receive, so we have to think about how to work with images of varying resolutions. Our contributors will have different preferences for the interface and functionality. On the plus side, we are a large and established service, with technical expertise and good relationships with our contributors. We can potentially help smaller and less well-resourced institutions into this world. In addition, we are well positioned to establish a community of use, to share experiences and challenges.

One thing that we are very convinced by: IIIF is a really effective way to surface digital content and it is an enormous boon to researchers. So, it makes total sense for us to move into this area. With this in mind, Jisc has become a member of the IIIF Consortium, and we aim to take advantage of the knowledge and experience within the community – and to contribute to it.

Machine Learning

This is a huge area, and it can feel rather daunting. It is also very complicated, and we are under no illusions: it will be a long road, probably with plenty of blind alleys. It is very exciting, but not without big challenges.

It seems as if ML is getting a bad reputation lately, with the idea that algorithms make decisions that are often unfair or unjust, or that are clearly biased. But the main issue lies with the data. ML is about machines learning from data, and if the data is inadequate, biased, or suspect in some way, then the outcomes are not likely to be good. ML offers us a big opportunity to analyse our data. It can help us surface bias and problematic cataloguing.

We want to take the descriptions and images that our participants provide and see what we can do with ML tools. Obviously we won’t do anything that affects the data without consulting with our contributors. But it is best with ML to have a large amount of data, and so this is an area where an aggregator has an advantage.

This area is truly exploratory. We are not aiming for anything other than the broad idea of improved discoverability. We will see if ML can help identify entities, such as people, places and concepts. But we are also open to looking at the results of ML and thinking about how we might benefit from them. We may conclude that ML only has limited use for us – at least, as it stands now. But it is changing all the time, and becoming more sophisticated. It is something that will only grow and become more embedded within cultural heritage.

Over the next several months we will be blogging about the project, and we would be very pleased to receive feedback and thoughts. We will also be holding some webinar sessions. These will be advertised to contributors via our contributors list, and advertised on the JiscMail archives-nra list.

The Archives Hub and IIIF: supporting the true potential of images on the Web

IIIF is a model for presenting and annotating digital content on the Web, including images and audio/visual files. There is a very active global community that develops IIIF and promotes the principles of open, shareable content. One of the strengths of IIIF is the community, which is a diverse mix of people, including developers and information professionals.

IIIF map showing where there are known IIIF projects and implementations

Images are fundamental carriers of information. They provide a huge amount of value for researchers, helping us understand history and culture. We interact with huge numbers of images, and yet we do not always get as much value out of them as we might. Content may be digitised, but it is often held within silos, where the end user has to go to a specific website to discover content and to view a specific image. It is not always easy or possible to discover, gather together, compare, analyse and manipulate images.

IIIF is a particularly useful solution for cultural heritage, where analysis of images is so important. A current ‘Towards a National Collection’ project has been looking at practical applications of IIIF.

The IIIF Solution

Exactly what IIIF enables depends upon a number of factors, but in general it enables:

Deep zoom: view and zoom in closely to see all the detail of an image

Sequencing: navigate through a book or sequence of archival materials

Comparisons: bring images together and put them side-by-side. This can enable researchers to bring together images from different collections, maybe material with the same provenance that has been separated over time.

Search within text: work with transcriptions and translations

Connections: connect to resources such as Wikidata

Use of different IIIF viewers: different viewers have their own features and facilities.

How It Works

The IIIF community tends to talk in terms of APIs. These can be thought of as agreed and structured ways to connect systems. If you have this kind of agreement then you can implement different systems, or parts of systems, to work with the same content, because you are sticking to an agreed structure. The basic principle is to store an image once (on a IIIF server) and be able to use it many times in many contexts.

IIIF is like a layer above the data stores that host content. The images are accessed through that IIIF layer – or through the IIIF APIs. This enables different agents to create viewers and tools for the data held in all the stores.

Different repositories have their own data stores, but they can share content through the IIIF APIs.

There are a few different APIs that make up the IIIF standard.

Image API

This API delivers the content (or pixels). The image is delivered as a URL, and the URL is structured in an agreed way.
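
As an illustration of that agreed structure, the Image API specification lays out image URLs as `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, with a companion `info.json` document describing the image. The sketch below builds such URLs; the server address and identifier are made up, and the helper names are our own rather than part of any library.

```python
from urllib.parse import quote


def image_url(base, identifier, region="full", size="max",
              rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL from its five parameters."""
    # The identifier is percent-encoded so that characters such as "/" are safe.
    return f"{base}/{quote(identifier, safe='')}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_url(base, identifier):
    """Build the URL of the info.json document that describes the image."""
    return f"{base}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://iiif.example.org/image"  # hypothetical IIIF image server

    # The whole image at the largest size the server will deliver.
    print(image_url(base, "letter-001"))
    # → https://iiif.example.org/image/letter-001/full/max/0/default.jpg

    # The central half of the image, scaled to 800 pixels wide.
    print(image_url(base, "letter-001", region="pct:25,25,50,50", size="800,"))
    # → https://iiif.example.org/image/letter-001/pct:25,25,50,50/800,/0/default.jpg

    print(info_url(base, "letter-001"))
    # → https://iiif.example.org/image/letter-001/info.json
```

Because every IIIF server answers URLs of this shape, a viewer needs no knowledge of how or where the image is actually stored.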

Presentation API

This delivers information on the presentation of the material, such as the sequence of a book, for example, or a bundle of letters, and metadata about the object.

This screenshot shows the Image API providing the zoomable image, and the presentation API providing basic information – the title and the sequence of the pages of this object.
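
To give a feel for what the Presentation API delivers, here is a minimal sketch that assembles a manifest for a two-page object. It follows the general shape of version 3.0 of the Presentation API as we understand it, where the order of the canvases carries the sequence; all identifiers and URLs are invented, and a real manifest would normally carry more metadata (rights, attribution and so on).

```python
import json


def build_manifest(manifest_id, title, pages):
    """Build a minimal manifest.

    `pages` is a list of (image_service_url, width, height) tuples,
    given in reading order.
    """
    canvases = []
    for number, (service_id, width, height) in enumerate(pages, start=1):
        canvas_id = f"{manifest_id}/canvas/{number}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [f"Page {number}"]},
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",  # the image "paints" the canvas
                    "body": {
                        "id": f"{service_id}/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                        # Points viewers at the Image API for deep zoom.
                        "service": [{
                            "id": service_id,
                            "type": "ImageService3",
                            "profile": "level1",
                        }],
                    },
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [title]},
        "items": canvases,  # list order is the page sequence
    }


if __name__ == "__main__":
    manifest = build_manifest(
        "https://iiif.example.org/manifests/letter-001",  # hypothetical
        "Letter, two pages",
        [
            ("https://iiif.example.org/image/letter-001-p1", 3000, 4000),
            ("https://iiif.example.org/image/letter-001-p2", 3000, 4000),
        ],
    )
    print(json.dumps(manifest, indent=2))
```

The manifest holds no pixels itself: each canvas simply points at an image served through the Image API.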

Search API

Allows searching within the text of an object.

Authentication API

Allows materials to be restricted by audience. So, this is useful for sensitive images or images under copyright that may have restrictions.

IIIF viewer

As IIIF images are served in a standard way, any IIIF viewer can access them. Examples of IIIF viewers:

The Universal Viewer: https://universalviewer.io/
Mirador: https://mirador-dev.netlify.app/tests/integration/mirador/
Archival IIIF: https://archival-iiif.github.io/
Storiiies digital storytelling: https://storiiies.cogapp.com/#storiiies

There are a whole host of viewers available, with various functionality. Most will offer the basics of zooming and cropping. There does seem to be a question around why so many viewers are needed. It might be considered a better approach for the community to work on a limited group of viewers, but there may be a politically driven desire to own and brand a viewer. In the end, a IIIF viewer can display any IIIF content, and each viewer will have its own features and functionality.

To find out more about how researchers can benefit from IIIF, you may like to watch this presentation on YouTube (59m): Using IIIF for research 

Some Examples

In many projects, the aim is to digitise key materials, such as artworks of national importance and rare books and manuscripts, in order to provide a rich experience for end users. For instance, the Raphael Cartoons at the V&A are now available to explore different layers and detail, even enabling the infra-red view and surface view, to allow researchers to study the paintings in great depth. Images can easily be compared within your own workspace, by pulling in other IIIF images.

The V&A Raphael Cartoons can be viewed in ultra high resolution colour, exploring all of the layers

What is the Archives Hub planning to do with IIIF?

Hosting content: We are starting a 15-month project to explore options for hosting and delivering content. Integral to this project will be providing a IIIF Image API. As referenced above, this will mean that the digital content can be viewed in any IIIF viewer, because we will provide the necessary URLs to do so. One of the barriers for many archives is that images need to be on a IIIF server in order to utilise the Image API. It may be that Jisc can provide this service.

Creation of IIIF manifests: I’ll talk more about this in future blog posts, but the manifest is a part of the Presentation API. It contains a sequence (e.g. ordering of a book), as well as metadata such as a title, description, attribution, rights information, table of contents, and any other information about the objects that may be useful for presentation. We will be looking at how to create manifests efficiently and at scale, and the implications for representing hierarchical collections.

Providing an interface to manage content: This would be useful for any image store, so it does not relate specifically to IIIF. But it may have implications around the metadata provided and what we might put into a IIIF manifest.

Integrating a IIIF viewer into the Archives Hub: We will be providing a IIIF viewer so that the images that we host, and other IIIF images, can be viewed within the Archives Hub.

Assessing image quality: A key aim of this project is to assess the real-world situation of a typical archive repository in the UK, and how they can best engage with IIIF. Image resolution is one potential issue. Whilst any image can be served through the IIIF API, a lower resolution image will not give the end user the same sort of rich experience with zooming and analysing that a high resolution image provides. We will be considering the implications of the likely mix of different resolutions that many repositories will hold.
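
One rough way to put a number on the resolution point above is to count how many times an image can be halved before its longest side fits within a single tile, which approximates how many steps of zoom a tiled viewer has to work with. This is a back-of-the-envelope sketch, not part of the IIIF specifications; the 512-pixel tile size and the example dimensions are assumptions.

```python
def zoom_levels(width, height, tile_size=512):
    """Count the halvings needed before the longest side fits in one tile."""
    levels = 0
    longest = max(width, height)
    while longest > tile_size:
        longest = (longest + 1) // 2  # halve, rounding up
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical images: a high-resolution scan, a web-sized copy, a thumbnail.
    for width, height in ((8000, 6000), (1024, 768), (400, 300)):
        print(f"{width}x{height}: {zoom_levels(width, height)} zoom level(s)")
```

On this measure an 8000-pixel scan offers four steps of zoom, a 1024-pixel copy offers one, and a 400-pixel thumbnail offers none, which is why a mixed set of resolutions will give users an uneven experience.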

Looking at rights and IIIF: Rights are an important issue with archives, and we will be considering how to work with images at scale and ensure rights are respected.

Projects often have a finite goal of providing some kind of demonstrator showing what is possible, and they often pre-select material to work with. We are taking a different approach. We are working with a limited number of institutions, but we have not pre-selected ‘good’ material. We are simply going to try things out and see what works and what doesn’t, what the barriers are and how to overcome them. The process of ingest of the descriptive data and images will be part of the project. We are looking to consider both scalability and sustainability for the UK archive sector, including all different kinds of repositories with different resourcing and expertise, and with a whole variety of content and granularity of metadata.

Acknowledgement: This blog post cites the introductory video on IIIF which can be viewed within YouTube.