Machine Learning: Training the Model

A recent OCLC paper by Thomas Padilla highlights the need for ‘Pilot collaborations between institutions with representative collections’ and working ‘to share source data and produce “gold standard” training data’.

We think that the Archives Hub Labs project exemplifies Tom’s suggested approach, working with ten of our contributing institutions from across the UK and reflecting a variety of archives.

However, it is also surely true that cultural heritage will need to engage with the broader AI and ML communities to understand and benefit fully from the range of ML services such as translation, transcription, object identification and facial recognition:

‘Advances in all of these areas are being driven and guided by the government or commercial sectors, which are infinitely better funded than cultural memory; for example, many nation-states and major corporations are intensively interested in facial recognition. The key strategy for the cultural memory sector will be to exploit these advantages, adapting and tuning the technologies around the margins for its own needs.’ From a short blog post by Dr Clifford Lynch of CNI, which is well worth reading.

People often criticise Machine Learning for being biased. But bias and mis-representation are essentially due to embedded bias in the input training data: the algorithm learns from what it is given. So one of the key tasks for us as an archives community is to think about training data. We need algorithms that are trained to work for us and give us useful outputs.

Gathering training data in order to create useful models is going to be a challenge. Machine Learning is not like anything else that we have done before – we don’t actually know what we’ll get – we just know that we need to give the algorithm data that educates it in the way that we want. A bit like a child in school, we can teach it the curriculum, but we don’t know if it will pass the exam.
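
To make the exam metaphor concrete: in practice, labelled data is usually split into a training set (the curriculum) and a held-out test set (the exam that the model never sees during training). Here is a minimal sketch using Python and scikit-learn – the loader function and the labels are purely illustrative:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical loader: feature vectors for digitised items, plus the
# labels a cataloguer assigned to them (e.g. 'letter', 'photograph').
X, y = load_labelled_items()

# Hold back 20% of the labelled data as the 'exam'.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier()
model.fit(X_train, y_train)            # teach the curriculum

predictions = model.predict(X_test)    # sit the exam
print(f"Accuracy on unseen items: {accuracy_score(y_test, predictions):.0%}")
```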

It certainly seems a given that we will need to use well labelled archival material as training data, so that the model is tailored specifically to the material we have. We will need to work together to provide this scale of training data. We have many wonderfully catalogued collections, with detail down to item level; as well as many collections that are catalogued quite basically, maybe just at collection level. If we join together as a community and utilise the well-catalogued content to train algorithms, we may be able to achieve something really useful to help make all collections more discoverable.

If an algorithm is trained on a fairly narrow set of data, then it is questionable whether it will have broad applicability. For example, if we train an algorithm on letters written in the 18th century, but authored by just two or three people, then it is unlikely to learn enough to be of real use for transcription; but if we train it on the handwriting of fifty people or more, then it could be a really useful tool for recognising and transcribing 18th-century letters.

To do this training, we will need to bring content together. We will need to share the Machine Learning journey. The benefits could be massive in terms of discoverability of archives: effective discovery for all those materials that we currently don’t have time to catalogue. The main danger is that the resulting identification, transcription or tagging is not to the standard that we want. We can only experiment and see what happens if we trial ML with a set of data (which is what we are doing now with our Labs project).

One benefit could actually be much more consistency across collections. As someone working on aggregating data from 350 organisations, I can testify that we are not consistent! – and this lack of consistency impairs discovery.

Archival content is likely to be distinct in terms of both quality and subject. Typescripts might be old and faded, manuscripts might be hard to read, photographs might be black and white and not as high resolution as modern prints. Photographs might be of historical artefacts that are not recognised by most algorithms. We have specific challenges with our material, and we need the algorithms to learn from our material, in order to then provide something useful as we input more content.

In terms of subject, the Lotus and Delta shoe shops are a good example of a specific topic. They are represented in the Joseph Emberton papers, at the University of Brighton Design Archives, with a series of photographs. Architecture is potentially an interesting area to focus on: ML could give us outputs that provide information on architectural features, and the design of Lotus and Delta shops might be connected to other shops with similar architecture and shop fronts. ML may pick out features that a cataloguer would not include. On the other hand, we may find that it is extremely hard to train an algorithm on old black-and-white and potentially low-resolution photographs so that it learns what a shop is, and maybe what a shoe shop is.

In this collection a number of the photographs are of exteriors. Some are identified by location, and some are not yet identified.

Exterior of Emberton shoe shop, Harrogate
Exterior of shoe shop, Edinburgh
Unidentified shoe shop

These photographs have been catalogued to item level, and so researchers will be able to find them when searching for ‘shops’ and particularly ‘shoe shops’ on the Hub; a search for ‘harrogate shoe shop’, for example, finds the exterior of a shop front in Harrogate. There may not be much more that could be provided for searching this collection, unless machine learning could label the type of shop front, or the type of windows and signage, for example. This seems very challenging with these old photographs, but presumably not impossible. With ML it is a matter of trying things out. You might think that if artificial intelligence can master self-driving cars it can master shop exteriors… but it is not a foregone conclusion.

If the model was trained with this set of photographs, then other shop fronts could potentially be identified in photographs that aren’t catalogued individually. We could potentially end up with collections from many different archives tagged with ‘shop front’ and potentially with ‘shoes’. Whether an unidentified shop front could be identified is less certain, unless there are definite contextual features to work with.
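
As a sketch of what that tagging step might look like: assuming a classifier had already been fine-tuned on the labelled shop-front photographs (the model file, folder and tag names below are all hypothetical), it could be run over uncatalogued images, keeping only the confident suggestions for review:

```python
from pathlib import Path

import torch
from PIL import Image
from torchvision import models, transforms

# Hypothetical model fine-tuned on the labelled shop-front photographs.
classes = ["shop front", "shoe display", "other"]
model = models.resnet18()
model.fc = torch.nn.Linear(model.fc.in_features, len(classes))
model.load_state_dict(torch.load("shopfront_model.pt"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

THRESHOLD = 0.8  # only keep tags the model is reasonably confident about

for path in Path("uncatalogued_images").glob("*.jpg"):
    batch = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch)[0], dim=0)
    confidence, index = probs.max(dim=0)
    if confidence >= THRESHOLD:
        print(f"{path.name}: suggest tag '{classes[index]}' ({confidence:.0%})")
```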

Interior of ladies’ department
Interior of men’s department

Shop interiors are likely to be even more of a challenge. But it will be exciting to try things like this out and see what we get.

Commercial providers offer black box solutions, and we can be sure they were not trained to work well with archives. They may be adapted to new situations, but it is unlikely they can ever work effectively for archival content. I explored this to an extent in my last blog post. However, it is worth considering that a model not trained on archival material may highlight objects or topics that we would not think of including in a catalogue entry.

The Archives Hub and Jisc could play a pivotal role in co-ordinating work to create better models for archival material. Aggregation brings together more training material, and thus allows for more effective models.

‘To date, most ML projects in libraries have required bespoke data annotation to create sufficient training data. Reproducing this work for every ML project, however, risks wasting both time and labor, and there are ample opportunities for scholars to share and build upon each other’s work.’ (R. Cordell, LC Labs report)

We can have a role to play in ‘data gathering, sharing, annotation, ethics monitoring, and record-keeping processes’ (Eun Seo Jo, Timnit Gebru, https://arxiv.org/abs/1912.10389). We will need to think about how to bring our contributors into the loop in order to check and give feedback on the ML outputs. This is a non-trivial part of the process that we are considering at the moment. We need an interface that displays the results of our ML trials.

One of the interesting aspects of this is that collections that have been catalogued in detail will provide the training data for collections that are not. Will this prove to be a barrier, or will it bring us together as a community? In theory the resources that some archives have, which have enabled them to catalogue to item level, can benefit those with minimal resources. Would this be a free and open exchange, or would we start to see a commercial framework developing?

It is also important that we don’t ignore the catalogue entries from our 350 contributors. Catalogues could provide great fodder for ML – we could start to establish connections and commonalities and increase the utility of the catalogues considerably.

The issue of how to incorporate the results of ML into the end user discovery interface is yet another challenge. Is it fundamentally important that end users know what has been done through ML and what has been done by a human? I can’t help thinking that over time the lines will blur, as we become more comfortable with AI… or as AI simply becomes more integrated into our world. It is clear that many people don’t realise how much Artificial Intelligence sits behind so many systems and processes that we use on an everyday basis. But I think that for the time being, it would be useful to make that distinction within our end user interfaces, so that people know why something has been catalogued or described in a certain way and so that we can assess the effectiveness of the ML contribution.

In subsequent posts we aim to share some initial findings from doing work at scale. We will only be able to undertake some modest experiments, but we hope that we are contributing to the start of what will be a very big adventure for archives.

Uncovering women’s role in Austrian refugee theatre: the exile archives of the Institute of Modern Languages Research  

Archives Hub feature for March 2022

For the 30,000 traumatised refugees from Nazi-occupied Austria living in the UK at the start of the Second World War, the Austrian exile theatre the Laterndl was a beacon of light and hope during the dark days of the Third Reich. Refugees were living with the loss of their homes, the uncertain fate of families left behind, and the poverty and isolation of exile life. At the theatre they could laugh, weep and mourn together over stories, music and poetry presented by performers who shared the same experiences. For the artists themselves, the theatre allowed them to escape the daily grind of refugee life, provide a home for Austrian culture and contribute to the fight against Nazism.

Laterndl publicity leaflet, 1939 (Miller/6/1/1)

Members of the Research Centre for German and Austrian Exile Studies at the University of London have begun to piece together the history of the theatre using the papers of Austrian Jewish refugees Martin Miller and Hannah Norbert Miller, key figures at the Laterndl. Their papers are one of a growing number of archives of German-speaking exiles held at Senate House Library on behalf of the Institute of Modern Languages Research. A programme to catalogue and promote the collections has been funded in recent years by the Martin Miller and Hannah Norbert Miller Trust and the records have now been added to the Archives Hub. This feature for the Hub marks Women’s History Month by considering the role of women in the theatre and how they contributed to its aim to keep alive the spirit of resistance to the Nazis.

Five of the 16 artists who contributed to the opening production of the Laterndl in June 1939 were women, all experienced professionals. They played an important role both on stage and behind the scenes from the outset. The cast of the first production ‘Unterwegs’ included seasoned theatre performers Lona Cross, Marianne Walla and Greta Hartwig. Cross had performed in regional Austrian theatre, and Walla and Hartwig were active in anti-fascist political cabaret in Vienna in the mid-1930s. ‘Unterwegs’ offered a wide range of strong female roles and included one scene, ‘Bow Street’, which was singled out for particular praise by reviewers. Standing on trial at Bow Street court before ‘General Bias’ and ‘Mrs Charity’, Walla, playing the ‘Eternal Woman’ alongside the ‘Eternal Jew’ and the ‘Eternal Revolutionary’, made a powerful plea for leniency and understanding from the British authorities for women who had taken a stand against Nazism.

Greta Hartwig and Martin Miller watching a Laterndl rehearsal, June 1939 (Miller/3/1/1/1)

In early 1940 another Viennese actor already familiar to Austrian theatre audiences joined the troupe: Hannah Norbert Miller (then Hanne Norbert). Norbert soon became one of the leading performers, appearing in over ten productions in three years. She also acted with other exile theatre groups and had a wide network of contacts which helped connect the Laterndl players with the wider German-speaking theatre scene. Norbert’s excellent English enabled her to act as commere, communicating the theatre’s message of resistance against Nazism to British audience members, who included well-known cultural figures like J.B. Priestley and Richard Crossman of the BBC.

Hanne Norbert’s commere script introducing two scenes at the Laterndl, 1940 (Miller/1/2/1/5)

Theatre programmes in the archive indicate that female artists also worked in a range of non-acting roles over the course of the theatre’s existence. Kaethe Knepler was a musician and pianist from Germany who worked as director of music at the Laterndl in 1941 and 1942 together with her husband, Georg, a musicologist. The couple regularly performed as a duo, and in 1940 Kaethe Knepler composed the setting for a song by Jura Soyfer, a young Austrian writer who had died in Buchenwald a year before.

Laterndl programme for a production of Johann Nestroy’s ‘Der Talisman’, 1941 (Miller/5/1/9)

Costumes for the first three productions were the responsibility of two Viennese designers, Hertha Winter and Kaethe Berl. Little is known about Winter’s background, but Berl had studied design at art school, and in the post-war era she would become a pioneer of enamel art in New York. With wartime shortages and the Laterndl’s tiny budget, the pair had to summon all their creativity to produce costumes, improvising them out of old garments or purchasing them cheaply here and there, including in the East End’s Petticoat Lane. Berl also designed the distinctive red logo for the theatre shown on the programme (above).

‘Trip to Paradise’ by Jura Soyfer, performed by the Laterndl Theatre, showing costume designed by Hertha Winter, with Marianne Walla as Fritzi on the right, 1940 (Miller/3/1/1/5)

One of the most powerful anti-Nazi plays produced by the Laterndl was written by the theatre’s only female writer, journalist and Communist activist Eva Priester. Priester’s ‘The Verdict’, performed in the autumn of 1942, saw Norbert and Walla play two women imprisoned in a cell together in an unknown location in Nazi Europe. The women unite against their male guard and anticipate the liberation of Europe with the declaration: ‘We are not alone. They will come over the sea, by ship, any moment now they could come and land in France and open our doors. Can you hear them – soon they will break down the iron doors – soon they will be here!’

Eva Priester’s ‘The Verdict’, performed by the Laterndl Theatre, with Marianne Walla (left) and Hanne Norbert (right), 1942 (Miller/3/1/1/10)

By the end of the war over 40 women refugees had worked at the theatre, some of them over several years. How many of them managed to rebuild their careers as artists in the post-war world is not recorded in these archives, though for a lucky few, at least, the Laterndl was a stepping stone to a career in the performing arts in the UK, such as at the BBC. What is clear is that, despite the hardship and pain of their situation, women played a central role in the theatre, helping to keep alive the hopes of the community in a better post-war world and an independent and democratic Austria.

Dr Clare George
Archivist (Martin Miller and Hannah Norbert-Miller Trust)
Research Centre for German and Austrian Exile Studies
Institute of Modern Languages Research
University of London School of Advanced Study
Senate House Library

Related

Martin Miller and Hannah Norbert-Miller Archive

Browse all Institute of Modern Languages Research collection descriptions on the Archives Hub

All images copyright Institute of Modern Languages Research, University of London. Reproduced with the kind permission of the copyright holders.

Machine Learning with Archive Collections

Machine Learning is a sub-set of Artificial Intelligence (AI). You might like to look at devopedia.org for a short introduction to Machine Learning (ML).

Machine Learning is a data-oriented technique that enables computers to learn from experience. Human experience comes from our interaction with the environment. For computers, experience is indirect. It’s based on data collected from the world, data about the world.

Definition of Machine Learning from devopedia.org

The idea of this and subsequent blog posts is to look at machine learning from a specifically archival point of view as well as update you on our Labs project, Images and Machine Learning. We hope that our blog posts help archivists and other information professionals within the archival or cultural heritage domain to better understand ML and how it might be used.

AI can be used for many areas of learning and research. Chatbots have been trialled at some institutions; ‘Ada’ at Bolton College, for example, has generally been well received. AI can be useful for aspects of website usability and accessibility, or for helping students to choose the right university degree. The Jisc National Centre for AI site has more information on how AI can add value for education and learning.

At the Archives Hub we are particularly focussed on looking at Machine Learning from the point of view of archival catalogues and digital content, to aid discoverability, and potentially to identify patterns and bias in cataloguing.

Machine Learning to aid discoverability can be carried out as supervised or unsupervised learning. Supervised learning is generally the most reliable, producing the best results. It requires a set of data that contains both the inputs and the desired outputs. By ‘outputs’ we mean that the objective is provided by labelling some of the input data; this labelled data is often called training data. In a ‘traditional’ scenario, code is written to take input and create output; in machine learning, inputs and outputs are provided, and the part done by human code is instead done by machine learning algorithms, which create a model. This model is then used to derive outputs from further inputs.

The machine learning model, or program, is the outcome of learning from data (source: Advani 2020)
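
As a toy illustration of that difference, with invented numeric ‘features’ standing in for real image data: in the traditional approach a human writes the rule; in the machine learning approach the rule is derived from example inputs and outputs.

```python
from sklearn.linear_model import LogisticRegression

# Traditional approach: a human writes the rule that maps input to output.
# The 'input' here is a toy feature vector: [tube_length_cm, lens_count].
def label_traditional(features):
    return "telescope" if features[0] > 50 and features[1] >= 2 else "other"

# Machine learning approach: supply example inputs *and* desired outputs,
# and let the algorithm derive the mapping itself.
inputs = [[90, 3], [75, 2], [10, 1], [5, 0], [120, 4], [8, 1]]
outputs = ["telescope", "telescope", "other", "other", "telescope", "other"]
model = LogisticRegression().fit(inputs, outputs)

# Both now produce an output for a new input, but only the second
# learned its rule from the examples.
new_item = [80, 2]
print(label_traditional(new_item))   # rule written by a human
print(model.predict([new_item])[0])  # rule learned from data
```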

So, for example, take the Vickers instruments collection from the Borthwick: https://dlib.york.ac.uk/yodl/app/collection/detail?id=york%3a796319&ref=browse. Suppose you want to recognise optical instruments, for example telescopes and microscopes. You could provide training data in the form of a set of labelled images (the output data) to create a model. You could then input additional images and see if the optical instruments are identified by the model.

Of course, the Borthwick may have catalogued these photographs already (in fact, they have been catalogued), so we know which are telescopes and which are micrometers or lenses or eye pieces. If you have a specialist collection, essentially focused on a subject, and the photographs are already labelled, then there may be less scope for improving discoverability for that collection by using machine learning. If the Borthwick had only catalogued a few boxes of photographs, they might consider using machine learning to label the remaining photographs.

However, a big advantage is that the enhanced, telescope-recognising model can now be used on all the images from the Archives Hub, to discover and label images containing telescopes from other collections. This is one of the great advantages of applying ML across the aggregated data of the Archives Hub.

The results of machine learning are always going to be better with more training data, so ideally you would provide a large collection of labelled photographs in order to teach the algorithm. Archive collections may not always be at the kind of scale where this process is optimised. Providing good training data is potentially a very substantial task, and it does require that the content is labelled. It is possible to use models that are already available without doing this training step, but the results are likely to be far less useful.
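
To give a flavour of what that training step involves, here is a minimal transfer-learning sketch in PyTorch. It starts from a network pre-trained on everyday photographs and retrains only the final layer on a folder of labelled images; the folder layout and file names are hypothetical, and a real run would also need image normalisation and a validation set:

```python
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

# Labelled images arranged in folders by label, e.g.
# instruments/telescope/*.jpg, instruments/microscope/*.jpg (hypothetical).
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("instruments", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

# Start from a model pre-trained on everyday photographs and retrain
# only the final classification layer on the archival images.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))

optimiser = optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimiser.step()

torch.save(model.state_dict(), "instrument_model.pt")
```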

Another scenario that could lend itself to ML is a more varied collection, such as Borthwick’s University photograph collection. These have been catalogued, but there is potential to recognise various additional elements within the photographs.

Construction of the J.B. Morrell Library, University of York

The above photograph has been labelled as a construction site. ML could recognise that there are people in the photograph, and this information could be added, so a researcher could then look for construction site with people. Recognising people in a photograph is something that many ML tools are able to do, having already been trained on this. However, archive collections are often composed of historic documents and old photographs that may not be as clear as modern documents. In addition, the models will probably have been trained with more current content. This is likely to be an issue for archives generally. For models to be effective, they need to have been trained with content that is similar to the content we want to catalogue.

The Amazon Web Services (AWS) Rekognition facial recognition tool finds three faces…
…the Microsoft Azure facial recognition tool doesn’t do so well.
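
For anyone curious about what sits behind results like these: face detection is a single API call. A minimal sketch using the AWS SDK for Python (boto3), assuming AWS credentials are already configured; the filename is illustrative:

```python
import boto3

client = boto3.client("rekognition")

# Send the image bytes to Rekognition's face detection.
with open("group_photograph.jpg", "rb") as f:
    response = client.detect_faces(Image={"Bytes": f.read()})

for i, face in enumerate(response["FaceDetails"], start=1):
    box = face["BoundingBox"]
    print(f"Face {i}: confidence {face['Confidence']:.1f}%, "
          f"left={box['Left']:.2f}, top={box['Top']:.2f}")
```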

The benefit of adding labels to photographs via ML, to potentially enhance the catalogue and help with discoverability, is going to depend upon a number of factors: how well the image is already catalogued, whether training data can be provided to improve the algorithm, and how well ML can then pick out features that might be of use.

The drawings of fossil fish at the Geological Society are another example of a very subject specific collection. We put a few of these through some out-of-the-box ML tools. These tools have been pre-trained on large diverse datasets, but we have not done any additional training ourselves yet, so you could see them as generalists in recognising entities rather than specialists with any particular material or topic.

Fossil tortoise from Oeningen

In this case the drawing has been tagged with ‘fossil’, which could be useful if you wanted to identify fossil drawings from a varied collection of drawings. It has also been tagged with ‘archaeology’ and ‘art’, both of which could potentially be useful, again depending upon the context. The label ‘soil’ is a bit more problematic, and yet it is the one that has been added with 99.5% certainty. However, a bit of training to tell the algorithm that ‘soil’ is not correct may remove this tag from subsequent drawings.

This example illustrates the above point that a subject-specific collection may be tagged with labels that are already provided in the catalogue description. It also shows that machine learning is unlikely to ever be perfectly accurate (although there are many claims that it outperforms humans in a number of areas). It is very likely to add labels that are not correct. Ideally we would train the model to make fewer mistakes – though it is unlikely that all mistakes will be eliminated – so that does mean some level of manual review.
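
Short of retraining a model (with AWS this would mean their Custom Labels service), a pragmatic stop-gap is to filter the output: set a minimum confidence and keep a simple blocklist of labels already known to be wrong for the material. A sketch along those lines, with an illustrative filename:

```python
import boto3

client = boto3.client("rekognition")
BLOCKLIST = {"Soil"}  # labels we already know are wrong for this material

with open("fossil_drawing.jpg", "rb") as f:
    response = client.detect_labels(
        Image={"Bytes": f.read()},
        MaxLabels=10,
        MinConfidence=80,  # suppress low-confidence labels at source
    )

for label in response["Labels"]:
    if label["Name"] not in BLOCKLIST:
        print(f"{label['Name']}: {label['Confidence']:.1f}%")
```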

Tagging an image using ML may draw out features that would not necessarily be added to the catalogue – maybe they are not relevant to the repository’s main theme, and in the end, it is too time-consuming for cataloguers themselves to describe each photo in great detail as part of the cataloguing process.

Queen’s University Belfast: Hart Collection – China Photographs

The above image is a simple one with not too much going on. It will be discoverable on the Queen’s website through a search for ‘china’ or ‘robert hart’ for example, but tagging could make it discoverable for those interested in plants or architectural features. Again, false positives could be a problem, so a key here is to think about levels of certainty and how to manage expectations.

As mentioned above, archival images are often difficult to interpret. They may be old and faded, and they may also represent features or items that an algorithm will not recognise.

Design Council Archive: Things in their home setting – detail of a living room

In the above example from Brighton Design Archives, the photograph is from a set made of an exhibition of 1947, Things In Their Home Setting. The AWS Rekognition image service has no problem with the chair, but it has confidently identified the oven as a refrigerator. This could probably be corrected by providing more training data, or giving feedback to improve the algorithm’s knowledge of 1940s kitchen furniture. But by the time you have given enough training data for the model to tell a cooker from a fridge from a washing machine, it might have been easier simply to do the cataloguing manually.

Another option for machine learning is optical character recognition. This has been around for a while, but it has improved substantially as a result of the machine learning approach. Again, one of the challenges for archives is that many items within the collections are handwritten, faded, and generally not easily readable. So, can ML prove to be better with these items than previous OCR approaches?

A tool like Transkribus can potentially offer great benefits to archives, and is seen as a community-driven effort to create, gather and share training data. We hope to try out some experiments with it in the course of our project.

Clerkenwell St James Parish, General Plan

The above plan is from Lambeth Palace Library’s 19th century ecclesiastical maps. It can already be found searching for ‘clerkenwell’ or ‘st james parish’. But ML could potentially provide more searchable information.

OCR using Azure

The words here are fairly clear, so the character recognition using the Microsoft Azure ML service is quite good. Obviously the formatting is an issue in terms of word order. ‘James’ is recognised as ‘Iames’ due to the style of writing. ‘Church’ is recognised despite the style looking like ‘Chvrch’ – this will be something the algorithm has learnt. This analysis could potentially be useful to add to the catalogue because an end user could then search for ‘pentonville chapel’ or ‘northampton square’ and find this plan.
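
For reference, the Azure Read OCR call follows an asynchronous submit-then-poll pattern. A sketch using Microsoft’s Python SDK, with placeholder endpoint, key and image URL:

```python
import time

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<your-key>"),
)

# Submit the image for analysis; the result is fetched separately.
job = client.read("https://example.org/clerkenwell-plan.jpg", raw=True)
operation_id = job.headers["Operation-Location"].split("/")[-1]

# Poll until the analysis has finished.
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ("notStarted", "running"):
        break
    time.sleep(1)

if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
```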

As well as looking at digital archives, we will be trying out examples with catalogue text. A great deal of archival cataloguing is legacy data, and archivists do not always have the time to catalogue to item level or to add index terms, which can substantially aid discoverability. So, it is tempting to look at ML as a means to substantially improve our catalogues. For example, to add to our index terms, which provide structured access points for end users searching for people, organisations, places and subjects.

In a traditional approach to adding subject terms to a catalogue, you might write rules. We have done this in our Names Project – we have written a whole load of rules in order to identify name, life dates, and additional data within index terms. We could have written even more rules – for example, to try to identify forename and surname. But it would be very difficult, because the data does not present the elements of names consistently.

We could potentially train an ML model with a load of names, tagging the parts of the name as forename, surname, dates, titles, epithets. But could an algorithm then successfully work out the parts of any subsequent names that we feed into it? It seems unlikely, because there is no real consistency in how cataloguers input names. The algorithm might learn, for example, that a word, then a comma, then another word is surname, forename (Roberts, Elizabeth). But two words followed by a comma and another word could be surname + forename or forename + surname (Vaughan Williams, Ralph; Gerald Finzi, composer). In this scenario, the best option may be to compare our data to source data (e.g. the Virtual International Authority File), rather than try to train a machine to learn patterns when the data doesn’t offer consistent patterns to learn from.
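
A sketch of why the rules-based approach struggles: a pattern for ‘surname, forename (dates)’ handles the straightforward entries, but it will happily mis-parse a name entered the other way round, because it has no way of knowing that ‘Gerald Finzi’ is not a surname:

```python
import re

# Rule: "Surname, Forename (dates)", with the dates optional.
NAME_PATTERN = re.compile(
    r"^(?P<surname>[^,]+),\s*(?P<forename>[^,(]+?)\s*"
    r"(?:\((?P<dates>\d{4}-\d{4})\))?$"
)

for term in ["Roberts, Elizabeth",
             "Vaughan Williams, Ralph (1872-1958)",
             "Gerald Finzi, composer"]:
    match = NAME_PATTERN.match(term)
    print(term, "->", match.groupdict() if match else "no match")
    # The last entry 'matches', but labels 'Gerald Finzi' as a surname
    # and 'composer' as a forename - exactly the inconsistency problem.
```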

We may find that analysing text within a catalogue offers more promise.

Part of the admin history for the British Linen Company archive at Lloyds

Here is an example from an administrative history of the British Linen Group, a collection held by Lloyds Banking Group. The entity recognition is pretty good – people’s names, organisations, dates, places, occupations and other entities can be picked out fairly successfully from catalogues. Of course, that is only the first step; the main issue is how to then use that information. You would not necessarily want to apply the terms as index terms, for example, as they may not be what the collection is substantially about. But from the above example you could easily imagine tagging all the place names with a ‘place’ tag, so that a place search could find them. A general search for Stranraer would obviously find this catalogue entry, but if Stranraer could be identified as a place name, the entry could also be included in the more specific place name search.
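
As an indication of how approachable this first step now is, here is a sketch using the open-source spaCy library and its small English model; the snippet of administrative history is abridged for illustration:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

admin_history = (
    "The British Linen Company was established in Edinburgh in 1746 "
    "to promote the linen trade in Scotland."
)

doc = nlp(admin_history)
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g. 'Edinburgh -> GPE' (a place)

# Tagging just the place names, as suggested above:
places = [ent.text for ent in doc.ents if ent.label_ == "GPE"]
```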

With machine learning it is very difficult and sometimes impossible to understand exactly what is happening and why. By definition, the machine learns and modifies its output. Whilst you can provide training data to give inputs and desired outputs, machine learning will always be just that… a machine learning as it goes along, and not simply working through a programme that a human has written. Supervised learning provides the most control over the outputs. Unsupervised learning and deep learning are where you have much less control (we’ll come onto those in later posts).

It is only by understanding the algorithms and what they are doing that you can set up your environment for the best results. But that is where things can get very complicated. We are going to try to run some experiments where we do prepare the data, but learning how to do this is a non-trivial task. Hence one of the questions we are asking is ‘is Machine Learning worth the effort required in order to improve archival discoverability?’ We hope to get at least some way along the road to answering that question.

There are, of course, other pressing questions, not least the issue of bias, concerns about energy use, and how to preserve the processes and outputs of ML and document the decision making. But there could be big wins in terms of saving time that can then be dedicated to other tasks. The increasing volumes of data that we have to process may make this a necessity. We hope to touch upon some of these areas, but this is a fairly small-scale project and Machine Learning is one huge topic.

Local and Global Memory in the Islamic Relief Archive

Archives Hub feature for February 2022

In 1984 reports of an unfolding famine crisis in East Africa began to reach the international community. Band Aid’s ‘Feed the World’ charity song and the Live Aid concerts are probably the most well-known of the responses to the situation, but these were by no means the only efforts. In Birmingham a group of young Muslim volunteers led by Dr Hany El Banna OBE, then a medical student at the University of Birmingham, began to fundraise in mosques, through friends and family and local Islamic associations. Within a year they had raised enough funds to implement a project to build two chicken farms in Sudan, along with two other projects: distributing biscuits and multivitamins (also in Sudan) and flour in Mauritania. As fundraising efforts took off, the name ‘Islamic Relief’ was adopted and a small one-room office was rented, from which the group coordinated their growing operations.

Photographs of Islamic Relief’s first project, two chicken farms in Sudan, 1984
Volunteers receive donations for the Sudan Food Crisis and Bangladesh Flooding Appeals in Birmingham, 1988

Fundraising around the seasonal observance of Ramadan (a sacred month of fasting in Islam) soon became a mainstay. The group organised tours of national mosques, selling prayer mats and other small items from a van they called the ‘Caravan’. Raising money through the Islamic principles of zakat (a form of alms-giving and religious tax) and sadaqah (voluntary charity giving) was also a key part of the work and remains so at Islamic Relief to this day. This evidence of Muslim community-based voluntary action is one part of what makes the Islamic Relief Archive truly unique and significant. Today Islamic Relief Worldwide has grown into one of the world’s largest Islamic faith-inspired NGOs, if not the largest, currently working in over 40 countries. Islamic Relief was founded with a single donation of 20p; in 2020 we had an income of over £149 million.

Ramadan Appeal flyer, 1980s

Humanitarian and development work has always been at the heart of what Islamic Relief does. The archive documents major humanitarian responses to some of the most notable global events of the last four decades, including the conflicts in Bosnia and Chechnya in the 1990s, the crises in Iraq and Afghanistan in the 2000s, the tsunami in Asia in 2004, the genocide in Rwanda in 1994 and the earthquake in Pakistan in 2006. The ‘International Programmes’ series (IRW/IP) contains a wealth of material relating to both emergency responses and development work in countries such as India, Bangladesh, Mali, Niger and the Occupied Palestinian Territories. Here you can find records such as project reports, country strategy documents and case studies. Related photographic materials are in the ‘Audio Visual’ series (IRW/AV), and publications such as emergency update reports, country annual reports and newsletters are in the ‘Publications and ephemera’ series (IRW/PUB). Within fundraising, the ‘Emergency appeals’ sub-series (IRW/FU/2/3) will also yield results on IRW’s fundraising efforts in relation to specific international situations. Today, Islamic Relief is present at crises in Afghanistan, Syria and Yemen, and the archive continues to collect material relating to these significant global events.

2002 Emergency Appeal flyer
A sample from Kosova Shelter Project report, 1999 (page 1)
A sample from Kosova Shelter Project report, 1999 (page 2)

In 2021 Islamic Relief made its archive accessible to the public for the first time, with our catalogues newly available through the Archives Hub. The records have meaning at a local, national and international level, and we believe that in making them accessible they will not only contribute to research in the fields of humanitarianism and histories of the charity sector, but also, importantly, increase the representation of Muslims and Muslim communities in the shared archival landscape. As the archive continues to grow and further cataloguing is undertaken, we hope that researchers and a wide public audience will be able to benefit from this rich and valuable source of local and global memory.

Elizabeth Shuck, Archivist
Islamic Relief Worldwide

Related

Records of Islamic Relief Worldwide (1984 to date) on the Archives Hub

All images copyright Islamic Relief Worldwide. Reproduced with the kind permission of the copyright holders.

Images and Machine Learning Project

Under our new Labs umbrella, we have started a new project, ‘Images and Machine Learning’. It has three distinct and related strands.

The three themes of the project: the DAO Store, IIIF and Machine Learning

We will be working on these themes with ten participants, who already contribute to the Archives Hub, and who have expressed an interest in one or more of these strands: Cardiff University, Bangor University, Brighton Design Archives at the University of Brighton, Queen’s University Belfast, the University of Hull, the Borthwick Institute for Archives at the University of York, the Geological Society, the Paul Mellon Centre, Lambeth Palace (Church of England) and Lloyds Bank.

This project is not about pre-selecting participants or content that meet any kind of criteria. The point is to work with a whole variety of descriptions and images, and not in any sense to ‘cherry pick’ descriptions or images in order to make our lives easier. We want a realistic sense of what is required to implement digital storage and IIIF display, and we want to see how machine learning tools work with a range of content. Some of the participants will be able to dedicate more time to the project, others will have very little time; some will have technical experience, others won’t. A successful implementation that runs beyond our project and into service will need to fit in with our contributors’ needs and limitations. It is problematic to run a project that asks participants for amounts of time that will not be achievable long-term, as a project run on that basis is unlikely to translate into a service.

DAO Store

Over the years we have been asked a number of times about hosting content for our contributors. Whilst there are already options available for hosting, they raise issues of cost, technical support, fitness for purpose, trust and security that are not necessarily easily met for archives.

Jisc can potentially provide a digital object store that is relatively inexpensive, integrated with the current Archives Hub tools and interfaces, and designed specifically to meet our own contributors’ requirements. In order to explore this proposal, we are going to invest some resource into modifying our current administrative interface, the CIIM, to enable the ingest of digital content.

We spent some time looking at the feasibility of integrating an archival digital object store with the current Jisc Preservation Service. However, for various reasons this did not prove to be a practical solution. One of the main issues is the particular nature of archives as hierarchical multi-level collections. Archival metadata has its own particular requirements. The CIIM is already set up to work with EAD descriptions and by using the CIIM we have full control over the metadata so that we can design it to meet the needs of archives. It also allows us to more easily think about enabling IIIF (see below).

The idea is that contributors use the CIIM to upload content and attach metadata. They can then organise, search and publish their content, giving it stable URIs that can be added to their archival descriptions – both in the Archives Hub and elsewhere.

It should be noted that this store is not designed to be a preservation solution. As noted above, Jisc already provides a preservation service, and there are many other such services available. This is a store for access and use, and for providing IIIF-enabled content.

The metadata fields have not yet been finalised, but we have a working proposal and some thoughts about each field.

Title: mandatory? individual vs batch?
Dates: preferably structured, with options for approximate and not dated.
Licence: possibly a URI; option to add the institution’s rights statement.
Resource type: controlled list; values to be determined with participants; could upload a thesaurus; could try ML to identify type.
Keywords: free text.
Tagging: enables digital objects to be grouped, e.g. by topic, or flagged e.g. ‘to do’ to indicate work is required.
Status: unpublished/published; may refer to IIIF enabled.
URL: unique URI of the image (at individual level).

Proposed fields for the Digital Object Store
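
To make the proposal more tangible, here is a sketch of how a single record might look under these fields, expressed as a Python dictionary; every value is invented, and the field set is still a working draft:

```python
# Illustrative record for the proposed Digital Object Store fields.
digital_object = {
    "title": "Exterior of shoe shop, Harrogate",
    "dates": {"value": "1935", "approximate": True},
    "licence": "https://rightsstatements.org/vocab/InC/1.0/",
    "resource_type": "photograph",        # from a controlled list
    "keywords": ["shop front", "shoes"],  # free text
    "tags": ["to do"],                    # grouping, e.g. work still required
    "status": "unpublished",              # published once ready to share
    "url": None,                          # unique URI assigned on publication
}
```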

We need to think about the workflow and user interface. The images would be uploaded and not published by default, so that they would only be available to the DAO Store user at that point. On publication, they would be available at a designated URL. Would we then give the option to re-size? Would we set a maximum size? How would this fit in with IIIF and the preference for images of a higher resolution? We will certainly need to think about how to handle low resolution images.

International Image Interoperability Framework

IIIF is a framework that enables images to be viewed in any IIIF viewer. Typically, they can be sequenced, such as for a book, and they are zoomable to a very high resolution. At the heart of IIIF is the principle that organisations expose images over the web in a way that allows researchers to use images from anywhere, using any platform that speaks IIIF. This means a researcher can group images for their own research purposes, and very easily compare them. IIIF promotes the idea of fully open digital content, and works best with high resolution images.
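
Under the hood, what a IIIF viewer loads is a manifest: a JSON document describing the images and how they fit together. Here is a minimal sketch of a IIIF Presentation 3.0 manifest for a single image, written out as a Python dictionary with placeholder identifiers:

```python
# Minimal IIIF Presentation 3.0 manifest; all URLs are placeholders.
manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/shoe-shop/manifest.json",
    "type": "Manifest",
    "label": {"en": ["Exterior of shoe shop, Harrogate"]},
    "items": [{
        "id": "https://example.org/iiif/shoe-shop/canvas/1",
        "type": "Canvas",
        "height": 3000,
        "width": 4000,
        "items": [{
            "id": "https://example.org/iiif/shoe-shop/page/1",
            "type": "AnnotationPage",
            "items": [{
                "id": "https://example.org/iiif/shoe-shop/annotation/1",
                "type": "Annotation",
                "motivation": "painting",
                "body": {
                    "id": "https://example.org/iiif/images/shoe-shop"
                          "/full/max/0/default.jpg",
                    "type": "Image",
                    "format": "image/jpeg",
                    # The image service is what lets viewers request
                    # tiles and deep zoom.
                    "service": [{
                        "id": "https://example.org/iiif/images/shoe-shop",
                        "type": "ImageService3",
                        "profile": "level1",
                    }],
                },
                "target": "https://example.org/iiif/shoe-shop/canvas/1",
            }],
        }],
    }],
}
```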

There are a number of demos here: https://matienzo.org/iiif-archives-demo/

And here is a demo provided by Project Mirador: http://projectmirador.org/demo/

An example from the University of Cambridge: https://cudl.lib.cam.ac.uk/view/MS-RGO-00014-00051/358

And one from the University of Manchester: https://www.digitalcollections.manchester.ac.uk/collections/ruskin/1

There are very good reasons for the Archives Hub to get involved in IIIF, but there are challenges being an aggregator that individual institutions don’t face, or at least not to the same degree. We won’t know what digital content we will receive, so we have to think about how to work with images of varying resolutions. Our contributors will have different preferences for the interface and functionality. On the plus side, we are a large and established service, with technical expertise and good relationships with our contributors. We can potentially help smaller and less well-resourced institutions into this world. In addition, we are well positioned to establish a community of use, to share experiences and challenges.

One thing that we are very convinced by: IIIF is a really effective way to surface digital content and it is an enormous boon to researchers. So, it makes total sense for us to move into this area. With this in mind, Jisc has become a member of the IIIF Consortium, and we aim to take advantage of the knowledge and experience within the community – and to contribute to it.

Machine Learning

This is a huge area, and it can feel rather daunting. It is also very complicated, and we are under no illusions that it will be a long road, probably with plenty of blind alleys. It is very exciting, but not without big challenges.

It seems as if ML is getting a bad reputation lately, with the idea that algorithms make decisions that are often unfair or unjust, or that are clearly biased. But the main issue lies with the data. ML is about machines learning from data, and if the data is inadequate, biased, or suspect in some way, then the outcomes are not likely to be good. ML offers us a big opportunity to analyse our data. It can help us surface bias and problematic cataloguing.

We want to take the descriptions and images that our participants provide and see what we can do with ML tools. Obviously we won’t do anything that affects the data without consulting with our contributors. But it is best with ML to have a large amount of data, and so this is an area where an aggregator has an advantage.

This area is truly exploratory. We are not aiming for anything other than the broad idea of improved discoverability. We will see if ML can help identify entities, such as people, places and concepts. But we are also open to looking at the results of ML and thinking about how we might benefit from them. We may conclude that ML only has limited use for us – at least, as it stands now. But it is changing all the time, and becoming more sophisticated. It is something that will only grow and become more embedded within cultural heritage.

Over the next several months we will be blogging about the project, and we would be very pleased to receive feedback and thoughts. We will also be holding some webinar sessions. These will be advertised to contributors via our contributors list, and advertised on the JiscMail archives-nra list.

Thoughts on the Heritage PIDs Project

I attended the final Zoom session for the Heritage Persistent Identifiers Project this week.

PIDs, or persistent identifiers, can be incredibly useful within the heritage sector. The PID project was looking at the use of PIDs across collections, aiming to increase their uptake so that they serve as a foundation infrastructure for drawing collections together.

The project ran two surveys, with responses mainly from the UK but a number from other countries; 66 and 47 responses were received for the first and second surveys respectively. Both surveys showed that most institutions have pockets of awareness of PIDs, although the number of people with no awareness decreased slightly over time.

The main barriers, according to the surveys, are lack of resources and technical issues. It is also clear that decision makers need to be more appreciative of the benefits of PIDs.

The project case studies were found to be particularly useful by survey respondents, and also the PID demonstrator that showed how collections can be linked through PIDs. The case studies included the National Gallery – interestingly they are using the CIIM, as we are, so their PIDs were created as a component of the CIIM.

One thing that struck me as I was listening is that PIDs apply to all sorts of things – documents, objects, collections, publications, people, organisations, places. I think that this can make it difficult to grasp the context when people are talking about PIDs in general. I found myself getting a bit lost in the conversation because it is such a large landscape, and I am someone who has a reasonable knowledge of this area.

Within the Archives Hub we have persistent identification of descriptions at all levels – so each unit of description has a PID, e.g. https://archiveshub.jisc.ac.uk/data/gb275-davies uses the country code GB, the repository code 275 and the reference ‘davies’. These are URIs, which gives them more utility, as they can be referenced on the Web as well as in publications. We had very, very long discussions about the make-up of these identifiers. We did consider having completely opaque identifiers, but we felt there was some advantage in having user-friendly URIs, especially for things like analytics – if you see that ‘gb275-davies’ has had 53 views then you may know what that means, whereas if ‘27530981’ has had 53 views, you have to go and dereference it to find out what it actually is. However, references can change over time, so if you use them in persistent identifiers you have a problem when the reference changes.

Granularity is a question that needs to be addressed when thinking about PIDs for archives. Should every item have a DOI (digital object identifier), for example? Should the DOI be assigned to the collection? Not all collections are described to item level, so in many cases this might be a moot point. So far I don’t think we’ve received archive descriptions that include DOIs, so I don’t think it is going to be top of the agenda for archives any time soon. It may not be something that we, as an aggregator, necessarily get involved with anyway. If a contributor to the Hub includes a DOI, then we can display that, and maybe that is our work done. I’m not sure that it has a role in linking aggregated data to other datasets.

ARKs were mentioned in the session. We haven’t yet considered using these within our system. We’ve only had 2 contributors out of 350 who have included them, so we are not sure that it is worth us working with them at this stage. This is one of the problems with adopting PIDs – uptake and scale. ORCIDs were also referenced. An ORCID is for researchers – eventually their papers may come to the archive, so ORCID IDs may become more relevant in time. It is important for ORCID to work with Wikidata and other PIDs to enable linking. Bionomia was mentioned as a project that already works with ORCID and Wikidata.

Overall my impression, listening to the presentations, was of a very mixed landscape, and that makes it harder to figure out how to start working with PIDs – there is no one clear way forward for anyone starting out. In the case studies presented there was quite a bit of emphasis on internal use cases, which can limit the external benefits, but there was also a range of approaches.

The Archives Hub has done work on identifying personal and organisational names, and we will be blogging more about the outcome of that work when we implement changes to our user interface over the next few months. But it is worth saying that if you want to implement PIDs for names, you have to look at the names you have and how identifiable they really are. It has been extremely difficult for us to do this work, and we cannot possibly achieve 100% identification because of the very variable state of the names that we have in the data.

PIDs need to know what they are identifying, and being clear about what that is may in itself be a big challenge. If you assign a PID to a person, an organisation, or any entity, you want to be confident that it is right. ORCIDs are for current researchers, and if you set yourself up with an ORCID, you are going to know that it identifies you (one would hope). But if we have seven ‘Elizabeth Roberts‘ referred to on the Archives Hub, referenced in a range of archives, we may find it very difficult to know if they are the same person. Assigning identification to historical records is a massive detective challenge.

We have been looking to match our names to VIAF or Wikidata, so that we can benefit from these widely used PIDs. But to do that we need to find a way to create matches and set levels of confidence for matches. Increasingly, I am wondering if Wikidata is more promising than VIAF due to the ability to add to the database. For archives, where many names are not published individuals, this might prove to be a good way forward.
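
As a sketch of what the first step of such matching might look like: Wikidata’s public search API can be queried for candidate entities, though scoring and confirming the matches would still be the hard part:

```python
import requests

def wikidata_candidates(name):
    """Return candidate Wikidata entities for a personal name."""
    response = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": "en",
            "format": "json",
        },
    )
    return response.json().get("search", [])

for candidate in wikidata_candidates("Ralph Vaughan Williams"):
    # Each candidate has a Q code and a short description to help disambiguate.
    print(candidate["id"], "-", candidate.get("description", "no description"))
```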

The PID project came up with a number of recommendations. Many of these were about generally promoting PIDs and integrating them into workflows. Quite a few of the recommendations look like they need significant funding. One that I think is very pertinent is working with system suppliers. It needs to be straightforward to integrate PIDs when a collection is being catalogued.

The recommendations tended to refer just to PIDs in general rather than to specific PIDs, and I’m not sure whether this is helpful, as it is such a broad context. Maybe it is more useful to be specific about whether you are looking at PIDs for collections and artefacts, for researchers, for all names, or for topics. For example, if you recommend looking at cost analysis, is this for any and all PIDs that might be implemented across the whole cultural heritage sector? The project found that it is not possible to be prescriptive and narrow things down, but I still feel that talking about certain kinds of identifiers, rather than PIDs in general, might help to give more context to the conversation.

There are many persistent identifier systems. If we all use different identifiers then we aren’t really getting towards the kind of interconnectivity that we are after. We could do with adopting a common approach – even just a common approach within the archives domain would be useful – but that requires resource and that requires funding. Having said that, it is not essential to use exactly the same PIDs. For example, if one organisation adopts VIAF IDs for their names and another adopts Wikidata Q codes, then that is not really a problem in that VIAF and Wikidata link to each other. But adopting a system that is not widely used (and not linked up to other systems) is not really going to be very helpful.

In the end, we need a very clear sense of the benefits that PIDs will bring us. As an aggregator it is very difficult to add PIDs to data that we receive. Archives should ideally add PIDs as they create descriptions. If VIAF IDs or Wikidata Q codes, or Geonames identifiers for place names, were added during cataloguing, that could potentially be of great benefit. But this raises a big issue – we need archival management systems to make it really easy to add PIDs, and at present many of them don’t do this. Our own cataloguing tool does provide a look-up and this has proved to be really successful. It makes adding identifiers easier than not adding them – and that is what you want to achieve.

Meet Maria Dawson, first graduate of the University of Wales

Archives Hub feature for January 2022

Our institutional archives, 144 metres of which are now catalogued on Archives Hub, hold the key to countless stories of student achievements, past and present. One of our most noteworthy alumni is botanist Maria Dawson, the recipient of the University of Wales’ first awarded degree, a Bachelor of Science, in 1896.

Dawson also jointly holds the title of the first Doctor of Science of the University of Wales. She was granted a prestigious scientific scholarship which funded her pioneering research into agricultural fertilisers.

Maria Dawson

Degree-awarding powers in Wales

In October 1892, Dawson was admitted to the University College of South Wales and Monmouthshire (the predecessor to Cardiff University) to study mathematics, chemistry, zoology and botany.

At that time, the College did not have degree-awarding powers, and students were prepared for University of London examinations. However, in 1893, whilst Dawson was a student, the history of Welsh education was altered irrevocably with the establishment of the University of Wales. The University Colleges in Cardiff, Bangor and Aberystwyth were its constituent institutions.

Academic excellence

Dawson was a high achiever from the outset: she won an exhibition (a bursary) at the College’s entrance examinations, which covered her matriculation and lecture fees, and another at the end of her first year.

She excelled in her scientific studies, winning prizes for her performance in all four of her subjects following her second year.

Chemistry Lab

From Botany modules to researching root nodules

After graduating with her B.Sc., Dawson was awarded a £150 research scholarship by Her Majesty’s Commission for the Exhibition of 1851. Her pioneering research, undertaken at the Cambridge Botanical Laboratories, investigated how the addition of nitrogen and nitrates to soil, a new practice at that time, affected crop yields.

In her research paper, ‘“Nitragin” and the nodules of leguminous plants’ published by Proceedings of the Royal Society of London, she concludes that adding nitrogen “to soils rich in nitrates” is inadvisable. Adding “a supply of it to soil poor in nitrates results in an increased yield”, however the best results are obtained when “nitrates [are] added to the soil”.

Aberdare Hall

Dawson may not have enrolled at the University College of South Wales and Monmouthshire at all if it were not for the dedicated all-female hall of residence the College offered. Her family lived in London, too far to return home each day, and it was not considered respectable for a young, unmarried woman to live in lodgings unchaperoned.

Aberdare Hall, a Grade II-listed Gothic revival residence founded in 1885, was one of the first higher education halls for women to be founded in the UK, and remains an all-female residence and community to this day.

Aberdare Hall

Doff thy caps: the first degree ceremony of the University of Wales

The first degree ceremony of the University of Wales took place in Cardiff at Park Hall, a large concert hall, on 22 October 1897.

The magazine of the University College of South Wales and Monmouthshire, a student publication, reported on this auspicious occasion:

“The first to be presented was Miss Maria Dawson, for the degree of B.Sc., and her appearance was the signal for a great outburst of enthusiasm among the audience. The Deputy-Chancellor… gave her the diploma…, and with a… bow, she retired amid deafening cheers.”

College Magazine

Having our collections listed on Archives Hub makes them visible to a worldwide audience via Google. Since migrating our catalogues, we’ve received enquiries from as far afield as Hawaii, Hong Kong, and Sydney. Our collections hold a multitude of stories as inspirational as Maria Dawson’s, and thanks to the reach of Archives Hub, they can be discovered, remembered, and celebrated. We’re proud of our long history of supporting women’s research in science, technology, and medicine – you can find more stories of women innovating today here: Women in STEM at Cardiff University.

Alison Harvey, Archivist
Special Collections and Archives
Cardiff University / Prifysgol Caerdydd

Related

Records of the University College of South Wales and Monmouthshire and University College Cardiff, 1882-1988


Maria Dawson, first graduate of the University of Wales, 1897-1905 (description of photograph, part of the Cardiff University Photographic Archive collection, 1883-2001)

Browse all Cardiff University Archives / Prifysgol Caerdydd collections on the Archives Hub

All images copyright Cardiff University Archives / Prifysgol Caerdydd. Reproduced with the kind permission of the copyright holders.

Bringing Archives to Life – On Point: Royal Academy of Dance at 100

Archives Hub feature for December 2021

On Point: Royal Academy of Dance at 100 is a free display, mounted in collaboration with the Victoria and Albert Museum (V&A) in London, celebrating the centenary of the Royal Academy of Dance (RAD), which was founded in 1920 with the aim of improving the standards of dance teaching in the UK. The display uses a wide range of material from both the RAD and the V&A archive collections, some of which is listed on the Archives Hub website, to explore the RAD’s story from its foundation to its international influence on ballet and dance.

On Point: Royal Academy of Dance at 100 in the Theatre and Performance Galleries at the V&A Museum, London. Photo © V&A

The display occupies three rooms in the V&A’s Theatre and Performance galleries, and each space includes original costumes, designs, drawings, artefacts, and documents, as well as film footage and many photographic images. It’s a largely chronological arrangement with the first room focusing on the founders of the RAD, the context in which the organisation was founded, and its early development.  

In 1912, Philip Richardson (editor of the Dancing Times magazine) met the dancer, choreographer and teacher, Edouard Espinosa, at the Arabian Nights Ball in Covent Garden. The two men became friends and found common purpose in campaigning to improve the state of dance and dance teaching in the UK. It was Richardson who essentially cherry-picked the five founders who agreed to form the first committee in 1920. Their international backgrounds represented the principal schools of ballet training (French, Italian, and Russian), and together they pooled their knowledge to produce a syllabus that would provide the foundation for a new British standard.

Adeline Genée, Phyllis Bedells, and Tamara Karsavina were among the greatest ballerinas of the early 20th century and committed themselves to the RAD for the remainder of their lives. Lucia Cormani and Edouard Espinosa combined the roles of performer, choreographer, and teacher from early in their careers, and were involved with the RAD only during its first decade. The founders’ connections with the professional ballet scene were an important factor in shaping the RAD’s work, its initial influence, and its continuing development. Although the organisation was primarily concerned with teaching, the founders were also keen to promote the talents of young British dancers and provided many opportunities for performance.

Adeline Genée with young RAD scholars in 1932.

Genée agreed to become the first President of the RAD and was instrumental in securing the patronage of Queen Mary in 1928 and the Royal Charter in 1935. Following the end of the Second World War, she turned her attention to getting ballet recognised as an educational subject to be taught in schools alongside the sister arts of music, drama, and painting. The second room explores the heart of the RAD’s business in teacher training and syllabus development more fully. We also introduce Margot Fonteyn who succeeded Adeline Genée as President of the RAD in 1954.

Costume design by Philip Prowse for Margot Fonteyn in Paquita, 1964.
Margot Fonteyn and Rudolf Nureyev at a rehearsal for the 1963 Gala Matinée performance. Photo by GBL Wilson, © RAD/ArenaPAL

One of the highlights of Fonteyn’s presidency was the series of gala matinées she organised between 1958 and 1965. These performances showcased artists, companies and repertoire that had not been seen in London before, including the first appearance of Rudolf Nureyev in 1961. The galas proved to be an enormous success and provided the foundation for the legendary Fonteyn and Nureyev partnership. There were also opportunities for RAD scholars to perform in the programmes alongside the professional artists. The display includes a selection of materials relating to the galas – set and costume designs, photographs and programmes, alongside a beautiful costume from the romantic ballet Les Sylphides, which Fonteyn danced many times throughout her career. 

Another highlight in Room 2 is some previously unseen film footage of Fonteyn presenting the primary grade of the children’s syllabus she devised in 1968. Filmed in 1972 by her brother Felix and only recently discovered in the archives, it shows how closely involved she was with the work of the RAD.

The final room focuses on the current and future RAD, with photographic representations of recent initiatives such as Silver Swans, dance classes for older learners of any ability, and Project B, a campaign to encourage more boys into dance. Well-established events such as the Genée International Ballet Competition (now renamed ‘The Fonteyn’) are also included here, with the original Adeline Genée Gold Medal (first awarded in 1931) displayed alongside more recent rehearsal footage and photographic images from across the years.

Madonna Benjamin, winner of the Adeline Genée gold medal in 1979. Photo by Jennie Walton.

The presidents of the RAD are brought up to date with costumes worn by Antoinette Sibley (president from 1991 to 2012) and Darcey Bussell (president since 2012), displayed alongside a tunic worn by Nureyev as Prince Siegfried in Act 3 of Swan Lake. The succession of legendary ballerinas who have assumed the role of president shows the strong connection that has always existed between the RAD and the ballet profession.

Shoes worn by Darcey Bussell at her farewell performance with the Royal Ballet, 8 June 2007. Photo © V&A

Visitors to the display are also encouraged to have a go for themselves! A ballet barre area has been installed, with screens showing some simple exercises from the current RAD Graded Examinations syllabus to follow along with.

100 years later, the RAD is now a truly global organisation, inspiring people and communities everywhere to enjoy the benefits and joys of dance – something of which its founders would rightly be proud.

On Point: Royal Academy of Dance at 100 is on now until Monday 29th August 2022 at the V&A Museum, London (admission free).

Eleanor Fitzpatrick
Archives and Records Manager
Royal Academy of Dance

Related

Adeline Genée Archive Collection, c. 1890-1970

Phyllis Bedells Archive Collection, c. 1906-1985

Browse all Royal Academy of Dance Archives on the Archives Hub

Browse all V&A Theatre and Performance Collections on the Archives Hub

Previous features on Royal Academy of Dance Archives collections

The Association of Performing Arts Collections

A Spring in Your Step

All images copyright. Reproduced with the kind permission of the copyright holders.

Paul Oliver Archive of African American Music

Archives Hub feature for November 2021

The flourishing of the commercial music industry in early twentieth-century America enabled people thousands of miles away in Europe to hear the new and previously unimagined sounds of jazz and blues. Carried over the Atlantic in the form of 78 rpm shellac records – many of them brought by US servicemen during the Second World War – they became an object of obsession for collectors, some of whom sought to learn more about the lives behind the names on the disc labels. One such collector was Paul Oliver (1927-2017), who would go on to become one of the foremost authorities on the history of blues music, publishing such books as Blues Fell This Morning (1960) and The Story of the Blues (1969).

Paul Oliver (second from right) with (L-R) Little Walter, Sunnyland Slim, Roosevelt Sykes, Armand “Jump” Jackson, and Little Brother Montgomery.

As a white Englishman, he was, as he wrote, ‘acutely aware of my remoteness from the environment that nurtured the blues’, but he made it his mission to try and understand that environment, encouraged early on by meetings in Paris with the black American writer Richard Wright (who wrote a foreword to Blues Fell This Morning). Oliver did not actually set foot in America until 1960, when, with the aid of a US embassy grant and BBC sound equipment, he managed to interview some 70 blues musicians and associated figures, whose transcribed voices would form the basis of the documentary book Conversation with the Blues (1965).

Letter to Oliver from Richard Wright.

The original tapes of those interviews, along with correspondence with Wright, now form part of the Paul Oliver Archive of African American Music, based in the library of Oxford Brookes University (where Oliver taught architecture for many years). The collection is in the process of being catalogued with the support of the European Blues Association and an Archives Revealed cataloguing grant. The interviews – and the bulk of Oliver’s papers – have already been catalogued, but there are over a hundred other digitised audio tapes still to go. Most of these are compilations of obscure blues songs dubbed from 78s in the early 1960s; though nowadays such material can be accessed via streaming services (thanks to reissue labels such as Document and Yazoo), the original tracklists help situate Oliver in a network of collectors engaged in intensive discographical research.

Lightnin’ Hopkins at the Sputnik Bar, Houston.

There is a tendency now to view blues retrospectively through the prism of its influence on rock music, something Oliver in his later years remained unhappy about: ‘the perception of Robert Johnson as being the grandfather of rock, has led to a peculiar kind of history… which channels everything from Mississippi through a very narrow group of people’. Oliver was drawn to more overlooked performers, admitting to an initial bias towards those with distinctive nicknames: ‘Lightnin’ [Hopkins] or Peetie Wheatstraw were not the names you’d normally come across, so to speak, where a name like Tommy Johnson or Robert Johnson would just sound like the guy next door’. Ironically, this eye for names led to Oliver playing an indirect role in rock history, as it was his allusion in a set of liner notes to the little-known bluesmen Pink Anderson and Floyd Council that reputedly gave the young Syd Barrett the idea to name his band Pink Floyd.

Postcards of views of Mississippi and Arkansas.

The world Oliver inhabited was still one of paper, analogue media, and a dependence on the postal system. For over a decade he worked on a long-distance project about Texas blues with the eccentric American folklorist Mack McCormick, who sent him tapes of gospel services and Mexican Tejano music recorded from Houston radio, turning up so much material that a comprehensive account remained forever out of reach. A desire to trace the roots of the blues to Africa also led Oliver on a field trip to Ghana, where he made several recordings in 1964. These tapes now sit alongside boxes of handwritten lyric transcriptions, typewritten discographies, research cuttings, and visual memorabilia, all testament to a lifetime spent attempting to understand ‘the relationship between the music, the song and the community’.

Tape boxes containing Ghana field recordings.

A selection of Oliver’s photographs from the 1960 US trip – along with audio clips from some of the interviews – can now be viewed in an online exhibition hosted by Oxford Brookes Special Collections. Lower-level catalogue descriptions will be added to the Archives Hub as the project progresses; a collection-level record can be accessed here.

References

David Horn, Interview with Paul Oliver (2007)
Christian O’Connell, Interview with Paul Oliver (2009)
Paul Oliver, ‘Author’s note’ to Blues Fell This Morning (1960)

Fabian Macpherson
Blues Off the Record Project Cataloguer
Oxford Brookes University Special Collections and Archives

Related

Paul Oliver Archive of African American Music (19th-20th centuries) collection description

Browse all Oxford Brookes University Special Collections and Archives descriptions on the Archives Hub.

All images copyright Oxford Brookes University Special Collections and Archives. Reproduced with the kind permission of the copyright holders.

The Archives Hub and IIIF: supporting the true potential of images on the Web

IIIF is a model for presenting and annotating digital content on the Web, including images and audio/visual files. One of its great strengths is the very active global community that develops IIIF and promotes the principles of open, shareable content: a diverse mix of people, including developers and information professionals.

IIIF map showing where there are known IIIF projects and implementations

Images are fundamental carriers of information. They provide a huge amount of value for researchers, helping us understand history and culture. We interact with huge numbers of images, and yet we do not always get as much value out of them as we might. Content may be digitised, but it often sits in silos: the end user has to go to a specific website to discover content and to view a specific image, and it is not always easy, or even possible, to discover, gather together, compare, analyse and manipulate images.

IIIF is a particularly useful solution for cultural heritage, where analysis of images is so important. A current ‘Towards a National Collection’ project has been looking at practical applications of IIIF.

The IIIF Solution

Exactly what IIIF enables depends upon a number of factors, but in general it enables:

Deep zoom: view and zoom in closely to see all the detail of an image

Sequencing: navigate through a book or sequence of archival materials

Comparisons: bring images together and put them side-by-side. This can enable researchers to bring together images from different collections, maybe material with the same provenance that has been separated over time.

Search within text: work with transcriptions and translations

Connections: connect to resources such as Wikidata

Use of different IIIF viewers: different viewers have their own features and facilities.

How It Works

The IIIF community tends to talk in terms of APIs. These can be thought of as agreed and structured ways to connect systems. If you have this kind of agreement then you can implement different systems, or parts of systems, to work with the same content, because you are sticking to an agreed structure. The basic principle is to store an image once (on a IIIF server) and be able to use it many times in many contexts.

IIIF is like a layer above the data stores that host content. The images are accessed through that IIIF layer – that is, through the IIIF APIs. This enables different agents to create viewers and tools for the data held in all the stores.

Different repositories have their own data stores, but they can share content through the IIIF APIs.

There are a few different APIs that make up the IIIF standard.

Image API

This API delivers the content (the pixels). An image is requested via a URL that is structured in an agreed way, specifying the region, size, rotation, quality and format required.
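
To make the ‘agreed way’ concrete, here is a minimal sketch in TypeScript of how such a URL is assembled. The server base URL and image identifier are hypothetical examples; the path segments follow the Image API pattern of {identifier}/{region}/{size}/{rotation}/{quality}.{format}.

```typescript
// A minimal sketch of building a IIIF Image API request URL.
// The base URL and identifier are hypothetical examples.
function iiifImageUrl(
  base: string,            // a IIIF image server endpoint
  identifier: string,      // the image's identifier on that server
  region: string = "full",     // whole image, or e.g. "0,0,512,512" for a crop
  size: string = "max",        // full available size, or e.g. "!800,600" to fit a box
  rotation: string = "0",      // degrees clockwise
  quality: string = "default",
  format: string = "jpg"
): string {
  return `${base}/${encodeURIComponent(identifier)}/${region}/${size}/${rotation}/${quality}.${format}`;
}

// The whole image at full size:
iiifImageUrl("https://iiif.example.org/images", "letter-p1");
// => "https://iiif.example.org/images/letter-p1/full/max/0/default.jpg"

// A 512x512 pixel crop from the top-left corner, scaled to 256 pixels wide:
iiifImageUrl("https://iiif.example.org/images", "letter-p1", "0,0,512,512", "256,");
```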

Presentation API

This delivers information on how the material should be presented, such as the sequence of pages in a book or a bundle of letters, along with metadata about the object.

This screenshot shows the Image API providing the zoomable image, and the Presentation API providing basic information – the title and the sequence of the pages of this object.
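
To give a flavour of what the Presentation API delivers, here is a minimal sketch of a Presentation 3.0 manifest for a two-page letter, written out as a TypeScript object. Every identifier and URL is a hypothetical example.

```typescript
// A minimal sketch of a IIIF Presentation 3.0 manifest for a two-page letter.
// All ids and URLs are hypothetical examples.
const manifest = {
  "@context": "http://iiif.io/api/presentation/3/context.json",
  id: "https://iiif.example.org/manifests/letter",
  type: "Manifest",
  label: { en: ["Letter, 18th century"] },   // the display title
  items: [
    // One Canvas per page; the order of Canvases gives the page sequence.
    {
      id: "https://iiif.example.org/manifests/letter/canvas/1",
      type: "Canvas",
      width: 2000,
      height: 3000,
      items: [
        {
          id: "https://iiif.example.org/manifests/letter/canvas/1/page",
          type: "AnnotationPage",
          items: [
            {
              id: "https://iiif.example.org/manifests/letter/canvas/1/anno",
              type: "Annotation",
              motivation: "painting",   // this image "paints" the canvas
              target: "https://iiif.example.org/manifests/letter/canvas/1",
              body: {
                // An Image API URL of the kind shown above
                id: "https://iiif.example.org/images/letter-p1/full/max/0/default.jpg",
                type: "Image",
                format: "image/jpeg",
              },
            },
          ],
        },
      ],
    },
    // ...and a second Canvas for page two.
  ],
};
```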

Search API

Allows searching within the text of an object.
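
As a rough sketch of how a client might call such a service: the service URL below is hypothetical, and the response handling assumes the Content Search API 1.0 shape, in which hits are returned as a list of annotations.

```typescript
// A minimal sketch of querying a IIIF Content Search service.
// The service URL is hypothetical; a real manifest advertises its own
// search service. The response shape assumed here follows Search API 1.0,
// where hits arrive as annotations in a `resources` array.
async function searchWithin(searchService: string, query: string): Promise<void> {
  const res = await fetch(`${searchService}?q=${encodeURIComponent(query)}`);
  const results = await res.json();
  for (const hit of results.resources ?? []) {
    // Each hit carries the matched text and the canvas region it targets.
    console.log(hit.resource?.chars, "found at", hit.on);
  }
}

searchWithin("https://iiif.example.org/search/letter", "harvest");
```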

Authentication API

Allows access to materials to be restricted by audience. This is useful for sensitive images, or for images under copyright that carry restrictions.

IIIF viewer

As IIIF images are served in a standard way, any IIIF viewer can access them. Examples of IIIF viewers:

The Universal Viewer: https://universalviewer.io/
Mirador: https://mirador-dev.netlify.app/tests/integration/mirador/
Archival IIIF: https://archival-iiif.github.io/
Storiiies digital storytelling: https://storiiies.cogapp.com/#storiiies

There are a whole host of viewers available, with varying functionality, though most offer the basics of zooming and cropping. There is a reasonable question as to why so many viewers are needed: it might be a better approach for the community to concentrate on a limited group of viewers, but the proliferation may partly reflect a politically driven desire for institutions to own and brand their own viewer. In the end, any IIIF viewer can display any IIIF content, and each viewer has its own features and functionality.
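
As an illustration of how interchangeable viewers are, here is a minimal sketch of loading a manifest into Mirador 3 (installed from npm); the manifest URL is hypothetical, and pointing the Universal Viewer at the same manifest would work just as well.

```typescript
// A minimal sketch of embedding the Mirador 3 viewer in a web page.
// The manifest URL is a hypothetical example; any IIIF manifest
// can be loaded in the same way.
import Mirador from "mirador";

Mirador.viewer({
  id: "viewer",   // the id of a <div> element in the host page
  windows: [
    { manifestId: "https://iiif.example.org/manifests/letter" },
  ],
});
```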

To find out more about how researchers can benefit from IIIF, you may like to watch this presentation on YouTube (59m): Using IIIF for research 

Some Examples

In many projects, the aim is to digitise key materials, such as artworks of national importance and rare books and manuscripts, in order to provide a rich experience for end users. For instance, the Raphael Cartoons at the V&A can now be explored in different layers and in great detail, including infra-red and surface views, allowing researchers to study the paintings in depth. Images can easily be compared within your own workspace by pulling in other IIIF images.

The V&A Raphael Cartoons can be viewed in ultra high resolution colour, exploring all of the layers

What is the Archives Hub planning to do with IIIF?

Hosting content: We are starting a 15-month project to explore options for hosting and delivering content. Integral to this project will be providing a IIIF Image API. As referenced above, this will mean that the digital content can be viewed in any IIIF viewer, because we will provide the necessary URLs to do so. One of the barriers for many archives is that images need to be on a IIIF server in order to utilise the Image API; it may be that Jisc can provide this service.

Creation of IIIF manifests: We’ll talk more about this in future blog posts, but the manifest is a part of the Presentation API. It contains a sequence (e.g. ordering of a book), as well as metadata such as a title, description, attribution, rights information, table of contents, and any other information about the objects that may be useful for presentation. We will be looking at how to create manifests efficiently and at scale, and the implications for representing hierarchical collections.
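
As one possible way of thinking about hierarchy, sketched here purely as an assumption rather than a settled design: each level of archival description could become a IIIF Collection, and each digitised item a Manifest. All names and URLs below are hypothetical.

```typescript
// A sketch of one possible mapping from an archival hierarchy to IIIF:
// levels of description become nested Collections, and digitised items
// become Manifests. Illustrative assumption only; all URLs are hypothetical.
interface ArchivalUnit {
  id: string;
  title: string;
  hasDigitisedContent: boolean;  // true for a digitised item
  children: ArchivalUnit[];      // sub-fonds, series, files, items...
}

function toIIIFResource(
  unit: ArchivalUnit,
  base: string = "https://iiif.example.org"
): object {
  if (unit.hasDigitisedContent) {
    // A leaf item becomes a Manifest (of the kind sketched earlier).
    return {
      id: `${base}/manifests/${unit.id}`,
      type: "Manifest",
      label: { en: [unit.title] },
    };
  }
  // A higher level of description becomes a Collection of its children.
  return {
    id: `${base}/collections/${unit.id}`,
    type: "Collection",
    label: { en: [unit.title] },
    items: unit.children.map((child) => toIIIFResource(child, base)),
  };
}
```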

Providing an interface to manage content: This would be useful for any image store, so it does not relate specifically to IIIF. But it may have implications around the metadata provided and what we might put into a IIIF manifest.

Integrating a IIIF viewer into the Archives Hub: We will be providing a IIIF viewer so that the images that we host, and other IIIF images, can be viewed within the Archives Hub.

Assessing image quality: A key aim of this project is to assess the real-world situation of a typical archive repository in the UK, and how they can best engage with IIIF. Image resolution is one potential issue. Whilst any image can be served through the IIIF API, a lower resolution image will not give the end user the same sort of rich experience with zooming and analysing that a high resolution image provides. We will be considering the implications of the likely mix of different resolutions that many repositories will hold.

Looking at rights and IIIF: Rights are an important issue with archives, and we will be considering how to work with images at scale and ensure rights are respected.

Projects often have a finite goal of providing some kind of demonstrator to show what is possible, and they often pre-select material to work with. We are taking a different approach. We are working with a limited number of institutions, but we have not pre-selected ‘good’ material. We are simply going to try things out and see what works and what doesn’t, what the barriers are and how to overcome them. The process of ingesting the descriptive data and images will be part of the project. We are looking to consider both scalability and sustainability for the UK archive sector, including all kinds of repositories with different resourcing and expertise, and with a whole variety of content and granularity of metadata.

Acknowledgement: This blog post draws on the introductory video on IIIF, which can be viewed on YouTube.