Employing Machine Learning and Artificial Intelligence in Cultural Institutions

As mentioned in my last post, we’re looking at the possibilities Artificial Intelligence and Machine Learning can offer the Archives Hub and the archives community in general. I also now have a wider role in Jisc as a ‘Technical Innovations Manager’, so my brief is to consider the wider technical and strategic possibilities of AI/ML for the Digital Resources directorate and Jisc as a whole. We continue to work behind the scenes, but we also keep a watch on cultural heritage and wider sector activities. As part of this I participated in the Aeolian Project’s ‘Online Workshop 1: Employing Machine Learning and Artificial Intelligence in Cultural Institutions’ yesterday.

‘Visual AI and Printed Chapbook Illustrations at the National Library of Scotland’ – Dr Giles Bergel (University of Oxford / National Library of Scotland)

Giles’ team have been using machine learning (ML) on data from data.nls.uk. He outlined their three-part approach. First, they find illustrations in manuscripts using Google’s EfficientDet object detection convolutional neural network, seeded by manually pre-annotated images. They found the object detector worked extremely well after relatively few learning passes. There were a few false positives, such as ink showing through the page, marginalia and dog ears, that would confuse the model.

False positive ML recognition – ink showthrough

Next they matched and grouped the illustrations using their “state of the art” image search engine. Giles believes this shows that AI simplifies the task of finding things that are related in images. The final step was to apply classification algorithms with the VGG Image Classification Engine, which uses Google as a source of labelled images. The lessons learned were:

  • AI requires well-curated data
  • Tools for annotating data are no less important than classifiers
  • Generic image models generalize well to printed books
  • ‘Classical’ computer vision still works
  • AI software development benefits from end-to-end use-cases including data preparation, refinement, consulting with domain experts, public engagement etc.

‘Machine Learning and Cultural Heritage: What Is It Good Enough For?’ – John Stack (UK Science Museum)

John described how AI is being used as part of the Science Museum’s linked data work to collect data into a central knowledge graph. He noted that the Science Museum are doing a great deal of digitisation but currently they only have what John describes as ‘thin’ object data.

They are looking at using AI for name disambiguation as a first step before adding links to Wikidata and using entity recognition to enhance their own catalogue. It struck me that they, and we at the Hub, have been ‘doing AI’ for a while now with such technologies as entity recognition and OCR, before the term AI was used. They are aiming to link through to Wikidata so that they can pull in the data and add it to their knowledge graph. This allows them to enhance their local data and apply ML to perform such things as clustering to draw out new insights.

John identified the main benefits of ML currently as suggesting possibilities and identifying trends and gaps. It’s also useful for visualisation and identifying related content as well as enhancing catalogues with new terminology. However there were ‘but’s. ML content needs framing and context. He noted that false positives are not always apparent and usually require specialist knowledge. It’s important to approach things critically and understand what can’t be done. John mentioned that they don’t have any ML driven features in production as yet.

Diagram showing the components of the Heritage Connector software

This was followed by a Q&A where several issues came up. We need to consider how AI may drive new ways/modalities of browsing that we haven’t imagined yet. A major issue is the work needed to feed AI enhancements into user interfaces. Most work so far has been on backend data. AI tools need to integrate into day-to-day workflows for their benefits to be realised. More sector specific case-studies, training materials, tools and models are needed that are appropriate to cultural heritage. See the Heritage Connector blog for more information.

‘AI and the Photoarchive’ – John McQuaid (Frick Collection), Dr Vardan Papyan (University of Toronto), and X.Y. Han (Cornell University)

The Frick Collection have been using a deep neural network built with the PyTorch framework to identify labels for their photo archive collection. As a validation exercise, they then compared the ML results with internally crowdsourced data for the same photos, captured from their staff and curators using the Zooniverse software.

Frick Collection ML workflow

They found that 67% of the ML labels matched with the crowdsource validations which they considered a good result. They concluded that at present ML is most useful for ‘curatorial amplification’, but much human effort is still needed. This auto-generation of metadata was their main use case so far.
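The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration of how an agreement rate between ML labels and crowdsourced labels might be computed, not the Frick Collection's actual method. The photo identifiers, label values and function names are all hypothetical.

```python
def agreement_rate(ml_labels, crowd_labels):
    """Return the fraction of shared photos where the ML label equals the crowd label."""
    shared = ml_labels.keys() & crowd_labels.keys()
    if not shared:
        raise ValueError("no photos in common to compare")
    matches = sum(1 for photo_id in shared if ml_labels[photo_id] == crowd_labels[photo_id])
    return matches / len(shared)


def disagreements(ml_labels, crowd_labels):
    """List the photos where the two sources differ, for a curator to review."""
    shared = sorted(ml_labels.keys() & crowd_labels.keys())
    return [pid for pid in shared if ml_labels[pid] != crowd_labels[pid]]


# Hypothetical sample data: photo id -> label
ml_labels = {"photo-1": "portrait", "photo-2": "landscape", "photo-3": "portrait"}
crowd_labels = {"photo-1": "portrait", "photo-2": "still life", "photo-3": "portrait"}

rate = agreement_rate(ml_labels, crowd_labels)
to_review = disagreements(ml_labels, crowd_labels)
print(f"{rate:.0%} of ML labels matched; photos to review: {to_review}")
```

Listing the disagreements alongside the headline figure reflects the ‘curatorial amplification’ idea: the model handles the easy cases and people look at the rest.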

‘Keep True: Three Strategies to Guide AI Engagement’ – Thomas Padilla (Center for Research Libraries)

Thomas believes GLAMs have an opportunity to distinguish themselves in the AI space. He covered a number of themes, the first being the ‘Non-scalability imperative’. Scale is everywhere with AI. There’s a great deal of marketing language about scale, but we need to look at all the non-scalable processes that scale depends on. There’s a problematic dependency where scalability is made possible by non-scalable processes, resources and people. Heterogeneity and diversity can become a problem to be solved by ML. There’s little consideration that AI should be just and fair.

The second theme was ‘Neoliberal traps’ in AI. Who says ethical AI is ethical AI? GLAMs are trying to do the right thing with AI, but this is in the context of neoliberal moral regulation which is unfair and ineffective. He mentioned some of the good examples from the sector including from CILIP, Museums AI Network and his own ‘Responsible Operations’ paper.

He credited Melissa Terras for asking the question “How are you going to advocate for this with legislation?”. The US doesn’t have any regulations at the moment to get the private sector to get better. I mentioned the UK AI Council who are looking at this in the UK context, and the recent CogX event where the need for AI regulation was discussed in many of the sessions.

The final theme was ‘Maintenance as Innovation’. Information maintenance is a Practice of Care. There is an asserted dichotomy between maintenance and innovation that’s false. Maintenance is sustained innovation and we must value the importance of maintenance to innovation. He appealed to the origin of the word ‘innovation’, which derives from the Latin ‘innovare’, meaning “to alter, renew, restore, return to a thing, introduce changes in the way something is done or made”. It’s not about creating from new. At the Hub we wholeheartedly endorse this view. We feel there’s far too much focus on the latest technology meme and we’ve had tensions within our own organisation along these lines. There may appear to be some irony here given the topic of this post, but we have been doing AI for a while as noted above. He referred us to https://themaintainers.org/ for more on this.

Roundtable discussion with the AEOLIAN Project Team

Dr Lise Jaillant, Dr Annalina Caputo, Glen Worthey (University of Illinois), Prof. Claire Warwick (Durham University), Prof. J. Stephen Downie (University of Illinois), Dr Paul Gooding (Glasgow University), and Ryan Dubnicek (University of Illinois).

Stephen Downie talked about the need for standardisation of ML extracted features so we can re-use these across GLAMs in a consistent way. The ‘Datasheets for Datasets’ paper was mentioned, which proposes “a short document to accompany public datasets, commercial APIs, and pretrained models”. This reminded me of Yves Bernaert’s talk about the related need for standardisation of carbon consumption measures. Both are critical issues and possible areas for Jisc to be involved in providing leadership. Another point that Stephen made is that researchers are finding they can’t afford the bill for ML processing. Finding hardware and resources is a big problem. As noted by ML guru Andrew Ng, we have a considerable data issue with AI and ML work. It may be that we need to work more on the data rather than wasting time, electricity and money re-creating expensive ML models. A related piece of work, ‘Lessons from Archives’, was also mentioned in this regard. There is a case for sharing model developments across the sector for efficiency and sustainability here.

Five Hundred Years of the WS Society Archive, Edinburgh

Archives Hub feature for July 2021

WS Society deed box.

The creation of the WS Society’s archive catalogue on the Archives Hub during the first period of national lockdown was the completion of stage two of a long-term project to remap and rehouse the records of Scotland’s oldest and largest body of solicitors. The Society of Writers to HM Signet, to give it its full name, has its origins in a fraternity of legal clerks working for the king’s secretary in medieval Scotland, and it was incorporated into the College of Justice by James V in 1532. The Society underwent reforms in 1594 and the minute book that was opened in that year is the oldest single item in the archive. Over the centuries, the Society’s lawyers have played a key role in the history of Scotland, but they were also central to the country’s intellectual and cultural development. Our archive is a map of the Society’s history – the lives and careers of generations of Scottish professionals – but it is also a map of its greatest cultural creation, the Society’s home and headquarters, the Signet Library in Edinburgh.

The archive now has its own specially adapted space within the Library, and all bound records are now recorded in the catalogue. The archive’s five centuries of unbound records have been surveyed and the task of adding them to the catalogue will be the third and final stage of the project. The Victorian deed boxes that once housed these records may be obsolete but they are also beautiful, and some are now on display in public parts of the building.

The WS Society’s archive is open to academics and post-graduate students, and to bona fide independent researchers at the discretion of the WS Society’s officers. For more details, please contact the Research Principal James Hamilton on library@wssociety.co.uk.

The Signet Library in 1867 – glass slide – George Washington Wilson.

The Signet Library is both a building and a collection. It is one of the largest private libraries in the United Kingdom and the books, art, furniture, artifacts and ephemera that it contains are a direct and important product of and survivor of the Scottish Enlightenment. Its scholar-librarians, who include the antiquarian David Laing and the church historian Thomas Graves Law, played vital roles in the development of Scottish historiography. The Library has its own record series within the archive, mapping the growth of a major Scottish intellectual institution from the first book purchases in 1722, through the golden age of the Scottish Enlightenment to the present day.

John Watson’s institution entry form for Euphemia Bridie 1837.

The WS Society is a registered charity and the solicitors who are the Society’s members have always played a key role in the Scottish charitable world. In the archive are records of people who have otherwise entirely vanished from the historical record, preserved in the treasurer’s records of charitable giving. The Society ran a school for orphans in Edinburgh for 200 years – John Watson’s Institution – and early records of those entering the school are here, with evidence of family and support networks and with the (often tragic) stories that led them to the Institution’s door. A host of charities providing hospitals, homes, food and education to the less fortunate were administered by WS Society lawyers and the archive contains extensive records of these involvements.

Letters of Madeleine Smith, accused of the Sandyford murder in Glasgow, from the working library of William Roughead WS.

In a building that contains both library and archive there will be materials that could be defined as belonging to either or both. One instance of this is found in the working library of the author William Roughead, Writer to the Signet and Scotland’s famous historian of true crime, which was deposited with us in 1952 along with a host of correspondence and papers. This collection is now the most heavily used part of the library and archive, and is in constant demand from journalists, television producers and academics.

Session Papers, including the trial of Deacon Brodie 1788 and the Joseph Knight slavery trial 1772.

Another jewel in our crown is our collection of Scottish Session Papers, printed materials used in civil court cases from 1711 onwards. All Scottish life is here – from arguments over the contents of window boxes to the records of the cases that finally ended slavery north of the Tweed – and amongst the bald legal texts can be found fascinating annotations by the lawyers themselves, beautiful hand-drawn maps, letters, and, later, photographs. These papers are the greatest untapped historical resource in Scotland and a collaborative effort to digitise the various collections of Session Papers is ongoing. Our collection was indexed during the Great War and the index has been placed online.

Bar bill from a WS Society dinner in Edinburgh, 1722.

The WS Society has always had within it a strong social and artistic life, and the archive reflects this, from records of the Society’s rifle club and militia at one extreme (created in response to a French invasion threat in the 1850s) to records of dining societies, sporting and golf clubs (Scotland’s greatest all-round sportsman, Leslie Balfour-Melville, was a Writer to the Signet), and more recently records pertaining to the Edinburgh Festivals where the Society has provided both administrative support and a venue.

John Jardine’s list of the women of Edinburgh 1746.

Not all of the records that the Society possesses are currently held at the Signet Library. The great collection of papers about the 1745 Jacobite Rebellion built up by the Reverend John Jardine (1716-1766) is now mostly on deposit at the National Records of Scotland, although we still hold Jardine’s astonishingly indiscreet lists of the women in Edinburgh during the ’45 and these have just been edited and published by the Scottish Record Society. Recent years have seen a long overdue recognition of the importance of business records and of the records of legal firms. In Scotland, a single law firm might serve a community for generations, and its records, if properly preserved, offer a unique and important window on the lives of everyone within it. But a new wave of mergers and takeovers of legal firms, along with the demise of some ancient firms with the death or retirement of the final remaining partners, has put these vital records under threat. The WS Society and Signet Library stand ready to provide advice on the preservation of such records, connections with bodies with specific expertise in the managing of such archives, and, if necessary, will provide law firm records with a permanent home.

James Hamilton
Research Principal
Society of Writers to HM Signet, Edinburgh

Related

Archive of the Society of Writers to HM Signet, Edinburgh (1594 – ongoing)

All images copyright the Society of Writers to HM Signet, Edinburgh. Reproduced with the kind permission of the copyright holders.

Artificial Intelligence – Getting the Next Ten Years Right

CogX poster with dates of the event

I attended the ‘CogX Global Leadership Summit and Festival of AI’ last week, my first ‘in-person’ event in quite a while. The CogX Festival “gathers the brightest minds in business, government and technology to celebrate innovation, discuss global topics and share the latest trends shaping the defining decade ahead”. Although the event wasn’t orientated towards archives or cultural heritage specifically, we are doing work behind the scenes on AI and machine learning with the Archives Hub that we’ll say more about in due course. Most of what’s described below is relevant to all sectors as AI is a very generalised technology in its application.


My attention was drawn to the event by my niece Laura Stevenson, who works at Faculty and was presenting on ‘How the NHS is using AI to predict demand for services’. Laura has led on Faculty’s AI driven ‘Early Warning System’ that forecasts covid patient admissions and bed usage for the NHS. The system can use data from one trust to help forecast care for a trust in another area, and can help with best and worst scenario planning with 95% confidence. It also incorporates expert knowledge into the modelling to forecast upticks more accurately than doubling rates can. Laura noted that embedding such a system into operational workflows is a considerable extra challenge on top of developing the technology.

Example of AI explainability data from the Early Warning System (image © Faculty.ai)

The system includes an explainability feature showing various inputs and the degree to which they affect forecasting. To help users trust the tool, the interface has a model performance tab so users can see information on how accurate the tool has been with previous forecasts. The tool is continuing to help NHS operational managers make planning decisions with confidence and is expected to have lasting impact on NHS decision making.


‘Responsible leadership: The risks and the rewards of advancing the state of the art in AI’ – Lila Ibrahim

Lila works at DeepMind, who are looking to use AI to unlock whole new areas of science. Lila highlighted the role of the AI Council, who are providing guidance to the UK Government on UK AI research. She talked about AlphaFold, which has been addressing the 50-year-old challenge of protein folding. This is a critical issue as being able to predict protein folding unlocks many possibilities including disease control and using enzymes to break down industrial waste. DeepMind have already created an AI system that can help predict how protein folding occurs and have a peer reviewed article coming out soon. They are trying to get closer to the great challenge of general intelligence.


‘Sustainable Technologies, Green IT & Cloud’ – Yves Bernaert, Senior Managing Director, Accenture

Yves focussed on company and corporate responsibility, starting his session with some striking statistics:

  • 100 companies produce 70% of global carbon emissions.
  • 40% of water consumption is by companies.
  • 40% of deforestation is by companies.
  • There is 80 times more industrial waste than consumer waste.
  • 20% of the acidification of the ocean is produced by 20 companies only.

Yves therefore believes that companies have a great responsibility, and technology can help to reduce climate impact. Data centres currently account for 2% of global electricity use, and this is growing exponentially, soon to reach 8%. A single email produces on average 4g of carbon. Yves stressed that all companies have to accept that now is the time to come up with solutions and companies must urgently get on with solving this problem. IT energy consumption needs to be seen as something to be fixed. If we use IT more efficiently, emissions can be reduced by 20-30%. The solution starts with measurement, which must be built into the IT design process.

We can also design software to be far more efficient. Yves gave the example of AI model accuracy. More accuracy requires more energy. If 96% accuracy is to be improved by just 2%, the cost will be 7 times more energy usage. To train a single neural network requires the equivalent of the full lifecycle energy consumption of five cars. These are massive considerations. Interpreted program code has much higher energy use than compiled code such as C++.

A positive note is that 80% of the global IT workload is expected to move to the cloud in the next 3 years. This will reduce carbon emissions by 84%. Savings can be made with cloud efficiency measures such as scaling systems down and outwards so as not to unnecessarily provision for occasional workload spikes. Cloud migration can save 60 million tons of carbon per year, which is the equivalent of 20 million full lifecycle car emissions. We have to make this happen!
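As a rough illustration of the ‘scaling down and outwards’ point, the sketch below compares the instance-hours used when a service is permanently sized for its peak with those used when the number of instances follows demand hour by hour. It is a simplified model, not something presented in the talk. The demand figures, per-instance capacity and function names are all assumptions, and instance-hours are only a crude proxy for energy use.

```python
import math


def instance_hours_fixed(hourly_demand, capacity_per_instance):
    """Instance-hours when enough instances for the peak run all the time."""
    instances = math.ceil(max(hourly_demand) / capacity_per_instance)
    return instances * len(hourly_demand)


def instance_hours_scaled(hourly_demand, capacity_per_instance, min_instances=1):
    """Instance-hours when the instance count is adjusted to each hour's demand."""
    return sum(
        max(min_instances, math.ceil(demand / capacity_per_instance))
        for demand in hourly_demand
    )


# Hypothetical day: 22 quiet hours and a two-hour spike (requests per second)
hourly_demand = [20] * 22 + [200, 180]
capacity = 50  # assumed requests per second that one instance can serve

fixed = instance_hours_fixed(hourly_demand, capacity)
scaled = instance_hours_scaled(hourly_demand, capacity)
saving = 1 - scaled / fixed
print(f"fixed: {fixed} instance-hours, scaled: {scaled}, saving: {saving:.0%}")
```

The saving in a real system depends on how spiky the workload is and on how quickly instances can be started, so the figures here should not be read as typical.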

On where the big wins are, Yves said these are also in the IT area. Companies need to embed sustainability into their goals and strategy. We should go straight for the biggest spend. Make measurements and make changes that will have the most effect. Allow departments and people to know their carbon footprint.

* Update 28th June 2021 * – It was remiss of me not to mention that I’m working on a number of initiatives relating to green sustainable computing at Jisc. We’re looking at assessing the carbon footprint of the Archives Hub using the Cloud Carbon Footprint tool to help us make optimisations. I’m also leading on efforts within my directorate, Digital Resources, to optimise our overall cloud infrastructure using some of the measures mentioned above in conjunction with the Jisc Cloud Solutions team and our General Infrastructure team. Our Cloud CTO Andy Powell says more on this in his ‘AI, cloud and the environment’ blog post.


‘Future of Research’ – Prof. Dame Ottoline Leyser, CEO, UK Research and Innovation (UKRI)

Ottoline believes that pushing the boundaries of how we support research needs to happen. Research is now more holistic. We draw in what we need to create value. The lone genius is a big problem for research culture and it has to go. Research is insecure and needs connectivity.

Ottoline believes AI will change everything about how research is done. It’s initially replacing mundane tasks but will take on some more complex tasks such as spotting correlations. Eventually AI will be used as a tool to help understanding in a fundamental way. In terms of the existential risk of AI, we need to embed research as a collective endeavour and share effort to mitigate and distribute this risk. It requires culture change, joining up education and entrepreneurship.

We need to fund research in places that are not the usual places. Ottoline likes a football analogy where people are excited and engaged at all levels of the endeavour, whether in the local park or at the stadium. She suggests research at the moment is more like elitist polo than football.

Ottoline mentioned that UKRI funding does allow for white spaces research. Anyone can apply. However, we need to create wider white spaces to allow research in areas not covered by the usual research categories. It will involve braided and micro careers, not just research careers. Funding is needed to support radical transitions. Ottoline agrees that the slow pace of publication and peer review is a big problem that undermines research. We need to broaden the ways we evaluate research. Peer review is helpful but mustn’t slow things down.


‘Ethics and Bias in AI’ – Rob Glaser, CEO & Founder of RealNetworks

Rob suggests we are in an era with AI where there are no clear rules of the road yet. The task for AI is to make it safe to ‘drive’ with regulations. We can’t stop facial recognition any more than we can stop gravity. We need datasets for governance so we can check accuracy against these for validation. Transparency is also required so we can validate algorithms. A big AI concern is the tribalism on social media.


‘AI and Healthcare’ – Rt. Hon. Matt Hancock

Matt Hancock believes we are at a key moment with healthcare and AI technology where it’s now of vital importance. Data saves lives! The next thing is how to take things forward in the NHS. A clinical trials interoperability programme is starting that will agree standards to get more out of data use, and the Government will be updating its Data Strategy soon. He suggests we need to remove silos and commercial incentives (sic). On the use of GP data he suggests we all agree on the use of data, but the question is how it’s used. The NHS technical architecture needs to improve for better use and building data into the way the NHS works. GPs don’t own patient data; the citizen does.

He said a data lake is being built across the NHS. Citizen interaction with health data is now greater than ever before and NHS data presents a great opportunity for research, and an enormous opportunity for the use of data to advance health care. He suggested we need to radically simplify the NHS information governance rules. On areas where not enough progress has been made, he mentioned that the lack of separation of data layers is currently a problem. So many applications silo their data. There has also been a culture of individual data with personal curation. The UK is going for a TRE first approach: ‘Trusted Research Environment service for England’. Data is the preserve of the patient who will allow accredited researchers to use the data through the TRE. The clear preference of citizens is sharing data if they trust the sharing mechanism. Every person goes through a consent process for all data sharing. Acceptance requires motivating people with the lifesaving element of research. If there’s trust, the public will be on side. Researchers in this domain will have to abide by new rules to allow us to build on this data. He mentioned that Ben Goldacre will look at the line where open commons ends and NHS data ownership begins in the forthcoming Goldacre Review.

User Experiences of Archive Catalogues and Use of Primary Sources

On 19 June we ran a webinar on user research and user behaviour. We had three speakers – David Marshall, a UX Researcher from the University of Cambridge, Kelly Arnstein, a UX Specialist from the University of Glasgow, and Deborah Wilson, a Subject Librarian from Queen’s University Belfast.

Link to view the Zoom recording of the session – please use the passcode : m^9xj.vt https://jisc.zoom.us/rec/share/T1HJWEHzO5jvLEoJEEjzm2ch9DhlHKiGUQGEQSzrt-jhQ6DzFUEKvyBpWuOTa-Xv.IKKYEwWG9fT5-lup

(main talks 1hr + 25 minute discussion). Slides are also provided as links (below).

The talks were excellent, and followed by a lively discussion. They should prove to be useful to anyone looking at designing a website for archive catalogues, and working with students using primary sources. Overall, there was a lot of consensus about user behaviour, which is useful in terms of sharing findings – because it is likely to be relevant to all archives. The emphasis for this session was on students and academic researchers, but we did discuss some of the challenges of meeting the needs of a diverse audience.

A few summary points that came out of one or more of the talks:

  • People may use an archive catalogue for research and also for teaching, scoping a project, marketing and other reasons.
  • Researchers want comprehensive detailed descriptions
  • They value name of creator
  • They want an idea of the physicality of the collection and the overall size
  • People want context and hierarchy, and like the idea of ‘leafing through’ material to see relationships.
  • There are those who want to get quickly to what they need and those who value browse and serendipity. This seems like a possible tension, and certainly a challenge, in terms of interface design. It may be that at different times the same researcher wants a quick route through and other times they want to take time and discover.
  • Cambridge research found that some users wanted to limit their search by date initially, but there was a strong feeling that a wide search and then filtering was generally a good option.
  • Finding everything of value was seen as key – many researchers were prepared to spend time to discover materials related to their research and worried about missing important materials.
  • The physical object remains key to many researchers
  • Saving searches and other forms of personalisation were seen as a good thing
  • Quite often researchers, especially if they are more experienced, understand that research skills are important and archive catalogues are complex; this may contrast with library databases, where they are more inclined to want to get to things quickly.
  • Undergraduates often don’t understand the different approach needed to engage with primary sources
  • Undergrads often engage with archives at the point of an assignment, where they are being marked on their use of primary sources; they initially try to find sources in the same way as they would search for anything else.
  • It is really valuable to educate students on the importance of context, the broad search and filter approach, understanding citations, evaluating databases, etc. They often don’t really know what primary sources are and can find them off-putting.
  • Researchers can make assumptions about what a repository holds, and then be surprised to find that there is material that is relevant for them.
  • A bad catalogue can put a researcher off, and they may choose to go further afield if the catalogue offers a better experience.
  • People often ignore tooltips. It is a challenge to provide help that people use.

David’s Slides: https://archiveshub.jisc.ac.uk/documents/user-research-dm.pptx

Kelly’s Slides: https://archiveshub.jisc.ac.uk/documents/user-research-ka.pptx

Deborah’s Slides: https://archiveshub.jisc.ac.uk/documents/user-research-dw.pptx

Archive Collections in the North – Global Change

Archives Hub feature for June 2021

June’s Archives Hub feature is the result of animated discussions between members of Academic Libraries North (formerly Northern Collaboration) Special Interest Group for Special Collections and Archives. We chose Global Change as an overarching idea and asked group members to pick a collection that spoke to this theme. Far from being a random assortment of disparate collections with no common ground, the resulting list revealed linked collections with great research potential for those interested in political history, social history, activism, immigration and emigration, technological and design innovation – and even railway engineering.

Drink driving awareness lantern slide: Abstainers have the best of it. Courtesy of the Livesey Collection, UCLan Special collections & Archives.

University of Bradford – Peace Pamphlet Collection

This collection comprises thousands of peace pamphlets gathered by Commonweal Library from their rich network of connections in protest campaigns worldwide. They present an incredible resource for researchers and illustrate the ideas and activities of British peace movements from the First World War to the present day. Significant publishers include the Peace Pledge Union, Campaign for Nuclear Disarmament and the Fellowship of Reconciliation. They also offer a fascinating visual record, with many well-known artists contributing designs.

Pamphlet cover from Daily Mirror spotlight on the Common Market, c1960
Pamphlet collection, Special Collections, University of Bradford

Durham University – Malcolm MacDonald Papers

Son of Ramsay MacDonald, Malcolm MacDonald was elected Labour MP for Bassetlaw in 1929. He held the seat until 1935, and was National Labour MP for Ross and Cromarty 1936-1945. He held ministerial office in the Dominions & Colonial Office 1931-1940, and was British High Commissioner to Canada, 1941-1946. He was Governor General of Malaya, with his responsibilities subsequently extended to cover all S.E. Asia. In 1955 he became High Commissioner for the U.K. in India and in 1960 was appointed co-chairman of the international conference on Laos. The final part of his administrative and diplomatic career was spent in Africa as Governor and Commander in Chief and later High Commissioner for Kenya 1963-4.

Lancaster University – Socialist Pamphlets

A significant item in this collection is ABC of votes for Women by Marion Holmes (née Miller) 1867-1943, printed in 1913. Marion was a suffragette, a freelance journalist and writer. She was on the committee for the Society of Women Journalists and established the Margate Pioneer Society. In Croydon she was the President of the local Women’s Social and Political Union and a member of the Women’s Freedom League, and she was the first female election agent in Keighley. This work covers the importance of women having the ability to vote.

https://lancaster.alma.exlibrisgroup.com/view/UniversalViewer/44LAN_INST/12159677070001221#?c=0&m=0&s=0&cv=0&xywh=-1271%2C-92%2C3602%2C1820

University of Leeds – Leeds Russian Archive

The Leeds Russian Archive, established in 1982, comprises around 650 collections of manuscripts, photographs and other archival material related to Anglo-Russian contacts in the 19th and 20th centuries. The Archive contains papers of members of the British community in Russia, as well as travellers and diplomats, governesses and soldiers, including the papers of writers such as Leonid Andreev (1871-1919) and Nobel prizewinner Ivan Bunin (1870-1953), as well as the papers of the Russian railway engineer Yuri Lomonossoff (1876-1952).

https://library.leeds.ac.uk/special-collections/collection/728/leeds_russian_archive_collections

Liverpool Hope University – Nugent Archive

Monsignor James Nugent, better known as Father Nugent, was a Roman Catholic Priest of the Archdiocese of Liverpool. A passionate social reformer, appalled by the state of the homeless living in the squalor of Victorian England, he dedicated his life to the education and rescue of destitute children. Father Nugent was also an early pioneer of children’s emigration. On 18 August 1870 he took the first group of 24 children to Canada on the SS Austrian; this was probably the first organised emigration of its kind.

Liverpool John Moores University – Stafford Beer Archive

Photograph of the Operations Room, created as part of Project Cybersyn in Chile 1971-1973. Courtesy of the Stafford Beer Collection, LJMU Special Collections & Archives.

Professor Stafford Beer (1926-2002) was an inspirational thinker, teacher and writer in the field of management cybernetics. A polymath and credited as the founder of Management Cybernetics, he was appointed Honorary Professor of Organisational Transformation at LJMU in 1989. He is probably best known internationally for his work on Project Cybersyn, a Chilean attempt to develop a cybernetic approach to the organisation and control of the economy in 1971-1973 under the socialist government of President Allende.

https://www.ljmu.ac.uk/microsites/library/special-collections-and-archives/special-collections/stafford-beer-collection

University of Salford – Richard Badnall Papers

Richard Badnall (d 1842) and his collaborator Richard Gill patented the design of an “Undulating Railway”, an eccentric invention which caught the interest of many prominent people, including George and Robert Stephenson and the Directors of the Liverpool and Manchester Railway. The collection, consisting mainly of correspondence, has been fully digitised.

https://usir.salford.ac.uk/view/archive_collections/badnall.html

Sheffield Hallam University – Festival of Britain Collection

The 1951 Festival of Britain was a showcase of British contributions to art, design and industry and a chance to celebrate and raise the nation’s spirits after the austerity of the war years. In the 1970s Sheffield Hallam University acquired a box of Festival items including press releases, letters and some official guides, but this has been enhanced through acquisition of a wider range of Festival literature and commemorative ephemera – such as postcards, teapots, toys, glassware and medals.

https://libguides.shu.ac.uk/specialcollection/festival

University of York – Dennis Brutus Archive

Cover of Constitution of SA Sports Association. Copyright: Dennis Brutus Archive, Borthwick Institute for Archives, University of York.

Dennis Brutus (b. 1924) is best known for founding the South African Sports Association (SASA) whose essential aim was the elimination of racialism in South African sport. The South African Non-Racial Olympic Committee (SAN-ROC), with Brutus as its president, had considerable success: not only with the exclusion of South Africa from the Olympic Games in 1968, but also with the withdrawal of many African competitors from the 1976 Olympics. Forced into exile in 1966, Brutus left South Africa for England, where he worked for the International Defence Aid Fund. In 1971 he moved to the United States and died on 26 December 2009.

Related

Browse more collection descriptions for these institutions on the Archives Hub:

University of Bradford Special Collections

Durham University Archives

Lancaster University Archives

University of Leeds Special Collections

Liverpool Hope University Archives and Special Collections

Liverpool John Moores University Special Collections and Archives

University of Salford Archives & Special Collections

University of York, Borthwick Institute for Archives

All images copyright. Reproduced with the kind permission of the copyright holders.

Enhancing Access to the Leeds Archive of Vernacular Culture (LAVC) at the University of Leeds

Archives Hub feature for May 2021

“Wallops – nine pins” by Werner Kissling. LAVC/PHO/P1748

The LAVC is a unique, nationally important collection that holds all the materials from the internationally renowned Survey of English Dialects (SED) as well as the archives of the University’s former Leeds Institute of Dialect and Folk Life Studies (IDFLS).

It is currently the subject of a 3-year project “Dialect and Heritage” (2022). Funded by the National Lottery Heritage Fund, the project aims to open up the LAVC to public audiences, mapping its rich archives with 5 partner museums’ complementary collections and putting the LAVC back into the communities from which it was originally collected.

Since January 2020 Special Collections staff have been involved in the first phase, focusing on digitisation and enhancing the catalogue to support both long term access via its own catalogue as well as a dedicated project website due to launch in July 2021.

Although extensively catalogued as part of an AHRB project in 2002, it has remained inaccessible to most non-academic audiences. Its rich narrative descriptions pre-dated digital developments and metadata standardisation that can now optimise discovery. Current catalogue enhancements have therefore focused on adding new access points to facilitate improved search/browse.

About the LAVC: Dialect and Folklife

The SED was the first comprehensive, nationwide dialect survey in England, devised and coordinated by Professor Harold Orton at the University of Leeds during the 1950s-1960s. Originating at the end of World War 2, the Survey aimed to record and preserve the nation’s dialects before they were changed forever by modern development and migration. Fieldworkers would travel the length of the country to survey and interview informants in 313 rural locations with over 1000 questions on rural and home life. These were supplemented by a series of over 300 audio recordings for many of these locations, which were captured during or after the survey. To capture the natural richness of these local dialects, fieldworkers would engage informants by getting them to speak about absolutely any aspect of their lives.

“New Alresford Response Book”. LAVC/SED/2/2/3: 9/4/10

The former Institute of Dialect and Folk Life Studies (IDFLS) was part of the University of Leeds from October 1964 to September 1983. Under the initial directorship of Stewart F. Sanderson, the IDFLS expanded its focus from dialect and fostered teaching and research in the field of folk life studies. This included the Folk Life Survey and establishment of its own reference library which included undergraduate and postgraduate student research papers on dialect and folk life/folklore and research materials including manuscripts, printed and audio-visual items.

“Back Can” by Werner Kissling. LAVC/PHO/P1557

Highlights in the collection include:

  • The Survey of English Dialects questionnaire and response books: Detailed responses to over 1000 questions in 313 rural locations, written mainly in linguistic shorthand (IPA). They also contain more accessible glimpses into life in these communities with notes written in plain English and illustrations capturing ‘incidental material’ about the locations, informants and their traditions. All 313 books are being digitised with many online already.
  • Audio recordings: There are over 300 SED recordings and nearly 900 unpublished recordings relating to studies and research within IDFLS.  They are being digitised as part of the British Library’s “Unlocking Our Sound Heritage” (UOSH) project.
  • Interpretative Word Maps (mainly relating to SED results in the Linguistic Atlas of England).
  • Over 2000 photographs which have now been digitised as part of the project. Over half were taken by Werner Kissling, employed between 1962 and 1966 as a photographic fieldworker in Yorkshire as part of the Institute’s Folk Life Survey. They also include photographs relating to SED locations and informants and student theses.

The Collection gives exceptional insights into language, culture and everyday life from the late 19th-20th centuries. It is particularly rich in capturing a variety of subjects including traditional methods of food production, rural work, crafts, hobbies, buildings, calendar and local customs, folklore and music.

Access Points

Prototype map-based search for LAVC Collection.

To improve access and discoverability the catalogue has been enhanced to include place, person and subject as structured data and access points.

This has included mapping 4000+ bespoke subject terms into 12 high level themes and 100 sub-categories based on Library of Congress Subject Headings (LCSH).  This will enable researchers to browse the collections by theme and discover related materials more easily.

We have also extracted location information relating to all photographs, audio recordings and SED response books to create several thousand geo-referenced location records. This means that these items can be plotted onto a map and are now available to discover on a new map-based search.

We have also created authority records for over 1000 informants of the SED, Folk Life Survey and Student Research Papers so that now it is possible to search by creators or informants.  

Finally, we have created a dedicated search page (currently in beta version) to bring together all these new ways of exploring the Collection with an A-Z for subject and people that can be browsed as well as the interactive map. Much of the cataloguing work is now complete and will be visible on the online catalogue over the next month. Work by the British Library to digitise and catalogue the SED audio recordings was delayed due to the COVID pandemic but is resuming. Work will continue to publish the wealth of digitised material over the coming year so that researchers can explore the collection remotely. The LAVC collection is available for research (https://explore.library.leeds.ac.uk/special-collections-explore/7436). 

Caroline Bolton, Archivist
University of Leeds Special Collections

Related

Browse all University of Leeds Special Collections descriptions on the Archives Hub

Previous features on LAVC

Cor, blust, squit! Stanley Ellis

Previous features on University of Leeds Special Collections

Interconnected archives: cataloguing the Rossetti family letters at Leeds University Special Collections

“Gather them in” – the musical treasures of W.T. Freemantle

Sentimental Journey: a focus on travel in the archives

Recipes through the ages 

World War One

All images copyright University of Leeds Special Collections. Reproduced with the kind permission of the copyright holders.

Robert Owen collection at the Co-operative Heritage Trust Archive

Archives Hub feature for April 2021

This May is the 250th anniversary of the birth of Robert Owen (1771-1858).  Known to many as the Father of Co-operation, Owen left an extensive legacy which is shown in the collection held by the Co-operative Heritage Trust.

Robert Owen

Born in Newtown, Wales, Owen moved to London in 1784 aged just 13, then to Manchester a year later. In 1785 Manchester was the epicentre of the Industrial Revolution, and also a hotbed of intellectual and philanthropic discourse. Owen was often present at the meetings of the Manchester Literary and Philosophical Society where he was able to expand his knowledge on a number of subjects. When he first arrived in Manchester, Owen was employed at Satterfield’s Drapery on St Ann’s Square, where a blue plaque marks the site of the building. He then became manager of the Piccadilly Mill and went on to establish the Chorlton Twist Mill.

Plaque dedicated to Owen on St Ann’s Square, Manchester.

Owen then went on to manage mills at New Lanark in Scotland which also marked his first venture into setting up a model community with an emphasis on education, particularly of young people as well as being involved in campaigns for a shorter working day.  He remained there for many years.  Today, New Lanark is a UNESCO World Heritage Site.

After leaving New Lanark, Owen travelled to New Harmony, Indiana intending to set up another model community there. After the failure of this venture, Owen returned to England where he found his ideas were growing in popularity. In 1835 he founded the Association of All Classes of All Nations and presided over a series of Congresses in Manchester & Birmingham. Among Owen’s followers were some of the founding members of the Rochdale Pioneers Equitable Co-operative Society.

Notice for Owen’s proposed model community at New Harmony, Indiana.

Owen continued to promote his ideas by travelling around the country giving lectures but in the last years of his life settled in Sevenoaks, Kent. It was around this time that Owen decided he wanted to write a three-volume autobiography (of which only one volume was completed before his death). To do this, he wanted to gather together as much of his correspondence as he possibly could. This was not an easy task as he corresponded with many individuals from all over the world.

Owen’s magazine, The Crisis.

Once the collection was gathered together Owen was assisted with the arranging of the material by his close friend James Rigby, who, at the same time, wrote the correspondent’s name and date of postage on the reverse of many of the letters, which was very helpful when the collection came to be catalogued! In 1853 Owen wrote of his intentions to appoint William Pare, Robert Dale Owen, and Dr. Henry Travis as Trustees for his letters, as he wanted to ensure their safe-keeping following his death.

After Owen died in 1858, his letters were unaccounted for for many years due to being passed around the various executors. It was not until the early 1900s that George Jacob Holyoake, a journalist, secularist, co-operator and follower of Owen, made efforts to trace their whereabouts. Holyoake eventually located the letters at a barristers’ chambers in London where they were stored in a metal trunk. This became known as the ‘hair trunk’ as in addition to the letters, the trunk contained a lock of Owen’s hair.

Letter from Owen to his wife, Caroline.

In 1903, Holyoake gave the collection, which comprised over 3000 letters, to the Co-operative Union.  This was the first collection of what is now the Co-operative Heritage Trust Archive.   In 2010 the Collection was awarded a National Archives Cataloguing Grant and in 2016 the Collection was added to the UNESCO UK Memory of the World Register as a collection of significance.

The Archive also holds the correspondence of James Rigby which contains many letters from Owen as well as the correspondence of George Jacob Holyoake.

The Co-operative Heritage Trust Archive is located in central Manchester.   Due to Covid-19 restrictions, the reading room is currently closed to the public.  Information about re-opening will be on our website.

The Co-operative Heritage Trust looks after the Archive and the Rochdale Pioneers Museum.

Email: archive@heritagetrust.coop

Twitter: @CoopHeritage

Sophie McCulloch
Archivist
Co-operative Heritage Trust

Related

Robert Owen Collection, 1805-1858

James Rigby Correspondence Collection, 1848-1858

Papers of George Jacob Holyoake (1817-1906)

Browse all Co-operative Heritage Trust collections on the Archives Hub.

All images copyright Co-operative Heritage Trust. Reproduced with the kind permission of the copyright holders.

A Selection of Archives to mark International Women’s Day

To mark International Women’s Day on 8th March, here is a selection of archives featuring women who have excelled and been highly influential in many different fields.

Daphne Oram (1925-2003), composer and musician

The Daphne Oram Archive, held at Goldsmiths, University of London, comprises papers, personal research, correspondence and photographs documenting the life and work of a pioneering British composer and electronic musician.

Throughout her career she lectured on electronic music and studio techniques. In 1971 she wrote An Individual Note of Music, Sound and Electronics which investigated philosophical aspects of electronic music. Besides being a musical innovator her other significant achievements include being the first woman to direct an electronic music studio, the first woman to set up a personal studio and the first woman to design and construct an electronic musical instrument.

Delia Derbyshire (1937-2001), musician and composer

The University of Manchester holds the Papers of Delia Derbyshire, composer. After being rejected by Decca Records, who said that they did not employ women in the recording studio, in 1962 Derbyshire became a trainee studio manager at the BBC. She was soon seconded to work at the BBC’s Radiophonic Workshop, which had been set up to provide theme and incidental music and sound for BBC radio and television programmes. The following year, she produced her electronic ‘realisation’ of Ron Grainer’s theme tune for the hugely popular BBC series Doctor Who – which is still one of the most famous and instantly recognisable television themes. In the late 1990s there was renewed interest in her work and many younger musicians making electronic dance and ambient music (such as Aphex Twin and The Chemical Brothers) cited Derbyshire as an important influence.

The Anita White Foundation International Women and Sport Archive

Dr Anita White and Professor Celia Brackenridge were both associated with the University of Chichester, and they were both centrally involved in the leadership and development of the international women and sport movement since 1990. The International Women and Sport Archive is comprised primarily of papers brought together by them and other leaders in the movement, accumulated in the course of their research, study and work in the fields of the sociology of sport and sport science, and their involvement as activists and leaders in the global women and sport movement.

The International Women and Sport Movement is said to have been born out of a decade in which increasing globalisation brought together women from across the world in the practice of sport. It does not refer to any one organisation, body or country, but it is generally agreed that a landmark event and major catalyst in the movement was the first international conference on women and sport which took place on 5-8 May 1994.

Kaye Webb (1914-1996), editor and publisher

The Papers of Kaye Webb, covering her career as journalist, magazine editor, editor at Puffin and later literary agent, are held at the Seven Stories Archive. The collection provides a comprehensive record of Webb’s career, reflecting the wide variety of work undertaken by her, and documented through notes, correspondence, press cuttings, audio-visual material, memorabilia and ephemera. Webb was editor of Puffin Books between 1961 and 1979, and in 1967 founded the Puffin Club, which she ran until 1981. As a journalist she worked on publications including Picture Post, Lilliput and the News Chronicle.

Elizabeth Garrett Anderson (1836-1917), physician and suffragist

The Letters of Elizabeth Garrett Anderson are part of the Women’s Library Archives. An English physician and suffragist, she was the first woman to qualify in Britain as a physician and surgeon. She was the co-founder of the first hospital staffed by women, the first dean of a British medical school, the first woman in Britain to be elected to a school board and, as mayor of Aldeburgh, the first female mayor in Britain. The letters cover Anderson’s struggle to secure an entry into the medical profession.

Barbara Castle (1910-2002), politician and campaigner

The Barbara Castle Cabinet Diaries at the University of Bradford cover 1965-1971 and 1974-1976. In the 1945 General Election Barbara Castle was elected M.P. for Blackburn, a seat that she retained for 34 years. Following the Labour victory in 1964, Prime Minister Harold Wilson put Castle in charge of the newly-created Ministry of Overseas Development. “I decided on 26 January that I ought to start keeping a regular record of what was happening”, she said. Castle maintained this political diary throughout her periods in office. In 1974 Castle was made Secretary of State for Social Services, and in this post she introduced payment of child benefit to mothers and worked on the State Earnings Related Pensions Scheme. In 1979 she became a Member of the European Parliament and in 1990 she entered the House of Lords as Baroness Castle of Blackburn.

Alison Settle (1891-1980), fashion journalist and editor

In a career spanning from the early 1920s to the early 1970s, Alison Settle worked as a fashion journalist, and Brighton Design Archive hold the Alison Settle Archive which includes professional papers dating from the mid-1930s. She was a tireless champion of the interests of women, as well as campaigning for good quality, affordable design through her relationships with designers and manufacturers. Settle sought to improve design standards in all areas of manufacture and production, and contributed to the work of both the Council for Art & Industry and the Council of Industrial Design. She remained one of the best known fashion journalists in the country.

Elsie Edith Bowerman (1889-1973), lawyer and suffragette

Diaries, photographs and correspondence of Elsie Edith Bowerman are held at the Women’s Library. Bowerman followed her mother into the suffrage movement. They were both active members of the militant Women’s Social & Political Union. They were on the maiden voyage of the Titanic – both survived. She worked for Scottish Women’s Hospitals during the First World War, and she also worked for Emmeline and Christabel Pankhurst during their campaign for ‘industrial peace’ in support of the war effort. In 1924 or 1925 she went on to set up the Women’s Guild of Empire with Flora Drummond, with the aim of promoting co-operation between employers and workers. She was admitted to the Bar in the early twenties and practised until 1938, when she joined the Women’s Voluntary Services. In 1947 Bowerman went to the United States to help set up the United Nations Commission on the Status of Women.

Tessa Boffin (1960-1993), writer, photographer and performance artist

The Tessa Boffin Archive at the University for the Creative Arts includes lesbian, gay, bisexual, transexual and other photography projects, including portrayal of AIDS, cross dressing and safe sex, as well as notes on television and radio productions of the 1980s portrayal on feminism and AIDS. Boffin was one of the leading lesbian artists in Great Britain during the AIDS Crisis, but her risqué performances were controversial, and frequently drew criticism, including from inside the LGBTQ community.

Gladys Aylward (1902-1970), missionary

Gladys May Aylward was an evangelical Christian missionary to China. She travelled to China in 1932 and in 1936 she became a Chinese citizen. In 1940, against the background of civil war between Nationalist government troops and the Communists, Japanese invasion, and the threat of bandits, she led a group of orphans on a perilous journey to Sian. Her story was told in the book The Small Woman, by Alan Burgess published in 1957, and made into the film The Inn of the Sixth Happiness starring Ingrid Bergman, in 1958. The Papers of Gladys Aylward, held at SOAS, provide a vivid portrait of Aylward, including her life in China, and the impact of World War Two.

Researching LGBTQ+ History at North East Wales Archives

Have you ever wondered what LGBTQ+ archives might be held at North East Wales Archives (NEWA)? 

North East Wales Archives images.

Today, we would like to shine the spotlight on some of the initiatives which are helping Wales to uncover the LGBTQ+ heritage held within our archives.   It can be quite a challenge to find records of this type of history since, because of its historically subversive nature, it was often hidden, destroyed or even put into code to avoid discovery.  Searching for records of LGBTQ+ history can prove difficult, because the terms that were used historically are different to those used in today’s language. Glamorgan Archives have put together an extremely helpful guide (PDF) called ‘Queering Glamorgan’, which also has an essential glossary of words and terms to help researchers find articles and stories in historic newspapers.

Image of archival storage, provided by North East Wales Archives/Adobe Spark.
Colourised image of archival storage units #LGBTQ (NEWA/Adobe Spark).

Societies like #Draig Enfys or #Rainbow Dragon are working tirelessly to find and share the stories and lives of people in Wales throughout the ages and to help us to explore the archives for ourselves. Draig Enfys is a research group set up by Norena Shopland, who specialises in researching, recording and promoting LGBT+, women’s and Welsh histories; Mark Etheridge, National Museum Wales; and Susan Edwards, Glamorgan Archives. They wanted to create a forum for researchers to network, help each other out and prevent people working on duplicate subjects.  They saw the benefit of people joining forces and collaborating together in this often lonely field of research.

There is also a hive of creative activity in this field, with original research being undertaken in Wales.  Projects like Living Histories Cymru, run by Jane Hoy and Helen Sandler, bring historic Welsh LGBTQ+ individuals to life through lively, costumed talks and plays. Other researchers and groups of young people are currently working with National Museum Wales to host various exhibitions and publish books on LGBTQ+ history.

James Henry Lynch: The Rt. Hon. Lady Eleanor Butler & Miss Ponsonby ‘The Ladies of Llangollen’. A portrait from the Welsh Portrait Collection at the National Library of Wales. Image in the public domain via Wikimedia Commons.

At the Denbighshire branch of NEWA, we hold Minutes of the weekly medical officers meetings which contain details of patient cases, including discussions on the benefits and problems associated with ECT treatment, and brief details on the treatment of a homosexual patient in March 1968.  We also hold records relating to the celebrated ‘Ladies of Llangollen’, ‘romantic friends’ in the 18th century, who ran away together to escape the constraints of patriarchal society to live together in isolation. Newspapers and court records at both branches are also rich sources of LGBTQ+ stories and pathways to further research.

Photographs of North East Wales Archives.
Photographs of the Hawarden (Archifdy Sir y Fflint / Flintshire Record Office) and Ruthin (Archifau Sir Ddinbych / Denbighshire Archives) branches.

If you are interested in LGBTQ+ history, why not try using the terms in Glamorgan Archives’ glossary to search for stories in online newspapers? You can also visit our website to uncover more sources of historical stories from your local area!

Teresa Davies
Archive Assistant
North East Wales Archives/NEWA (Hawarden)

Related

Explore more LGBTQ archives on the Archives Hub

Browse all Archifau Sir Ddinbych / Denbighshire Archives collections on the Archives Hub.

Browse all Archifdy Sir y Fflint / Flintshire Record Office collections on the Archives Hub.

Previous feature

Unlocking the Asylum: Cataloguing the North Wales Hospital Archive

All images copyright. Reproduced with the kind permission of the copyright holders.

Names (9): Structuring Data

In the last Names post I wrote about the 4-step process that covers ‘matching and meaning’. Step 2 was ‘Structuring data’, which means implementing a process to structure the elements that form part of a name string.

Many names are not structured. But if we can process the data to create better structure, we have a much better chance of matching it to other entries.

Here is a table showing some name entries around ‘J Watson’ (my examples are taken from real data, but sometimes tweaked a bit in order to cover different types of patterns – all the patterns will be found within the data).

Names based around ‘J Watson’ put into a structure table

The elements have been put into columns, and this is the idea with our structuring process. Some names are still strings – we cannot always know which part is a surname and which part a forename; and some names do not have that kind of structure anyway. We hope to identify floruit dates, and categorise them as distinct from life dates. We don’t want to match ‘1888-1938’ with ‘fl 1888-1938’ (although we might want to see this as a potential match). We will aim to do something similar with birth and death dates. We want to gather all the information that is not a name or a date as ‘supporting information’.
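To make the date handling a little more concrete, here is a minimal sketch in Python of how date strings might be sorted into life dates and floruit dates. It is not our actual processing code: the `ParsedDates` class, the `parse_dates` and `dates_agree` functions, and the handful of patterns they recognise are all assumptions made for illustration, and real data has many more variations.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedDates:
    kind: str                    # "life", "floruit" or "unknown" (hypothetical labels)
    start: Optional[int] = None  # birth year, or first floruit year
    end: Optional[int] = None    # death year, or last floruit year


def parse_dates(text: str) -> ParsedDates:
    """Classify a date string such as '1888-1938', 'fl 1888-1938' or 'b 1834'."""
    s = text.strip().lower()
    # Floruit dates: 'fl 1888-1938', 'fl. 1888', 'fl1888-1938'
    m = re.fullmatch(r"fl\.?\s*(\d{4})(?:\s*-\s*(\d{4}))?", s)
    if m:
        return ParsedDates("floruit", int(m[1]), int(m[2]) if m[2] else None)
    # Birth date only: 'b 1834', 'b1912' or an open range such as '1834-'
    m = re.fullmatch(r"(?:b\.?\s*(\d{4})|(\d{4})\s*-)", s)
    if m:
        return ParsedDates("life", int(m[1] or m[2]), None)
    # Death date only: 'd 1842'
    m = re.fullmatch(r"d\.?\s*(\d{4})", s)
    if m:
        return ParsedDates("life", None, int(m[1]))
    # Full life dates: '1888-1938'
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", s)
    if m:
        return ParsedDates("life", int(m[1]), int(m[2]))
    return ParsedDates("unknown")


def dates_agree(a: str, b: str) -> bool:
    """True only when both strings are the same kind of date with the same years."""
    pa, pb = parse_dates(a), parse_dates(b)
    return pa.kind == pb.kind != "unknown" and (pa.start, pa.end) == (pb.start, pb.end)
```

With this sketch, `dates_agree("1888-1938", "fl 1888-1938")` is false because one is a life date and the other a floruit date, even though the years are identical. A fuller version could report that pair as a potential match rather than simply rejecting it.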

Once we have the structure, it is far more likely we can match the name, and also control our level of confidence about matching. Here is a shorter table based on some of the entries from above:

name          | surname | forename | dates     | fl dates  | info
              | Watson  | James    | 1834-1847 |           | stockbroker
Watson James  |         |          | 1834-     |           | stockbroker
              | Watson  | J        | b 1834    |           | Mr
James Watson  |         |          |           | 1840-1847 |

You can see that two of the names are simply name strings. We may not be able to identify a surname and forename in ‘James Watson’ or ‘Watson James’. With the structure that we have imposed, it is possible to write a name matching process that provides a match between the first and second entries in the above table, because we can say with some confidence that a name that includes ‘James Watson’ and that has the birth date of ‘1834’ and the additional information ‘stockbroker’ refers to the same person. We might say this is a ‘definite’ match, or a ‘probable’ match. The third entry could be a ‘possible’ match, as it includes ‘Watson’ and ‘J’ with the same birth date of 1834. If the fourth entry had ‘stockbroker’, for example, then we might consider a possible match, but as things stand, it would not be a match.
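The reasoning above could be sketched roughly as follows. This is only an illustration under stated assumptions, not our matching process: the `NameEntry` class, the `match_level` function, the three labels and the rules behind them are hypothetical. Names are compared as sets of words (so ‘Watson James’ and ‘James Watson’ are treated alike), and a single letter is allowed to stand for a word beginning with that letter.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class NameEntry:
    name: Optional[str] = None      # unstructured name string, if that is all we have
    surname: Optional[str] = None
    forename: Optional[str] = None
    birth: Optional[int] = None     # birth year, when one has been identified
    info: Optional[str] = None      # supporting information

    def tokens(self) -> Set[str]:
        parts = " ".join(p for p in (self.name, self.surname, self.forename) if p)
        return {t.lower() for t in parts.split()}


def _same_or_initial(x: str, y: str) -> bool:
    # 'j' is accepted as an initial for 'james' (an assumption for this sketch)
    return x == y or (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y))


def names_compatible(a: Set[str], b: Set[str]) -> bool:
    if not a or not b:
        return False
    return (all(any(_same_or_initial(x, y) for y in b) for x in a)
            and all(any(_same_or_initial(y, x) for x in a) for y in b))


def match_level(a: NameEntry, b: NameEntry) -> str:
    """Return 'probable', 'possible' or 'none' for two name entries."""
    ta, tb = a.tokens(), b.tokens()
    same_birth = a.birth is not None and a.birth == b.birth
    same_info = a.info is not None and b.info is not None and a.info.lower() == b.info.lower()
    if ta and ta == tb and same_birth and same_info:
        return "probable"
    if names_compatible(ta, tb) and (same_birth or same_info):
        return "possible"
    return "none"


first = NameEntry(surname="Watson", forename="James", birth=1834, info="stockbroker")
second = NameEntry(name="Watson James", birth=1834, info="stockbroker")
third = NameEntry(surname="Watson", forename="J", birth=1834, info="Mr")
# The fourth entry's dates do not give a shared birth year, so birth is left unset here.
fourth = NameEntry(name="James Watson")

print(match_level(first, second))  # probable
print(match_level(first, third))   # possible
print(match_level(first, fourth))  # none
```

Keeping the outcome as a label rather than a yes/no answer is what lets us control the level of confidence: the thresholds for each label can be tightened or loosened without changing the structure of the data.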

It is very important that the interface we develop indicates to end users that we are matching name strings. There is a distinction between matching name strings and simply stating that X and Y are the same person. This will help us with introducing the idea of likely, probable and possible matches.

This structuring work is absolutely at the heart of creating a name interface, and enabling researchers to look up ‘James Watson’ and then potentially go in many different directions through the connections, finding ways that archives may be related. But it is really challenging. We will not ‘get it right’. Even if we had really substantial resources and time, we could not make it perfect. Archivists, as information professionals, are keen on ‘getting it right’, which is usually a good thing; but pulling together information using names created over decades, by thousands of cataloguers, in different systems, without a clear standard to work to… it ain’t ever going to be perfect. The key question is whether this will substantially enhance the researcher experience and allow new connections to be made, and whether it will enable us to create connections outside of the archives domain. We have to have a change of mindset to accept that it is not perfect, but it is still hugely beneficial to research.

Just to emphasise the variation in data that we have, here are some EAD names, given as they are structured. They are all fine displayed within a description, suitable for a human reader, but they create challenges in terms of name matching. When you look at these, you have to think of the structure and semantics – essentially, how can we write an algorithm that allows us to truly identify the person (or that they are not a person!):

<persname>Barron, Lilias Mary Watson (b1912 : science graduate : University of Glasgow, Scotland)</persname>

<origination label="Creator: "><persname role="author">Name of Author: various </persname></origination>

<persname source="nra">
<emph altrender="surname">Blore</emph>
<emph altrender="forename">Edward</emph>
<emph altrender="dates">1787-1879</emph>
<emph altrender="epithet">Architect Artist Antiquary</emph>
</persname>

<origination>Gerald and Joy Finzi</origination>

<origination>Walls Family – Tom Kirby Walls and Tom Kenneth Walls</origination>

<origination>National Union of Railwaymen; Associated Society of Locomotive Engineers and Firemen; National Busworkers’ Association.</origination>

<origination>
<persname authfilenumber="https://viaf.org/viaf/61775126" role="creator" rules="ncarules" source="viaf">
<emph altrender="surname">actor</emph>
<emph altrender="forename">dramatist and criticJohn Whiting English</emph>
</persname>
</origination>

The last one was actually taken directly from VIAF and imported into the Archives Hub, which is, in principle, a really good way to create a structured name. Unfortunately, the process of pulling it into the Hub using the VIAF API did not go quite according to plan. VIAF has just the same challenges as we do – there will be structural mistakes. However, it has the VIAF ID, so funnily enough, it is easier to match than many other names.
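As a minimal sketch of why the identifier helps: if two entries carry the same `authfilenumber`, they can be matched without relying on the name parts at all. The function names are assumptions for illustration, and the second, cleanly structured entry is hypothetical rather than taken from our data.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def authority_id(persname_xml: str) -> Optional[str]:
    """Return the authfilenumber of a persname element, or None if it has none."""
    elem = ET.fromstring(persname_xml)
    value = elem.get("authfilenumber")
    return value.strip().rstrip("/") if value else None


def same_authority(a_xml: str, b_xml: str) -> bool:
    """True only when both entries carry the same identifier."""
    a, b = authority_id(a_xml), authority_id(b_xml)
    return a is not None and a == b


# The badly structured entry shown above
garbled = (
    '<persname authfilenumber="https://viaf.org/viaf/61775126" role="creator" '
    'rules="ncarules" source="viaf">'
    '<emph altrender="surname">actor</emph>'
    '<emph altrender="forename">dramatist and criticJohn Whiting English</emph>'
    '</persname>'
)

# Hypothetical well-structured entry carrying the same VIAF ID
clean = (
    '<persname authfilenumber="https://viaf.org/viaf/61775126" source="viaf">'
    '<emph altrender="surname">Whiting</emph>'
    '<emph altrender="forename">John</emph>'
    '</persname>'
)

print(same_authority(garbled, clean))  # True
```

An entry without an identifier never matches on this test, so it would fall through to the name-and-date matching instead.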

Many of the above examples are names added as archival creator names (‘origination’). Unfortunately, there has been a tendency for cataloguers to add creator names in a very unstructured way. The old Archives Hub Editor used to encourage this, and most archival systems have a free text field for name of creator. (Now, our Editor structures the creator name and adds it as an index term – so they are both identical).

We are currently looking at the challenge of matching origination name with the index term within the same description. That may sound like an easy task, but very often they are really quite different. For example, for the name of creator you may get:

<origination>Name of Authors: various but include Reverend<persname role="author">Thomas Frognall Dibdin</persname>,<persname role="author">Richard Bentley</persname>,<persname role="author">Philip Bliss </persname>and<persname role="author">Frederick James Furnivall</persname></origination>

This is nicely structured, so that it is easy to see that they are separate names, although the lack of life dates makes unique identification more difficult. If these individual names are also added as index terms, then we want to create just one entry for e.g. ‘Thomas Frognall Dibdin’ – we don’t want two entries for the one name (taken from the ‘origination’ and the ‘controlaccess’ index area) that both represent the same archive collection.
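A minimal sketch of that de-duplication step might look like the following. The helper names are made up for the example, and the ‘controlaccess’ fragment is a hypothetical index entry written as a plain name string; real index terms are usually structured and would need the structuring step first.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def names_in(xml_fragment: str) -> List[str]:
    """Text of every persname in the fragment, with whitespace tidied."""
    root = ET.fromstring(xml_fragment)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("persname")]


def merge_names(origination_xml: str, controlaccess_xml: str) -> List[Dict]:
    """One entry per distinct name in a description, recording where it was found."""
    merged: Dict[str, Dict] = {}
    for source, fragment in (("origination", origination_xml),
                             ("controlaccess", controlaccess_xml)):
        for name in names_in(fragment):
            key = name.lower()
            entry = merged.setdefault(key, {"name": name, "sources": []})
            entry["sources"].append(source)
    return list(merged.values())


origination = (
    '<origination>Name of Authors: various but include Reverend'
    '<persname role="author">Thomas Frognall Dibdin</persname>,'
    '<persname role="author">Richard Bentley</persname>,'
    '<persname role="author">Philip Bliss </persname>and'
    '<persname role="author">Frederick James Furnivall</persname></origination>'
)

# Hypothetical index area for the same description
controlaccess = '<controlaccess><persname>Thomas Frognall Dibdin</persname></controlaccess>'

for entry in merge_names(origination, controlaccess):
    print(entry["name"], entry["sources"])
```

Here ‘Thomas Frognall Dibdin’ ends up as a single entry that records both places it appeared, rather than as two entries pointing at the same collection.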

A common pattern is something like:

<origination label="name of creator:">Frances Dennis</origination>

With an index term of:

<persname rules="ncarules"><emph altrender="surname">Dennis</emph><emph altrender="forename">Frances Mary</emph><emph altrender="dates">b 1874</emph><emph altrender="epithet">missionary</emph></persname>

‘Frances Dennis’ as a name string is very likely to be a match with ‘Frances Mary Dennis b 1874 missionary’ when it is within the same collection. If these two entries were in different descriptions, we would not match them.
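One way to express this rule is to treat the creator string as a match when every word in it also appears in the index term's surname and forename, and only when both come from the same description. This is a simplified sketch with assumed function names, not the rule we have settled on.

```python
import xml.etree.ElementTree as ET
from typing import Set


def index_term_name_tokens(persname_xml: str) -> Set[str]:
    """Lower-cased words from the surname and forename parts of an index term."""
    elem = ET.fromstring(persname_xml)
    parts = [e.text or "" for e in elem.iter("emph")
             if e.get("altrender") in ("surname", "forename")]
    return {t.lower() for t in " ".join(parts).split()}


def creator_matches_index_term(origination_text: str, persname_xml: str,
                               same_description: bool) -> bool:
    """Match only within one description, and only if all creator words are in the index term."""
    if not same_description:
        return False
    creator = {t.lower().strip(".,") for t in origination_text.split()}
    return bool(creator) and creator <= index_term_name_tokens(persname_xml)


index_term = (
    '<persname rules="ncarules"><emph altrender="surname">Dennis</emph>'
    '<emph altrender="forename">Frances Mary</emph>'
    '<emph altrender="dates">b 1874</emph>'
    '<emph altrender="epithet">missionary</emph></persname>'
)

print(creator_matches_index_term("Frances Dennis", index_term, same_description=True))   # True
print(creator_matches_index_term("Frances Dennis", index_term, same_description=False))  # False
```

The `same_description` flag carries the key point: the same pair of strings is accepted inside one description and rejected across two.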

Our pre-match structuring will go a long way to increasing the number of matches, and hence the intellectual bringing together of knowledge through names. Matching creator name and index term name will reduce the amount of duplication. The framework will be tweakable, so that we can constantly review and improve.