Names (8): A 4 year old in red wellington boots

Firstly, an apology to those who commented. I was on a temporary machine for a while and didn’t get the notifications to approve the comments. I really appreciate feedback! And we need to think about this whole topic as an archive community.

Secondly, I wanted to pick up on some comments:

“If cataloguing archivists have access to a central pot of name authorities we are more likely to spot and re-use existing authority entries. So if one archivist identified Elizabeth Roberts 1790-1865 (artist) with a little potted biography which placed her in Penge, then a later archivist finding material from Lizzie Roberts in Penge in 1850s is much more likely to put 2 and 2 together manually”

In fact, one of the potential developments from the work we are doing is an interface specifically for cataloguers. The whole issue of ‘match’, ‘probable’ and ‘possible’ is tricky to present to end users, but relatively easy to present to cataloguers to help with creating names that will successfully be connected. So, we are bearing that in mind as a future development.

“When I looked at the list of names used in this article I thought ‘someone just doesn’t know what to include to properly describe a name’”

Yes…I think that sometimes, when I am thinking about how to reconcile the massive variations and how to work with the lack of structure. But then I remember what it was like (when I was a proper archivist) to catalogue within time constraints. And I also remember that I am someone who spends half my life thinking about data! In addition, the point is that with archives it is perfectly valid to enter a name such as ‘Julia (fl 1976)’ because that is what you get from the item you are cataloguing, and nothing more. Maybe you could undertake research to find out who that is, but that would extend the time it takes to catalogue by days, if not weeks and months. For a researcher, this might jog something in the mind and lead to a connection being made. Something is better than nothing. For me, the entries that are rather more frustrating are names such as ‘various’ or ‘Author: various’, or ‘James MacAllister and various’ because these just aren’t names. However, many of these entries were probably created in a time when semantically structured data was not so important.

“The other way of dealing with this is to leave the final decision up to the end-user.”

Yes, this is a fair point. In our current thinking, the idea is that we have levels of confidence that we present to the user, and that allows them to make the decision. But we still need to think carefully about how to do this in a way that most clearly conveys meaning. The most difficult thing is to convey that even though you have linked several collection descriptions to one name, other name strings may also be a match. But at the end of the day, there is always the issue that decisions you make around the navigation and options provided to end users mean they are likely to exclude some relevant results. A subject search will exclude any archives not indexed with that subject. Do you therefore dispense with a subject search? (More on this in future posts, as machine learning may present us with new tools to create subject entries.)

Since my last post we actually hit the point of ‘blimey, this is just too difficult’. We really weren’t sure we were going to make this work, given the tremendous variations and, in particular, the lack of structure.

However, we have hacked our way through the undergrowth to create a path that I think will fulfil many of our aims. There is so much I could say, if I got into the detail of this, but I will spare you too much discussion around EAD and JSON structure!

A good part of the last few weeks from my point of view has been clarifying the thinking around what is required when processing names. I came up with the idea of the ‘4 pillars of names’.

1. Matching

This refers to comparing and grouping names.  

Matching does not require us to know if it is a person or an organisation or to know anything about meaning at all. It is simply a process to group names.  So, ‘D J MacDonald’ could be a company or a person.  The question is, does that match ‘David John MacDonald’ or ‘D J MacDonald, manufacturers, Carlisle’?

Matching is therefore also about levels of confidence. It is about saying ‘D J MacDonald b.1932’ is the same as ‘D MacDonald b.1932’….or not. 

Matching may also mean matching a creator name and an index term within a record. For more on this, see below.  

2. Meaning

Name meaning is about whether it is a personal, corporate or family name.  Many creator names are just ‘creator’. There is no tagging to distinguish the type. Index terms have to have a type, but matching them up to creator name is not always easy. See more on that below.

3. Search behaviour

What happens when the user clicks on the name? Previous posts have presented our ideas for this. Whilst we are not yet ready to develop an end user interface, the options that are available to us for display are necessarily constrained by how we process the data. So we do need to think about this now.

4. Display

How we display a name record, or a name page. Again, not something we are focussing on now, other than to think about the sorts of features that we want to include.

* * *

Our discussions have been characterised by ‘one step forwards two steps backwards’, which can feel a little dispiriting. But we believe we have now sorted out the approach we need to take. I have spent a lot of time working collaboratively with Rob Tice from Knowledge Integration, unpicking the (many and varied) challenges in the data and as a result we’ve agreed an approach that we believe will produce the data that we want.

So, this again consists of 4 parts – a 4-step process that covers matching and meaning.

1. Matching within a collection description

We need to try to match the creator name to the index term, if we have both. This is the first step in the workflow. To do this, the processing needs to identify names within one collection (each name needs to be attached to a collection via a reference).

Taking the description of the Caledonian Railway Company as an example (https://archiveshub.jisc.ac.uk/data/gb248-ugd008/7andugd8/38). The name appears as:

Creator: Caledonian Railway (railway company: 1845-1923: Scotland)
Bioghist: Caledonian Railway
Index term: Caledonian Railway, 1845-1923

We want to create one entry for these names that we take forwards into the de-duplication process. In this case, the names are all marked up as corporate names. But in many cases the creator is not marked up in this way. We need a process to match these entities to say that they are the same. This is about applying matching at the level of one collection, rather than across collections. When you apply it to one collection, you can decide to make more assumptions. For example,

Creator: Dorothy Johnson
Index term: Johnson, Dorothy, 1909-1966, Researcher into theatre history

This creator is not marked up as a personal name. If we worked with these entries in our general de-duplication, so that they were not associated with one particular collection, we could not say they are the same person. Indeed, we could not identify ‘Dorothy Johnson’ as a person, only as a creator. The relationship of these two entries would get lost. But within one collection description, we can make the assumption that they represent the same thing.

If we make this the first step we can remove many of the creator-as-string names from the processing – they will already be matched to a structured index term.
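A minimal sketch of this first step in Python, for illustration only and not the Archives Hub implementation. The function names are hypothetical, and the matching rule (one name's words are all contained in the other's, ignoring order, punctuation and dates) is an assumption that is only tolerable because both names come from the same collection description.

```python
import re
from typing import List, Optional, Set


def name_tokens(name: str) -> Set[str]:
    """Lower-cased words of a name, ignoring order, punctuation and digits."""
    return set(re.findall(r"[^\W\d_]+", name.lower()))


def match_creator_to_index_term(creator: str, index_terms: List[str]) -> Optional[str]:
    """Return the first index term from the same collection description
    whose words contain, or are contained in, the creator's words."""
    creator_words = name_tokens(creator)
    if not creator_words:
        return None
    for term in index_terms:
        term_words = name_tokens(term)
        if term_words and (creator_words <= term_words or term_words <= creator_words):
            return term
    return None


index_terms = ["Johnson, Dorothy, 1909-1966, Researcher into theatre history"]
print(match_creator_to_index_term("Dorothy Johnson", index_terms))
```

A rule this loose would be unsafe across collections, which is why it is applied before, and separately from, the general de-duplication.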

2. Structuring data

This is a process of following rules to structure data. Many names are not structured. PIDs (persistent identifiers) can by-pass this need for consistency, but at present the archive community barely uses recognised identifiers. I have posted previously on name authorities and structure. So, anyway, to introduce a bit of EAD, you might have:

<persname>Florence Nightingale, 1820-1910</persname>

or

<persname><emph altrender="surname">Nightingale</emph><emph altrender="forename">Florence</emph><emph altrender="dates">1820-1910</emph><emph altrender="epithet">Reformer of Hospital Nursing</emph></persname>

If we can process the first entry to give the kind of structure you see in the second entry, that enables us to carry out de-duplication, and we have a much better chance of matching it to other entries. This is decidedly non-trivial, and we won’t be able to do this for all names.
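To make the target structure concrete, here is a small Python sketch that writes already-separated name parts out in the second, structured form. The function name build_persname is hypothetical; only the persname and emph elements and the altrender values come from the example above, and the hard part (splitting an unstructured string into these parts) is deliberately not attempted here.

```python
import xml.etree.ElementTree as ET


def build_persname(surname, forename, dates=None, epithet=None):
    """Serialise separated name parts as a structured <persname> element."""
    persname = ET.Element("persname")
    parts = [
        ("surname", surname),
        ("forename", forename),
        ("dates", dates),
        ("epithet", epithet),
    ]
    for altrender, value in parts:
        if value:  # skip parts the catalogue entry does not supply
            emph = ET.SubElement(persname, "emph", {"altrender": altrender})
            emph.text = value
    return ET.tostring(persname, encoding="unicode")


print(build_persname("Nightingale", "Florence", "1820-1910", "Reformer of Hospital Nursing"))
```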

3. De-Duplication

This is the process outlined in the blog post on de-duplication at scale. Once the other processes are in place, we are in a position to run the de-duplication process, and start to try out different levels of confidence with matching.

A working example: George Bernard Shaw

collection match:

George Bernard Shaw (gb97-photographs)
matches:
Shaw, George Bernard, 1856-1950, author and playwright (gb97-photographs)

structure rules:

apply rule: if it includes YYYY-YYYY and the preceding words include a comma then the first entry is a surname and the second entry is a forename
apply rule: YYYY-YYYY is a date
apply rule: words after YYYY-YYYY are additional information

Creates:
Surname: Shaw
Forename: George Bernard
Dates: 1856-1950
Additional information: author and playwright
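
The three structure rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code: the function name and dictionary keys are hypothetical, and the comma that merely separates the name from the dates is ignored when checking the first rule, which is one possible reading of it.

```python
import re

# Rule: YYYY-YYYY is a date range.
DATE_RANGE = re.compile(r"\b\d{4}-\d{4}\b")


def structure_name(name):
    """Apply the three rules to a name string; return None if they do not fit."""
    found = DATE_RANGE.search(name)
    if found is None:
        return None
    before = name[:found.start()].strip(" ,")
    after = name[found.end():].strip(" ,.")
    # Rule: a comma in the words before the dates means "surname, forename".
    if "," not in before:
        return None
    surname, forename = [part.strip() for part in before.split(",", 1)]
    return {
        "surname": surname,
        "forename": forename,
        "dates": found.group(0),
        # Rule: words after the dates are additional information.
        "additional_information": after,
    }


print(structure_name("Shaw, George Bernard, 1856-1950, author and playwright"))
print(structure_name("George Bernard Shaw"))  # no dates, so the rules do not apply
```

Strings that do not fit the rules return None rather than a guess, which reflects the point that not every name can be structured this way.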

de-duplication:

The structured entry matches a name from another description:

Surname: Shaw
Forename: G.B.
Dates: 1856-1950
Additional information: playwright.
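
A minimal sketch of how two structured entries might be compared, in Python. The labels ‘match’, ‘probable’ and ‘possible’ are the ones used earlier in this post, but the thresholds behind them, the function names and the dictionary keys are all assumptions made for illustration; the real de-duplication process is described in the post on de-duplication at scale.

```python
def forename_tokens(forename):
    """'G.B.' -> ['g', 'b']; 'George Bernard' -> ['george', 'bernard']."""
    return forename.replace(".", " ").lower().split()


def tokens_agree(x, y):
    """An initial agrees with any word starting with it; full words must be equal."""
    if len(x) == 1 or len(y) == 1:
        return x[0] == y[0]
    return x == y


def confidence(a, b):
    """Compare two structured personal names and return a confidence label."""
    if a["surname"].lower() != b["surname"].lower():
        return "no match"
    fa, fb = forename_tokens(a["forename"]), forename_tokens(b["forename"])
    if not fa or not fb or not all(tokens_agree(x, y) for x, y in zip(fa, fb)):
        return "no match"
    dates_a, dates_b = a.get("dates"), b.get("dates")
    if dates_a and dates_b:
        if dates_a != dates_b:
            return "no match"
        # Same dates: full agreement if both have the same number of forenames.
        return "match" if len(fa) == len(fb) else "probable"
    return "possible"  # forenames agree but dates are missing on one side


shaw_1 = {"surname": "Shaw", "forename": "George Bernard", "dates": "1856-1950"}
shaw_2 = {"surname": "Shaw", "forename": "G.B.", "dates": "1856-1950"}
print(confidence(shaw_1, shaw_2))
```

Under these assumed thresholds the two Shaw entries agree, while ‘D J MacDonald b.1932’ against ‘D MacDonald b.1932’ would only be rated ‘probable’ because one entry has fewer forenames.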

* * *

So, we are now in the process of implementing this workflow. The current phase of this project will not allow us to complete this work, but it will lay the foundations. Of course, we’ll find other challenges and issues. We still don’t know how successful we will be. There will definitely be names we can’t match and we can’t identify as personal or corporate. But then it is down to how we present the information to the end user.

I called this post ‘A 4 year old in red wellington boots’ because in her comment on the previous blog post Teresa used that as a metaphor for how we can think about data. We need to explore, to play with data, to search and discover, to not mind getting dirty. It is easy to get stressed about not getting everything right; but we need to jump into the puddles and just see what happens!

(instagram: shelightsthesky_photography)

Creating a COVID-19 archive at the Royal College of Nursing

Archives Hub feature for November 2020

Now more than ever as we continue to battle the COVID-19 pandemic, the world is reliant on its digital infrastructure; the need to provide and access accurate and up-to-date information is of paramount importance. This raises some interesting questions, challenges and opportunities for archive services who can play their part in the collective response to the crisis by capturing and recording events, activities and decisions. Archives and recordkeeping professionals have always supported the notions of accountability and transparency through their work, something which is being demonstrated in real time during the development of the pandemic.

As the UK’s largest trade union and professional association for nurses, the Royal College of Nursing (RCN) has been supporting and representing nurses and healthcare workers throughout the pandemic. It is vital that records of how this has been done are available to the organisation in perpetuity as evidence of advice given and decisions taken. The RCN has a responsibility to its members to be able to demonstrate that the organisation has been working in their best interests and the interests of their patients. In turn, the RCN archive has a responsibility to ensure that records with evidential and research value are captured, preserved and accessible to the right audiences at the right time.

One of our first attempts at archiving the RCN COVID-19 webpages using our digital archive.

As a result, like many of our archivist and recordkeeping colleagues across the world, we have created a COVID-19 archive. Since the beginning of the year the RCN archive team have been actively collecting records relating to COVID-19 from across the organisation to build up a picture of how the pandemic has unfolded through the eyes of RCN members and staff. Unsurprisingly, this covers a wide range of record types and digital formats: web crawls of special COVID-19 webpages containing up-to-date guidance and advice, targeted staff emails, member surveys on working conditions and PPE, General Secretary’s video messages, special committee situation reports, newly created online nursing resources, publications – the list could go on. Within this set of records is a complex combination of access requirements and restrictions which, through balancing business confidentiality with public interest, we will manage alongside the records themselves.

We are in the fortunate position of having a remotely accessible network and a digital archive, which has meant that we have been able to collect these records as they have been created and start uploading them to our digital archive straight away. While some of the records we’re collecting as part of the COVID-19 archive project would have been transferred to us anyway, there are several new record series on our 2020 collecting plan as a result of the pandemic. For example, our first venture in web archiving was a test crawl of the RCN COVID-19 webpages; these are now collected regularly and form an integral part of the COVID-19 archive. Having seen and been inspired by the experiences of other archives already running successful daily web crawls to capture public advice and the public response, we decided to capture our pages daily as well – this ensured that we were keeping up to speed with each piece of new advice and guidance shared on the webpages. As the rate of updates to the pages has slowed, we have since reduced the frequency to weekly, although we continue to monitor them, ready to capture more frequently if needed. This was the pilot web archiving project we didn’t know we were doing until it happened, and it has in turn sparked interest in a larger web archiving project to capture the whole RCN website, which is well underway.

A video message from Donna Kinnair, General Secretary, on the staff intranet. An example of the range of formats collected for the COVID-19 archive.

Alongside the collecting of material, we have been considering how the records of the COVID-19 archive will fit into our existing catalogue structure. While it would be easy to create a new Fonds for COVID-19, we realised that this view was being skewed by our thoughts about future access to the material, and the ease with which colleagues or researchers would be able to view all the material neatly packaged together. Instead we plan to preserve the context of the records by arranging them by creator (in our case mostly the department of origin) to fit within our existing catalogue structure. There will be occasions when it is important to view all COVID-19 records together to get a complete picture of the reaction and response to the pandemic, so using the ‘linked collection’ feature in our digital archive we plan to create a virtual COVID-19 collection containing records from across different record series to allow this level of access. Beyond this we are considering which records from our COVID-19 archive will be shared on our public digital archive website to ensure the transparency and accountability that creating the COVID-19 archive in the first place helps to achieve.

We have certainly learnt a lot this year and the team has upskilled, becoming more proficient and confident in processing a wide range of digital formats, from collection through to access. Our sector has also stepped up by providing online webinars and training events to share our experiences of this extraordinary time. In May we participated in a panel discussion facilitated by Preservica, our digital archive supplier, who generously donated 250GB of storage space for us to store the COVID-19 archive. At the event we shared our plans and projects for collecting COVID-19 records with the archive community alongside colleagues from a wide range of institutions. These included Network Rail, who have been collecting records such as emergency train timetables introduced in response to the falling customer demand, and all the documentation that went into making this happen, and University at Buffalo in the US, who are encouraging students and staff to share their experiences of the pandemic by submitting video diaries and photographs to the archive. Learning about and reflecting on the wide range of collecting projects happening around the world is as informative as it is inspiring.

An example of a publication for the COVID-19 archive. This is the cover of the April 2020 Bulletin RCN members magazine.

It is amazing to think that in the (probably not too distant) future the COVID-19 records we have collected will be catalogued, available to view online through our digital archive, and used to inform research into, and evaluations of, the response of the UK’s largest independent nursing organisation and our role in how Britain handled the pandemic.

Katherine Chorley, Digital Asst Archivist
Royal College of Nursing Archives

Related

Browse all Royal College of Nursing Archives collections on the Archives Hub.

Previous RCN Archives feature: Cathlin du Sautoy and Hermione Blackwood: personal papers at the Royal College of Nursing Archives

All images copyright Royal College of Nursing Archives. Reproduced with the kind permission of the copyright holders.

Exploring New Worlds in the Archives Hub

This blog post forms part of History Day 2020, a day of online interactive events for students, researchers and history enthusiasts to explore library, museum, archive and history collections across the UK and beyond.

Use the Archives Hub, a free resource, to find unique sources for your research, both physical and digital. Search across descriptions of archives, held at over 350 institutions across the UK.

History Day 2020 coincides with the Being Human festival, the UK’s national festival of the humanities. Their theme this year is ‘New Worlds’, so taking this as our inspiration, we’re highlighting a range of archive collections – across Travel, Exploration, Space Exploration and Science Fiction.

Travel

Austen Henry Layard’s passport (1) (LAY/1/4/8). Image copyright: University of Newcastle.

Unearthing Family Treasures: The Layard and Blenkinsopp Coulson Archives
In 1839 a young lawyer left behind his London office for a post in the Ceylon (now Sri Lanka) Civil Service, thus beginning a series of travels, adventures and discoveries which would result in him achieving world renown for uncovering and shining a light on the ancient civilizations of Mesopotamia, in particular Assyrian culture. That young man was Austen Henry Layard. Read the feature, by University of Newcastle Special Collections.

Papers of Elizabeth Thomson, 1847-1918, teacher, missionary, traveller and suffragette, c1914
Throughout the 1890s and 1900s Thomson travelled the world with her sister, Agnes, working as teachers and missionaries. The countries they visited include India, Japan, the USA, Germany and Italy. In the summer of 1899 Thomson reports that she visited Faizabad in India to learn Urdu but could not stand the heat and left for Almora in 1902. In 1907 she sailed to Bombay to complete missionary work, before teaching English in Sangor for the winter. In 1909 she travelled back to the UK, via Vienna, Prague, Dresden and Berlin, to settle in Edinburgh. Material held by University of Glasgow Archive Services – see the full collection description.

Steel engraving, 1875. © Image is in the public domain.

Sentimental Journey: a focus on travel in the archives
The hundreds of collections relating to travel featured in the Archives Hub shed light on multiple aspects of travel, from royalty to the working classes, and encompassing touring, business, exploration and research, the work of missionaries and nomadic cultures. Read the feature.

An abstract of a voyage from England to the Mediteranian: the diary of an anonymous English naval victualler, 1694-1696
Contains the log of an anonymous English naval victualler on a voyage from Gravesend in England to Cadiz in the Mediterranean between 31 December 1694 and 29 October 1696. Material is in English, Spanish, Latin and Hebrew. Written in a single neat late seventeenth-century English hand with the text on each page set within faint ruled lines. There are many tables, diagrams, and quite finely-drawn illustrations of places en route, especially in Spain, and interesting objects, such as keys and seals. Material held by University of Leeds Special Collections – see the full collection description.

Bodiwan Papers, 1634-1923
The papers of Michael D. Jones and his family, which include numerous letters to Michael D. Jones from the Welsh settlers in Patagonia or relating to them, prior to the sailing of the Mimosa and after. Amongst them is a letter from Charles de Gaulle, the eminent Breton and Celticist, expressing his interest in the scheme to found a Welsh colony in Patagonia. Also, amongst the correspondents are L. Patagonia Humphreys, Rev. D. Lloyd Jones, Rhuthun and Mihangel ap Iwan and Llwyd ap Iwan. The papers reflect the hardship suffered by the new settlers as well as the investment made by Michael D. Jones in the venture. There are bills and receipts relating to the Mimosa, share certificates, statistics regarding population for 1879. Also, a bank pass book of the Welsh Colonising and General Trading Company Ltd, 1870-1883, and a register of the Welsh applicants to Patagonia, 1875-1876. The collection is held by Archifdy Prifysgol Bangor / Bangor University Archives – see the full collection description.

The London to Istanbul European Highway
Part of The National Motor Museum Trust Motoring Archive’s Bradley Collection, including striking illustrations by Margaret Bradley. Read the feature.

The handsome blue car, by Margaret Bradley. ‘With apologies…this being a rough sketch…made somewhere in the middle of no mild channel’. Sketch by Margaret Bradley, copyright the National Motor Museum Trust.

Exploration

Cambridge Svalbard Exploration Collection, 1933-1992
The collection documents many decades of scientific work undertaken by (mostly) Cambridge researchers from 1938 until the early 1990s. These were mostly led by Walter Brian Harland (1917-2003), who also became the collator of the materials collected in Spitsbergen. The documentary archive complements the physical collection of geological specimens collected during those expeditions. Svalbard is located in the north-western corner of the Barents Shelf, 650km north of Norway. The shelf is named after the Dutch Captain, Barents, who is credited with the modern discovery of the islands in 1596 and after whom the Barents Sea is named. Collection held by Sedgwick Museum of Earth Sciences, University of Cambridge – see the full collection description.

Online Resource: Old Maps Online – provided by the Great Britain Historical GIS Project, Old Maps Online is a search portal that combines the historical map collections of several organisations around the world. Users can search across collections through a single interface and easily locate multiple maps of a geographical area. The interface is free and access is open to all users. A wide range of different types of map are available, including: land maps; sea charts; boundary and estate maps; military and political maps; and town plans. Historical maps of many countries are available – including South and Central America from the 16th to the 20th centuries; Britain and particularly London, up to 1860; North America in the 18th and 19th centuries; pre-1900 Dutch Maps; the North West of England; and Moscow. More details.

Challenger Expedition Photographs, 1870s-1885; 1981-1983
HMS Challenger set out to collect specimens from different depths of water across the globe. The voyage took place between 1872 and 1876. It is thought that this was the first expedition to routinely use photography to document the journey. There was a darkroom on board so photographs could be developed on the ship. Material held by National Museums Scotland – see the full collection description.

Shackleton’s Endurance Expedition Centenary
27th October 1915: Antarctic expedition ship Endurance was abandoned on the orders of Sir Ernest Shackleton and their expedition became a fight for survival. Read the feature by the Scott Polar Research Institute, University of Cambridge.

Space Exploration

John Herschel’s photograph of his father’s 40-foot telescope.
Herschel’s 40-foot telescope, circular glass plate photograph. The telescope’s wooden scaffolding is seen here on 9 September 1839, at Observatory House in Slough, England. It was photographed by the astronomer John Herschel (1792-1871) before its demolition. The telescope was designed by John’s father, the German-born British astronomer William Herschel (1738-1822). The tube was 40 feet (12 metres) long. The first observations with this telescope were carried out 50 years earlier on 28 August 1789, when two new moons of Saturn (Enceladus and Mimas) were discovered. 50 years later, by 1839, John Herschel and W H Fox Talbot had invented the process we now know as photography. This is one of the earliest surviving glass plate photographs. Image copyright: Royal Astronomical Society Archives

Russian Space Exploration, 1903
Drawings, documents, photographs, ephemeral objects and memorabilia relating to early Russian space exploration. Objects include domestic items such as cigarette cases, ashtrays, ornamental cigarette dispensers, desk thermometers, ornamental lamps and tea glass holders. Included in the collection are photo albums and a press cutting album made by a school child as well as stamp collections. The collection boasts rare drawings by Konstantin Tsiolkovsky in which he envisaged the exit from a spacecraft into the vacuum of space as well as a drawing of a Reactive engine (Rocket engine); one of the first designs of its kind from c.1930. The collection is held by De Montfort University Archives and Special Collections – see the full collection description.

Jodrell Bank Observatory Archive, c.1924-1993
The Jodrell Bank Observatory is one of the world’s largest radio-telescope facilities. Originally known as the Jodrell Bank Experimental Station, it was renamed the Nuffield Radio Astronomy Laboratories in 1966, and changed to its current name in 1999. The first radar transmitter and receiver was installed by Bernard Lovell, then working as a physicist at the University of Manchester, at Jodrell Bank, Cheshire, in December 1945 (the University campus had proved unsuitable because of the high level of electrical interference). At this period Lovell was researching cosmic rays under the direction of Patrick Blackett, professor of physics at the University of Manchester. Lovell’s work involved studying radio echoes from large cosmic ray showers in the Earth’s atmosphere, using old military radars. As a result of this, Lovell went on to make important discoveries in meteoric astronomy. The collection is held by University of Manchester Library – see the full collection description.

The Herschel archive at the Royal Astronomical Society
The Royal Astronomical Society is the custodian of a significant collection of the astronomy-related papers of William, Caroline and John Herschel. Read the feature.

Caroline Herschel.
Caroline Lucretia Herschel (1750-1848), German-born British astronomer, in 1847, pointing at the orbit of a comet on a map of the solar system. The map shows all the planets out to Saturn. Uranus had been discovered in 1781 by William Herschel, but was at first thought to be a comet. Neptune was discovered in 1846. The map also shows the asteroids Ceres (discovered in 1801), Pallas (1802), Juno (1804) and Vesta (1807). Caroline was the sister of William Herschel, and worked with him in England. She discovered eight new comets between 1786 and 1797. After her brother’s death in 1822, Caroline returned to Hanover, where she died at the age of 97. This artwork shows Herschel in Hanover in 1847, the year before she died. Image copyright: Royal Astronomical Society Archives

Science Fiction

Papers of Douglas Noël Adams, 1952-2001 (Circa.)
Douglas Noël Adams was born in Cambridge in 1952. He was awarded an exhibition to read English at St John’s College, Cambridge, obtaining his BA in 1974. While at Cambridge, Adams occupied himself chiefly in writing, performing in, and producing comedy sketches and revues, establishing connections that were to be integral to his future work. His career took off with ‘The Hitchhiker’s Guide to the Galaxy’, a six-part comic science-fiction radio series commissioned by the BBC in 1977 and broadcast in 1978. Novelisation and a second series were followed by further books in what became billed as ‘the increasingly inaccurately named Hitchhiker’s Trilogy’. The ‘Hitchhiker’s Guide’ series has taken many forms, including audio recordings; stage adaptations; a television series; a computer game; publication of the original radio scripts; radio adaptations of the remaining novels, and a film. Adams’s other creative work included writing and script-editing for BBC Television’s ‘Doctor Who’. Material held by St John’s College Library Special Collections, University of Cambridge – see the full collection description.

Papers of Brian Aldiss, 1966-1995
Brian Aldiss was born in 1925 in Dereham, Norfolk. After war service in the Royal Corps of Signals he entered the bookselling trade, working at Sanders & Co. in Oxford. His first work as a writer was The Brightfount Diaries, a fictionalised diary of a bookseller first published as a column in The Bookseller during 1954 and 1955 and published as one volume by Faber & Faber in 1955. The following year he became a full-time writer, and in 1957 his first science fiction book, the short story collection Space, Time and Nathaniel was published. His first science fiction novel, Non-Stop was published in 1958. Since then Aldiss has been a prolific writer, best known for his science fiction novels, novellas and short stories, including the award-winning Helliconia trilogy. He has also been a historian and critic of the genre, and has edited many science fiction collections. In addition, his ‘mainstream’ writing has included the novels The Male Response, Forgotten Life and the semi-autobiographical Horatio Stubbs sequence. He was elected a Fellow of the Royal Society of Literature in 1989. In 1990 he published his autobiography, Bury my heart at W.H. Smith’s. The collection is held by the University of Reading Special Collections Services – see the full collection description.

Other ‘New Worlds’

Pan-African Congress 1945 and 1995 Archive
The Pan-African Congress was a series of meetings held throughout the world. In 1945 Manchester hosted the 5th Pan-African Congress. The Pan-African Congress was successful in bringing attention to decolonization in Africa and in the West Indies. The Congress gained a reputation as a peacemaker and made significant advances for the Pan-African cause. Its demands included an end to colonial rule and racial discrimination; it stood against imperialism and demanded human rights and equality of economic opportunity. The manifesto given by the Pan-African Congress included the political and economic demands of the Congress for a new world context of international cooperation. Material is held by the Ahmed Iqbal Ullah Race Relations Resource Centre – see the full collection description.

Records of the British Union for the Abolition of Vivisection, 1865-1996
The British Union for the Abolition of Vivisection (BUAV) was founded in 1898 by Miss Frances Power Cobbe (1822-1904). Concern for the welfare of animals was not a new phenomenon; the first wave of anti-vivisection feeling in England commenced around the middle of the nineteenth century. The Second World War appeared to foster greater ideas of cooperation within the animal welfare movement. The Conference of anti-vivisection Societies first met on 20 November 1942. Five societies were represented at the invitation of BUAV ‘for the purpose of discussing and making plans for a joint intensive campaign, after the war, to claim the total abolition of vivisection as a necessary step towards securing for animals their rightful place in the new world order, which it is generally believed will follow the peace’. The immediate post war period began to see a rise in public demonstrations as a medium to spread the anti-vivisection message; in particular, these were held outside vivisection laboratories. The collection is held by Hull University Archives, Hull History Centre – see the full collection description.

The Percy Johnson-Marshall Collection, 1931-1993
Percy Edwin Alan Johnson-Marshall (1915-1993) was one of the most energetic of a generation of town-planners who began their careers in the 1930s and, after the Second World War, dedicated their lives to the creation of a new world of social equity through the radical transformation of the human environment. Material held by Edinburgh University Library Special Collections – see the full collection description.

Find out more

Names (7): Into the Unknown

On the Archives Hub we have plenty of name entries without dates. Here is an example of the name string ‘Elizabeth Roberts’ (picked entirely randomly) from several different contributors:

Richard and Elizabeth Roberts
Roberts, Elizabeth fl. 1931
Elizabeth Grace Roberts
Roberts, Elizabeth Grace
Elizabeth Roberts
Roberts, Elizabeth
ROBERTS, Elizabeth Grace
ROBERTS, Mrs Elizabeth Grace

The challenge we have is how to work with names like this. Let me modify this list into an imaginary but nonetheless realistic list of names that we might have on the Hub, just to provide a useful example (apologies to any Elizabeth Roberts’ out there):

Elizabeth Roberts 1790-1865
Elizabeth Roberts, 1901-1962
Elizabeth Roberts b 1932
Elizabeth Roberts fl. 1958
Elizabeth Roberts, artist
Elizabeth Roberts
Elizabeth Roberts
Elizabeth Roberts

How should we treat these names in the Archives Hub display? If we can make decisions about that, it may influence how we process the names.

These names can be separated into two types: (1) name strings that identify a person, and (2) name strings that don’t identify a person. This is a fundamental difference. It effectively creates two different things. One is an identifier for a person; one is simply a string that we can say is a name, but nothing more.

If we put two descriptions together because they are both a match to Elizabeth Roberts, 1790-1865, then we are stating that we think this is the same person, so the researcher can easily see collections and other information about them. 

If we put two descriptions together that are both related to Elizabeth Roberts we are not doing the same thing.  We are simply matching two strings. 

Which of these names is an identifier? That depends upon levels of confidence, and that is why being able to set and modify levels of confidence is crucial.

Elizabeth Roberts 1790-1865 – this is enough to identify a person. In theory, there could be two people with the same life dates, but the chances are very low. So, we would bring together two entries and represent them on one name page.

Elizabeth Roberts b 1932 – Is a birth or death date enough? It allows for some measure of certainty with identity, and we would probably deem this to be enough to identify a person and match to another Elizabeth Roberts born in 1932, but it is not certain. If this Elizabeth Roberts was the creator, and she has several mentions of ‘art’, ‘artist’ and ‘painting’ in her biography, it is more likely that she is the same as Elizabeth Roberts, artist, and it might be useful to create a link, but would it be enough for a match?

Elizabeth Roberts fl 1931 – whilst a floruit date helps place the person in a time period, it is not enough to confidently identify a person.  

Elizabeth Roberts, artist – an occupation or other epithet is not usually enough to identify someone. If there is a biographical history, there is more information about the person, but this is not enough to be sure.

If we had an entry such as Elizabeth Roberts, Baroness Wood of Foxley (completely imaginary and just for the purposes of example), then the epithet is more helpful. We might decide that this identifies a person enough for a match with any other instances of Elizabeth Roberts with baroness wood and foxley in the name string.

If we had MacAlister, Sir Donald, 1st Baronet, physician and medical administrator then ‘1st baronet’ alongside the name should give enough confidence for a match with another entry for 1st Baronet.
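To make the distinction between identifying and non-identifying name strings a little more concrete, here is a minimal sketch in Python. It is only an illustration: the regular expressions, the `classify` function and the three labels are assumptions made for this example, not the rules the Archives Hub actually applies, and it ignores titles such as ‘1st Baronet’ altogether.

```python
import re

# Hypothetical patterns for the date forms discussed above.
LIFE_DATES = re.compile(r"\b\d{4}\s*-\s*\d{4}\b")   # e.g. 1790-1865
SINGLE_DATE = re.compile(r"\b[bd]\.?\s*\d{4}\b")    # e.g. b 1932 or d. 1865


def classify(name_string: str) -> str:
    """Return a rough label for how well a name string identifies a person."""
    text = name_string.lower()
    if LIFE_DATES.search(text):
        return "identifier"           # full life dates
    if SINGLE_DATE.search(text):
        return "probable identifier"  # a birth or death date only
    # Floruit dates, occupations and bare names all fall through to here.
    return "name string"


for example in ["Elizabeth Roberts 1790-1865", "Elizabeth Roberts b 1932",
                "Elizabeth Roberts fl. 1958", "Elizabeth Roberts, artist"]:
    print(example, "->", classify(example))
```

The point of keeping the labels separate from the patterns is that the thresholds can be changed without touching the rest of the process, which is what being able to set and modify levels of confidence would require.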

Display behaviour

So, how might we reflect this in the display? It can be useful to think about the display and researcher requirements and expectations and work back from there to how we actually process the data.

Firstly we might group two entries if they have the same date.

But this does not offer much benefit to the end user. They still see eight entries for this name string. So, we might bring together the entries that match exactly on the name string.

But there are still two entries that are essentially just name strings – the fl. and the ‘artist’ entry are essentially the same as those without any additional information in that they are name strings and they do not identify a person, so it makes sense to group all of these entries.

screenshot of shortest list of names with matching

We now have a short set of entries. We can’t merge any more of them.
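The grouping steps described above could be sketched roughly as follows. This assumes, purely for illustration, that each entry has already been split into a name and a qualifier; the `IDENTIFYING` pattern and the `group_for_display` function are hypothetical and are not part of the Archives Hub system.

```python
import re
from collections import defaultdict

# Assumption: only life dates or a single birth/death date identify a person.
IDENTIFYING = re.compile(r"^(?:\d{4}-\d{4}|[bd]\.? ?\d{4})$")

entries = [
    ("Elizabeth Roberts", "1790-1865"),
    ("Elizabeth Roberts", "1901-1962"),
    ("Elizabeth Roberts", "b 1932"),
    ("Elizabeth Roberts", "fl. 1958"),
    ("Elizabeth Roberts", "artist"),
    ("Elizabeth Roberts", ""),
    ("Elizabeth Roberts", ""),
    ("Elizabeth Roberts", ""),
]


def group_for_display(entries):
    """Group entries so that non-identifying strings share one display entry."""
    groups = defaultdict(list)
    for name, qualifier in entries:
        # Identifying qualifiers keep their own entry; everything else is
        # treated as the bare name string.
        key = (name, qualifier if IDENTIFYING.match(qualifier) else None)
        groups[key].append((name, qualifier))
    return dict(groups)


groups = group_for_display(entries)
for key, members in groups.items():
    print(key, len(members))
```

With the imaginary list used in this post, the eight entries reduce to four display entries: three with dates, and one bare ‘Elizabeth Roberts’ entry that stands for five catalogue entries.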

However, this does leave us with a problem. The end user is likely to assume that these all represent different people. That ‘Elizabeth Roberts’ is a different person from ‘Elizabeth Roberts 1901-1962’. The tricky thing is that she might be….and she might not be. It is likely that a user wanting Elizabeth Roberts with dates 1790-1865 would see the above list and click on the matching entry, not realising that the last three entries could also refer to the same person.  We don’t want to exclude these from the researcher’s thinking without hinting that they may represent the same person.

We might give the list a heading that hints at the reality, such as ‘We have found the following matches:’. Maybe ‘matches’ would have a tool tip to say that the entries without dates could match the entries with dates. It is quite hard to even find a way to say this succinctly and clearly.

The identifiable names would link to name pages. We might provide information on the name pages to again emphasise that other Elizabeth Roberts entries could be of interest. We haven’t yet decided what would be best in terms of behaviour for the non-identifiable names – they might simply link to a description search – it does not make much sense to have a full name page for an unidentified person where all you have is one link to one archive description. We can’t provide links to any other resources for a non-identifiable name; unless we simply provide e.g. a Wikipedia lookup on the name. But again, we face the issue of misleading the end user; implying a ‘same as’ link when we do not have enough grounds to do that.

Names as creators

We may decide to treat creator names differently. Archival creator does have a significant meaning – it emphasises that this is a collection about that person or organisation (though even the nature of the about-ness is difficult to convey). But many users do not necessarily appreciate what an archival creator is, and many descriptions don’t provide biographical histories, so could this end up creating confusion? Also, in the end a creator name is far more likely to include life dates, so then they would have a full name page anyway. What would be the benefit of treating a creator name with no life dates and no biographical history differently from an index term and giving it a name page? You would just be linking to one archive, albeit ‘their’ archive.

What about if a name string record, say the Elizabeth Roberts fl 1931, has been ingested as an EAC record, i.e. a name record that was created by one of our contributors? It is likely that name records will include a full date of birth, or at least a birth or death date, but this is not certain. Whilst we are not currently set up to take in EAC-CPF name records, we do plan to do this in the future. If the name is provided through an EAC record and they are a creator, they may have a detailed biography, and may have other useful information, such as a chronology, so a name page would be worthwhile.  

This short analysis shows some of the problems with providing a name-based interface. We will undoubtedly encounter more thorny issues. The challenge, as is so often the case, is just as much about how to convey meaning to end users when they are not necessarily familiar with archival perspectives, as it is about how to process the data.

And we haven’t even got to thinking about Eliza Roberts or Lizzy Roberts…..

Birkbeck’s Archive

Archives Hub feature for October 2020

Birkbeck was founded as the London Mechanics’ Institute on the evening of the 11th November 1823, when approximately 2,000 people listened to Dr George Birkbeck speak on the importance of education for working Londoners at the Crown and Anchor Tavern on the Strand.  Supporters there that evening included Jeremy Bentham, the philosopher and originator of Utilitarianism, Sir John Hobhouse, a Radical MP who held several important government posts across his career, and Henry Brougham, a liberal MP, anti-slavery campaigner and educational reformer.

George Birkbeck, founder of Birkbeck painted by Samuel Lane circa 1825, Birkbeck Image Collection.

Birkbeck has been transforming lives by helping people access higher education for nearly 200 years. This year, 2020, we celebrate the 100th anniversary of our membership of the University of London. When Birkbeck joined the University of London, it was on the condition that it should continue to provide evening teaching, and this remains our central mission.

The Library at Breams Building, Chancery Lane, Birkbeck Image Collection.

As we move toward our 200th anniversary in 2023, part of the Birkbeck archive was rediscovered in an offsite storage facility. This has proved to be a rich source, providing insights not only into our institutional history but also into the stories of both staff and students, allowing us glimpses into their lives. We now find ourselves in the position of having two sections of the archive, each telling our story from different perspectives.

One section of the archive is held in the main Birkbeck building and is comprised of records pertaining to the history of Birkbeck from an organisational context, including minutes of various committees, published student journals and newsletters, annual reports, calendars, early student registers and staff information. 

Birkbeck College, Courses of Study front cover. Birkbeck Image Collection.

The second section is held offsite and is made up of a range of material including war correspondence, departmental papers and estates documents, all of which demonstrate Birkbeck’s unique aim and how that aim has held strong through changing political, economic and cultural times.

To date one Birkbeck academic, Professor Joanna Bourke, has explored this material, along with two of her PhD students. They have found it to be an excellent source for their research. One of the themes that runs through the archive is around trends in education such as educational policies and practices. This includes charting the life cycle of different academic disciplines as well as documenting different approaches to teaching and the broader aspects of student life.

Art class at the Birkbeck Literary and Scientific Institution, Breams Buildings, circa 1915, Birkbeck, University of London. Birkbeck Image Collection.

Like many university archives, we have records of notable Birkbeckians who worked or studied with Birkbeck. We can now develop more of a picture of the lives of people such as JD Bernal (Crystallography), Eric Hobsbawm (History), Nikolaus Pevsner (History of Art) and Helen Gwynne-Vaughan (Botany). We can also learn more about those who were less well-known who studied here and made an impact, like the playwright Arthur Wing Pinero and the socialist and women’s rights activist Annie Besant. The library is creating an online timeline to highlight the life and work of various Birkbeck academics as part of the celebrations in the lead up to our 200th anniversary.

Helen Gwynne Vaughan in her Botany Laboratory with students circa 1923, Birkbeck, University of London. Birkbeck Image Collection.

In terms of offering different perspectives, this part of the archive also holds accounts of the wider Birkbeck community beyond the academic staff and students: those members of staff working in catering and hospitality roles, administrative staff and laboratory technicians. This provides an opportunity to explore social history through those lived experiences documented through various formats, such as letters and photographs.

It’s an exciting time at Birkbeck as we continue to uphold the ethos and pursue the central mission of providing access to education for all. Birkbeck is still London’s only specialist provider of part-time evening higher education as well as being a world-class research institution. The archive will continue to tell the story of Birkbeck as an institution as well as all those who work, study and research here. You can follow Birkbeck’s journey to its 200th anniversary.

Main Birkbeck Building, Birkbeck Image Collection.

Emma Illingworth
Subject Librarian for Science (Biological, Earth & Planetary, Psychological)
Library Services, Birkbeck, University of London

Related

Browse all Birkbeck Library Archives and Special Collections, University of London descriptions available to date on the Archives Hub.

All images copyright Birkbeck Library Archives and Special Collections, University of London. Reproduced with the kind permission of the copyright holders.

Names (6): Deduplication at scale

Having written several blogs setting out ideas and thoughts about challenges with names, this post sets out some of our plans going forwards in order to create name records for a national aggregator; something that can work at scale and in a sustainable way. The technical work is largely being undertaken by Knowledge Integration, our system suppliers, though working closely with the Archives Hub team.

Consider one repository – one Hub contributor. They have multiple archives described on the Archives Hub, and maybe hundreds or thousands of agents (people and organisations) included in those descriptions. All of this information will be put into a ‘management index’. This will be done for all contributors. So, the management index will include all the content, from all levels, including all the names. A huge bucket of data to start us off.

A names authority source such as VIAF or any other names data that we would like to work with will not be treated any differently to Archives Hub data at this stage. In essence matching names is matching names, whatever the data source. So, matching Archives Hub names internally is the same as matching Archives Hub names to VIAF, or to Library Hub, for example. However, this ‘names authority’ data will not go into our big bucket of Archives Hub data, because, unless we create a match with a name on the Hub, the authority data is not relevant to us. Putting the whole of VIAF into our bucket of data would create something truly huge. It is only if we think that this external data source has a name that matches a person or organisation on the Hub that it becomes important. So data from external sources are stored in separate reference indexes (buckets) for the purposes of matching.

Tokenisation

Knowledge Integration are employing a method known as tokenisation, which allows us to group the data from the indexes into levels. (It is quite technical and I’m not qualified to go into it in detail, so I only refer briefly to the basic principles here. Wikipedia has quite a good description of tokenisation.) With this process, we can establish levels that we believe will suit our purposes in terms of confidence. Level 1 might be for what we think is a guaranteed match, such as where an identifier matches. So, for example, Wikidata might have the VIAF identifier included, so that the VIAF and Wikidata name can be matched. In some cases, the Archives Hub data includes VIAF IDs, so then the Hub data can be matched to VIAF. We also hope to work with and create matches to Library Hub data, as they also have VIAF IDs.

Image showing versions of a name all with the same ID.
If all versions of a name have the same ID then they can be matched.
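A Level 1 match of this kind amounts to grouping records by a shared identifier. The following is a minimal sketch only: the record structure, the field names and the identifier values are invented for the example and do not reflect the real data model or real VIAF IDs.

```python
from collections import defaultdict

# Hypothetical records; "viaf-0001" is a made-up placeholder, not a real ID.
records = [
    {"source": "archives_hub", "name": "Webb, Beatrice, 1858-1943", "viaf_id": "viaf-0001"},
    {"source": "wikidata", "name": "Beatrice Webb", "viaf_id": "viaf-0001"},
    {"source": "library_hub", "name": "Webb, Martha Beatrice", "viaf_id": "viaf-0001"},
    {"source": "archives_hub", "name": "Webb, B.", "viaf_id": None},
]


def match_on_identifier(records, id_field="viaf_id"):
    """Group records that share the same identifier value."""
    by_id = defaultdict(list)
    for record in records:
        identifier = record.get(id_field)
        if identifier:  # records without an ID cannot match at this level
            by_id[identifier].append(record)
    # An identifier only produces a match if two or more records carry it.
    return {i: recs for i, recs in by_id.items() if len(recs) > 1}


level_1 = match_on_identifier(records)
print({i: [r["name"] for r in recs] for i, recs in level_1.items()})
```

The record without an identifier is simply left for the lower-confidence levels to deal with.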

Level 2 might be a more configurable threshold based around the name. We might say that a match on name and date of birth, for example, is very likely an indication of a ‘same as’ relationship. We might say that ‘James T Kirk’ is the same person as ‘James Kirk’ if we have the same date of birth. This is where trial and error is inevitable, in order to test out degrees of confidence. Level 3 might bring in supporting information, such as biographical history or information about occupation or associated places. It is not useful by itself, but in conjunction with the name, it can add a degree of certainty.

Screenshot of part of a biographical history
Biographical information may be used to help match names
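As a rough illustration of the Level 2 idea, the sketch below treats two names as a match when the first forename, the surname and the birth year all agree, ignoring middle initials. The function names, the record fields and the birth year used in the example are all assumptions for illustration; the real thresholds would come out of the trial and error described above.

```python
def name_tokens(name):
    """Lower-case a name and split it into words, ignoring full stops."""
    return name.lower().replace(".", " ").split()


def level_2_match(a, b):
    """True when first forename, surname and birth year all agree."""
    ta, tb = name_tokens(a["name"]), name_tokens(b["name"])
    if not ta or not tb:
        return False
    same_name = ta[0] == tb[0] and ta[-1] == tb[-1]
    # A missing birth year never counts as agreement.
    same_birth = a.get("born") is not None and a.get("born") == b.get("born")
    return same_name and same_birth


# The birth year here is invented purely for the example.
kirk_a = {"name": "James T Kirk", "born": 1900}
kirk_b = {"name": "James Kirk", "born": 1900}
print(level_2_match(kirk_a, kirk_b))
```

This assumes names are in ‘forename surname’ order; inverted forms such as ‘Kirk, James’ would need to be rearranged first.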

We are also thinking about a Level 4 for approaches that are Archives Hub specific. For example, if the same name is provided by the same repository, could we say it is more likely to be the same person?

This tokenisation process is all about creating a configurable process for deduplication. Tokens are created only for the purposes of matching. Once we have our levels decided, we can create a deduplication index and run the matching algorithm to see what we get.

Approaches to indexing

For deduplication indexing, the first thing to do is to convert to lower case and remove all of the non-alpha characters. (NB: For non-Latin scripts, there are challenges that we may not be able to tackle in this phase of the project.)
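A minimal sketch of that first normalisation step might look like this. The `normalise` function is hypothetical, and it makes one assumption beyond the description above: spaces are kept so that word boundaries survive for later indexing.

```python
import re


def normalise(name: str) -> str:
    """Lower-case a name string and strip everything except a-z and spaces."""
    lowered = name.lower()
    # Replace each non-alpha character with a space rather than deleting it,
    # so that "Roberts,Elizabeth" does not collapse into one word.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    # Collapse runs of whitespace into single spaces.
    return " ".join(letters_only.split())


print(normalise("ROBERTS, Mrs Elizabeth Grace"))
print(normalise("Roberts, Elizabeth fl. 1931"))
```

Note that this simple version also removes digits, so dates would need to be captured separately before this step, and it strips accented letters as well as non-Latin scripts.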

The tokens within the record will be indexed in multiple ways within the deduplication index to facilitate matching. This includes indexing all words in order that they appear, and also individual word matches.

Then, particularly when considering using text such as biographies to help identify matches, we can use bigrams and trigrams. These essentially divide text into two and three words chunks. A search can then identify how many groups of two and three words have matched. Generally, this is a useful method of ascertaining whether documents are about the same thing. It may help us with identifying name matches based upon supporting information. This is very much an exploratory approach, and we don’t know if it will help substantially with this project, but certainly it will be worth trying out this approach, and also considering using it for future data analysis projects.
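To show what word bigrams and trigrams look like in practice, here is a small sketch that counts how many two-word and three-word chunks two pieces of text share. The function names and the two sample biographies are invented for this example.

```python
def word_ngrams(text, n):
    """Return the set of n-word chunks in a piece of text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(a, b, n):
    """Count the n-word chunks that appear in both texts."""
    return len(word_ngrams(a, n) & word_ngrams(b, n))


# Two made-up snippets of biographical history.
bio_a = "artist and painter born in penge"
bio_b = "painter born in penge in 1790"

print(shared_ngrams(bio_a, bio_b, 2))  # shared bigrams
print(shared_ngrams(bio_a, bio_b, 3))  # shared trigrams
```

A higher count of shared chunks would be treated as supporting evidence for a match, never as a match by itself.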

Character trigrams break down individual words into groups of three characters and may be useful for the actual names. This should be useful for a more fuzzy matching approach, and it may help to deal with typos. It can also help with things like plurals, which is relevant for working with the supporting information.
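Character trigrams can be illustrated with a short sketch. The similarity measure used here, the proportion of trigrams two words share (Jaccard similarity), is our choice for the example and an assumption; the real deduplication index may score matches quite differently.

```python
def char_trigrams(word):
    """Return the set of three-character chunks in a word."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A one-letter spelling variation still scores highly.
print(trigram_similarity("MacAlister", "MacAllister"))
print(trigram_similarity("Roberts", "Webb"))
```

Words shorter than three characters produce no trigrams here, so initials would need separate handling.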

We are also going to explore hypocorisms. This means trying out matches for names such as Jim, Jimmy and James or Ned, Ed, Ted and Edward. A hypocorism is often defined as a pet name or term of endearment, but for us it is more about forename variations. Obviously Jim Jones is not necessarily the same person as James Jones, but there is a possibility of it, so it is useful to make that kind of match on name synonyms.

Hypocorisms refer to pet names or terms of endearment
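One simple way to try out hypocorism matching is a lookup table that maps each variant to a single formal forename. The table below is a tiny hand-made example and the function names are hypothetical. A real table would be much larger and would need to cope with variants that belong to more than one name (Ed could equally be Edmund or Edwin).

```python
# Hand-made illustrative table: formal forename -> known variants.
HYPOCORISMS = {
    "james": {"jim", "jimmy"},
    "edward": {"ned", "ed", "ted"},
    "elizabeth": {"eliza", "lizzy", "lizzie"},
}

# Reverse lookup: variant -> formal forename.
CANONICAL = {
    variant: formal
    for formal, variants in HYPOCORISMS.items()
    for variant in variants
}


def canonical_forename(forename):
    """Map a forename variant to its formal form, if we know one."""
    f = forename.lower()
    return CANONICAL.get(f, f)


def forenames_may_match(a, b):
    """True if two forenames could be variants of the same name."""
    return canonical_forename(a) == canonical_forename(b)


print(forenames_may_match("Jim", "James"))
print(forenames_may_match("Lizzy", "Eliza"))
```

A positive result here only means the forenames are compatible, so it would feed into a lower confidence level rather than create a match by itself.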

From this indexing approach we can try things out and see what works. There is little doubt that it will require an iterative and flexible approach. We can’t afford to set up a whole process that proves ineffective so that we have to start again. We need an approach that is basically sound and allows for infinite adjustments. This is particularly vital because this is about creating a framework that will be successful on an on-going basis, for a national-scale service. That is an entirely different challenge to creating a successful outcome for a finite project where you are not expecting to implement the process on an on-going basis. Apart from anything else, a project with a defined timescale and outcome gives you more leeway to have a bit of human intervention and tweak things manually to get a good result.

Group records

Using the tokenisers and matching methods we can try processing the data for matches. When records are matched with a degree of certainty, a group record is created in the deduplication index. It is allocated a group id and contains the ids of all of the linked records. This is used as the basis for the ‘master record’ creation.
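The creation of group records could be sketched as follows, assuming the matching step produces pairs of record ids. The `build_groups` function, the id format and the decision to merge matches transitively (if A matches B and B matches C, all three share a group) are assumptions for this example, not a description of how the deduplication index is actually built.

```python
def build_groups(record_ids, matched_pairs):
    """Merge pairwise matches into groups and allocate each a group id."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the representative record for r.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    members = {}
    for r in record_ids:
        members.setdefault(find(r), []).append(r)

    # Only records that matched something else get a group record.
    groups = sorted(sorted(m) for m in members.values() if len(m) > 1)
    return {f"group-{n}": m for n, m in enumerate(groups, start=1)}


print(build_groups(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c")]))
```

Merging transitively is convenient, but it can chain together weak matches, which is one reason confidence thresholds need to be set with care.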

Primary or master records

I have previously blogged some thoughts about the ‘master record’ idea. Our current proposal is that every Archives Hub name is a primary record, unless it is matched. So, if we start out with six variations of Martha Beatrice Webb, 1858-1943, then at that point they are all primary records and they would all display. If we match four of them, to a confidence threshold that we are happy with, then we have three primary records. One of the primary records covers four archives. We may be able to still link the other two instances of this name to the aggregated record, but we can assign a lower confidence threshold to this.

Diagram showing instances of the name Beatrice Webb and how they might match.
Deduplication for ‘Beatrice Webb’

In the above example (which is made up, but reflects some of the variations for this particular name) four of the instances of the name have been matched, and so that creates a new primary record, with child records. Two of the instances have not been matched. We might link them in some way, hence the dotted line, or they might end up as entirely separate primary records. The instance of Beatrix Potter, nee Webb, has not been matched (these two individuals are often confused, especially as they have the same death date). If we set levels of confidence wrongly, this name could easily be matched to ‘Beatrice Webb’.

The reasoning behind this approach is that we aggregate where we can, but we have a model that works comfortably with the impossibility of matching all names. Ideally we provide end users with one name record for one person – a record that links to archive collections and other related resources. But we have to balance this against levels of confidence, and we have to be careful about creating false matches. Where we do create a match, the records that were previously primary records become ‘child records’ and they no longer display in the end user interface. This means we reduce the likelihood of the end user searching for ‘william churchill’ and getting 25 results. We aim for one result, linking to all relevant archives, but we may end up with two or three results for names that have many variations, which is still a vast improvement.
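The primary and child record model can be sketched in a few lines, using the six variations of the Webb name as the example. The record ids and the `primary_records` function are hypothetical; the sketch only shows which records would display to the end user.

```python
def primary_records(record_ids, groups):
    """Return the ids that display: one per group, plus unmatched records."""
    # Every record inside a group becomes a child record and is hidden.
    children = {r for members in groups.values() for r in members}
    unmatched = [r for r in record_ids if r not in children]
    return list(groups) + unmatched


# Six variations of the name, four of which have been matched.
webb_records = ["w1", "w2", "w3", "w4", "w5", "w6"]
webb_groups = {"group-1": ["w1", "w2", "w3", "w4"]}

print(primary_records(webb_records, webb_groups))
```

Six displayed entries become three: the aggregated record covering four archives, and the two instances that could not be matched with enough confidence.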

If we have several primary records for the same person (due to name variations) then it may be that new data we receive will help us create a match. This cannot be a static process; it has to be an effective ongoing workflow.

Comic strips and seaside holidays: unexpected stories from the Save the Children Archive

Archives Hub feature for September 2020

The Save the Children (SCF) archive, held at the Cadbury Research Library, University of Birmingham, charts the development of the charity from its creation in 1919. The collection includes a wealth of material relating to the charity’s founder, Eglantyne Jebb, and these papers provide a fascinating insight into how SCF operated during the 1920s. They also highlight the personal stories of individuals associated with SCF.

Concertina comic strips

Illustrated concertina comic strip (ref: SCF/EJ/9/2).

One fascinating item is a wonderful illustrated concertina comic strip created by Corinne de Candole, documenting her first week working at the SCF office in April 1925. She dedicated the strip to ‘Miss Jebb who showed me how the New World is being built at the Office of the Save the Children Fund’. The strip depicts Corinne’s interview with a Mrs Beach, as well as the making of blue cloaks and flags and ‘planning for the new world’.

Travelling to Geneva (ref: SCF/EJ/9/2).

Another two comic strips reveal how Corinne travelled to Geneva for the summer school in 1925 and she also wrote two poems about this experience: ‘The Disobedient Lady who never got to the SCF Summer School’ and ‘The Obedient Lady who went to the SCF Summer School’. Through these documents we can sense the pride that Corinne felt in working for SCF and her thoughts on how it was helping change the world.

Thank you letters

The overseas country papers in the Eglantyne Jebb series highlight the personal stories of those affected by the crisis in Europe after the First World War. The Horak family, from Hungary, wrote a letter of appreciation to SCF, offering thanks and remembering their benefactors.

The Horak family letter with typed translation, 1922 (ref: SCF/EJ/1/17/1).

‘From the bottom of our hearts sending our Christmas Greetings and very best wishes [and] we are always thinking gratefully of those who helped to get homes for us poor war invalids and widows with our families. May you be as happy as you have made us […] The little cottage means also a new life to us, making us forget our sufferings and losses. We beg the Almighty to pour his blessing over you and your family and give long life and happiness to those who provided us with a home. This will be our prayer on this holy Christmas eve.’

The letter is accompanied by a photograph of the Horak family (ref: SCF/EJ/1/17/1).

In a letter to Miss Vulliamy, who was leading SCF funded projects in Poland, Vera Staack describes how she and her mother had to flee Russia because of the Bolsheviks: ‘But why are they frightened, why do I read such terror in their eyes? I shall explain you the reason. The red banner flashes, and on it the black words which make everybody tremble. “Death to the bourgeois.”…..The fathers or mothers are taken from their children, children are torn from their parents sides. And so everybody tries to hide quickly.’

‘The picture of the past rises involuntary before me. Christmas Eve! It was our last Christmas Eve in our native land-in far off Moscow. An enormous Christmas-tree made dazzlingly brilliant by quantities of electric lamps and brilliant ornaments and many, many presents…..And all this has been taken from me by the Bolsheviks. Dear Miss Vulliamy, and I shall have no more Christmas-trees or Christmas Eves, and mother is always very cross now, cries often, and wishes to speak to no one. She was quite different before.’

Letter from Vera Staack, 1921 (ref: SCF/EJ/1/22/7).

‘And now good-bye, my dear, dear English friend. I hug you very hard and remain your very respectful and unhappy little Domby friend’

She ends ‘P.S. Why are men so wicked, dear Miss Vulliamy.’

A seaside holiday

Another example can be found in a report entitled ‘A seaside holiday’, written by M. Brown, where we learn of the impact that a trip to the beach had for a group of young children: ‘“Who pushes the sea?” “Is water never still?” “Does sand bite?” […] even the Ukrainian student was among the unbelievers who doubted whether the sea was salt, and made a wild dash to stoop down and taste it to make quite sure that he was not being deceived.’

The children then share their stories of the horrors that they have been through: ‘that was a long time ago…my mama died in the truck on the way from Russia. She died of hunger my mama did not live long after my daddy was killed by the Bolshevists. I wouldn’t believe it at first when the doctor came round and bent down and listened to her heart and said that mam was dead.’

‘A seaside holiday’ report, 1922 (ref: SCF/EJ/1/22/8).

‘All the children have their own sad story, and all have lived through strange and dreadful times, and in all their young faces can be read the tragedy of the homeless and the outcast. It is to build up their energy for the life struggle before them that Miss Vulliamy inaugurated the Children’s Holiday Home at Danzig in 1922.’

These archives offer a glimpse into the traumatic events which children and families faced in the aftermath of the First World War, the attempts by SCF to help and the appreciation that this generated.

Matthew Goodwin
Save the Children Project Archivist
Cadbury Research Library, University of Birmingham

Related

Browse all Cadbury Research Library, University of Birmingham descriptions available to date on the Archives Hub.

All images copyright Cadbury Research Library, University of Birmingham. Reproduced with the kind permission of the copyright holders.

Names (5): The Problem of anonymity

cartoon of person asking 'who am I?'

It is easy to focus on names that represent fairly well known people. But one of the challenges for archives is to work with little known people – names that represent someone who is referenced in a catalogue – maybe they are indexed because they are a correspondent for example – they appear in one of a series of letters – but there is no more information about them other than their name. They may be referenced in other sources, but we have little to go on in order to discover that, and often they won’t be represented – it may be that this is the only written source that includes them.

In a names service, we can add a name – let’s say ‘Louisa Jane Justamond’ – a name from https://archiveshub.jisc.ac.uk/data/gb12-ms.add.8556 (‘The Garland continued’, a collection of poems addressed to her). We only have that one instance of that name. It is not in VIAF, it is not in Wikidata. There is an instance listed in ‘A genealogical and heraldic dictionary of the landed gentry of Great Britain’ (a precursor to Burke’s peerage). But unless we decide to use that external source, write a name matching algorithm and decide, on levels of confidence, that it is indeed a match, that is not going to help us. We are left with a name attached to one archive collection and nothing else.

We can create a name record for Justamond, but if we display it on the Archives Hub it will simply show her name and a link back to the related description.  It will be extremely minimal.

However, what we don’t know is whether new collections will be added to the Archives Hub, or new information added to Wikidata or another source that we use, such that this person becomes more identifiable.   We simply don’t know what the value of a name might be.  In the future, having a record of this person could prove to be immensely useful in making a connection.

Archives have what you might call a long tail of names. It is something that characterises our holdings. It is something that sets us apart from libraries and museums, at least to a degree.  Most names represented in library holdings (or names they represent in their catalogues and other finding aids) represent identifiable people.

The long tail of names

In archives, we have collections that represent ordinary people, not published, not celebrated, not notorious, with no documented place in history. We also have collections that include people where it is hard to know whether an individual is more widely known, because the archive collection does not entirely identify them.

Either way, it leaves us with a question about how to deal with a name that has nothing else attached to it other than ‘this name is in this letter’.

Building an index of all names means that we have a store of data that can be used for further exploration. It could sit behind the scenes, but it can be used to try out tools, data manipulation and matching.  In other words, the data is a separate thing from what you decide to display.

Having a name (maybe not knowing exactly who the name represents) and knowing that the name is in three different archives has value.  We can say ‘in the absence of any other information, we assume these names represent the same person’, or we can simply present the information and not make any conclusions (although that begs the question of how you present it without encouraging assumptions).  It is then up to researchers to explore further.  We might find new data sources that help to clarify names. We might get new descriptions that help to do this.

Many archival descriptions include subjects and, to a lesser extent, places. If you have Stephen Merryweather in one, with an index term of botany, and S. Merryweather in another, with the same index term, then you could say it is more likely to be a match. There is a question of how you might then present that information. The use of algorithms raises the issue of how to convey levels of confidence. It feels as if we need to have a more sophisticated – and recognised – means of presenting levels of confidence.
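
As a rough illustration of the Merryweather example, here is a minimal sketch in Python. It assumes each name has already been split into forename and surname, and the `NameEntry` class, the function names and the ‘probable’ / ‘possible’ labels are all hypothetical choices for this example, not how the Archives Hub actually matches names.

```python
from dataclasses import dataclass, field


@dataclass
class NameEntry:
    """A name as it appears in one description, plus its subject index terms."""
    forename: str
    surname: str
    subjects: set = field(default_factory=set)


def forenames_compatible(a: str, b: str) -> bool:
    """True if the forenames are equal, or one is a single initial of the other."""
    a, b = a.rstrip(".").lower(), b.rstrip(".").lower()
    if not a or not b:
        return False
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) == 1 and longer.startswith(shorter)


def match_level(x: NameEntry, y: NameEntry) -> str:
    """Return 'probable', 'possible' or 'no match' (illustrative labels only)."""
    if x.surname.lower() != y.surname.lower():
        return "no match"
    if not forenames_compatible(x.forename, y.forename):
        return "no match"
    # A shared subject term raises confidence; it does not prove identity.
    shared = {s.lower() for s in x.subjects} & {s.lower() for s in y.subjects}
    return "probable" if shared else "possible"


full = NameEntry("Stephen", "Merryweather", {"botany"})
initial = NameEntry("S.", "Merryweather", {"Botany"})
print(match_level(full, initial))
```

Even a toy like this shows the presentation problem: the code can only return a label, and someone still has to decide what ‘probable’ should mean to a researcher.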

This whole issue of confidence levels is more of a focus for archives, because of the anonymity I’ve talked about.

Relationships of data involved in creating name records

The ‘Name’ records shown above are the names within archival descriptions (EAD records on the Hub).  These names can be pulled out from ‘origination’ (creator) and from ‘persname’ (usually in the controlaccess index section, but potentially elsewhere in the description).  These names may represent ‘unknown’ people, and the EAD may not even indicate whether they are personal, corporate or family names. They may not include dates; they may just be ‘Mary Fleming’ or ‘Mary Fleming fl 1717’.  They may also be ‘unknown’, ‘[unknown]’, or even ‘unknown unknown’ (keeping the surname, forename structure!).  They may be ‘Name of author (various)’ or ‘Various health authority bodies’ or ‘Possibly Miss M. Lindsay’. All these are examples from our data.  They illustrate the conflict between human readable data – where ‘unknown’ is useful – and machine processable data – where semantics are important, and a name is ideally just a name.
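
To give a feel for what pulling names out of ‘origination’ and ‘persname’ might look like, here is a small sketch using Python’s standard XML library. The sample record is invented and heavily simplified, it assumes EAD without an XML namespace, and the function names are hypothetical; real Hub data is far messier.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (assumes no XML namespace).
SAMPLE_EAD = """<ead>
  <archdesc>
    <did>
      <origination><persname>Mary Fleming fl 1717</persname></origination>
    </did>
    <controlaccess>
      <persname>Possibly Miss M. Lindsay</persname>
      <persname>unknown unknown</persname>
    </controlaccess>
  </archdesc>
</ead>"""


def clean(element):
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_names(ead_xml):
    """Return (source, name) pairs from 'origination' and 'persname' elements."""
    root = ET.fromstring(ead_xml)
    names = []
    seen = set()  # ids of elements already covered by an origination
    for origination in root.iter("origination"):
        seen.update(id(el) for el in origination.iter())
        text = clean(origination)
        if text:
            names.append(("origination", text))
    for persname in root.iter("persname"):
        if id(persname) in seen:
            continue  # already recorded as part of the creator name
        text = clean(persname)
        if text:
            names.append(("persname", text))
    return names


for source, name in extract_names(SAMPLE_EAD):
    print(source, "|", name)
```

Note that nothing here decides whether ‘unknown unknown’ is a name. The extraction step simply stores what the description says, which is the point: the data is kept separate from any decision about what to display.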

If we create ‘Name’ entries for all of these then we have a store of data to work with, something I’ve mentioned before in my Names Project blogs.  We can then find out how many ‘Mary Fleming’ entries there are, or  how many ‘M Fleming’ entries. How we then choose to display that information to end users is a separate question.  But with the advances in machine learning, it is becoming an increasingly pertinent question.
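
Counting the ‘Mary Fleming’ and ‘M Fleming’ entries could be sketched as below. This is only an illustration: it assumes names are stored in ‘forename surname’ order without dates, and the placeholder list and function names are made up for the example.

```python
from collections import Counter

# Illustrative placeholder values that are not really names.
PLACEHOLDERS = {"unknown", "[unknown]", "unknown unknown", "various"}


def normalise(name):
    """Lower-case the name, drop full stops and collapse whitespace."""
    return " ".join(name.replace(".", " ").split()).casefold()


def initial_form(name):
    """Reduce 'mary fleming' to 'm fleming' (assumes forename-first order)."""
    parts = normalise(name).split()
    if len(parts) < 2:
        return normalise(name)
    return f"{parts[0][0]} {parts[-1]}"


def count_forms(names):
    """Count entries by full normalised form and by initial-plus-surname form."""
    full, initials = Counter(), Counter()
    for name in names:
        key = normalise(name)
        if key in PLACEHOLDERS:
            continue  # keep these in the store, but leave them out of the counts
        full[key] += 1
        initials[initial_form(name)] += 1
    return full, initials


entries = ["Mary Fleming", "mary fleming", "M. Fleming", "M Fleming",
           "unknown", "Margaret Fleming"]
full, initials = count_forms(entries)
print(full["mary fleming"], full["m fleming"], initials["m fleming"])
```

The second counter groups Mary and Margaret together under ‘m fleming’, which is a reminder that a count of candidate matches is not a count of people.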

We have an opportunity with archival metadata, with the way that archives represent ‘ordinary life’. But it is a challenge. Catalogues are still not really set up to identify entities (in a way that works for machine processing). We create what we refer to as ‘name authorities’ but we do not usually consider the importance of matching names outside of individual organisations. The Archives Hub has an opportunity to work on behalf of UK archives to try to draw out people and, in a sense, identify them, or at least, enable them to be more contextualised. But it will require a good deal of experimentation and expertise in working with disparate data.  However, if we create a pool of names and provide an API, that would enable others to work with the data, and try different approaches.  This is a big challenge, and it needs a concerted and collaborative approach.

 

Fish are jumpin’ in the Archives

Archives Hub feature for August 2020

“Summertime and the livin’ is easy...” ¹. Well, it’s a rather wet summer in the UK but all the better for exploring collections on the theme of fish!

Plotosus lineatus (Catfish). Copyright: Alain Feulvarch (https://commons.wikimedia.org/wiki/File:Catfish_Plotosus_lineatus.jpg). Creative Commons 2.0 license: https://creativecommons.org/licenses/by/2.0/deed.en

We’ve trawled the Archives Hub (sorry, couldn’t resist!) to bring you a selection of the wonderful, and sometimes surprising, collections relating to fish, ranging across research, expeditions, fisheries, the fishing industry and river authorities – not forgetting a fish and chip shop, a theatre and several appropriately named individuals.

Research and Expeditions

Fishes Collected by Darwin, 1842. 300 pages of notes on the fish collected by Darwin on the Beagle, compiled by Leonard Jenyns (1800-1893), a clergyman and naturalist; Jenyns changed his name to Leonard Blomefield in 1871. Held by the Museum of Zoology Archives, University of Cambridge https://archiveshub.jisc.ac.uk/data/gb433-jenynsdarwin.

C Tate Regan collection, 1912-1913. Charles Tate Regan (born in 1878) was keeper of zoology at the British Museum. He worked on the scientific results of the Scottish National Antarctic Expedition, 1902-1904 (leader William Speirs Bruce) and the British Antarctic Expedition, 1910-1913 (leader Robert Falcon Scott). He died in 1948. Published work includes ‘Antarctic fishes of the Scottish National Antarctic Expedition’ in the Reports of the scientific results of the voyage of the steam yacht Scotia and ‘Fishes’ and ‘Larval and post larval fishes’ published in the zoology reports of the British Antarctic Expedition, 1910-1913. Held by the Scott Polar Research Institute Archives, University of Cambridge https://archiveshub.jisc.ac.uk/data/gb15-charlestateregan.

Cuthbertson drawing of an Atlantic lizardfish. Copyright the National Museums Scotland Library (adapted from the full image included in the William Speirs Bruce Archive feature, August 2017).

Winifred E. Frost collection, 1930s-1960s. Frost was an authority on the natural history of fish in the Lake District. Her research included work on euphausiids with Professor James Johnstone at Liverpool University, and she worked for the fisheries branch at Dublin investigating trout in the River Liffey. She was appointed to the Freshwater Biological Association in 1938 and was awarded a D.Sc. by Liverpool University for her published papers. With Margaret E. Brown (Varley) she wrote The Trout, published in 1967, which took 21 years to prepare. She was a member of the Council of the Salmon and Trout Association and president of the Windermere and District Angling Association; she also travelled to international scientific meetings and undertook an investigation of eels in Africa. Held by the Freshwater Biological Association Archives https://archiveshub.jisc.ac.uk/data/gb986-frow.

Notes towards a dictionary of fish names, by Paul Barbier (C20th). Barbier was Professor of French Language and Literature at the University of Leeds, 1903-1938. The collection comprises 8 boxes of notes prepared in the course of research for an unpublished dictionary of names of fishes. Held by University of Leeds Special Collections https://archiveshub.jisc.ac.uk/data/gb206-ms125.

Solenostomus paradoxus - Harlequin Ghost Pipefish. © Steve Childs (https://commons.wikimedia.org/wiki/File:Solenostomus_paradoxus_-_Harlequin_Ghost_Pipefish.jpg). Creative Commons 2.0 license https://creativecommons.org/licenses/by-sa/2.0/deed.en.

Rosemary Lowe-McConnell Collection, 1934-1947. Lowe-McConnell was a pioneer in tropical fish ecology. She was born in Liverpool, and graduated from the university. She worked at the Freshwater Biological Association studying the migration of silver eels. In 1993 Michael N. Bruton interviewed Lowe-McConnell on the personal reasons behind her choice of work, her personal influences, and her experiences of being a woman in a male-dominated world. Initially she wanted to be an explorer/naturalist, to which the reply was ‘never mind dear, perhaps you can teach’.  When she applied to the colonial services in 1945 to be an entomologist, they would not employ a woman in that role, but the tropical fisheries department was new, and not considered as important. Although she was forced to resign in 1954 when the marriage bar was in place, she was more interested in pursuing her findings than concerned with job status, and she believed that being offered the directorship at the Joint Fisheries Research Organisation in central Africa (which she rejected) showed that she was accepted despite being female. Held by Freshwater Biological Association Archives https://archiveshub.jisc.ac.uk/data/gb986-lowr.

Journal of John Walsh’s Visit to France in 1772. John Walsh (1726-1795) was elected to the Royal Society in 1770, and became known for his work on the electric ray, Torpedo marmorata. In 1769 Edward Bancroft proved that the electric eel emitted electric shocks, and Walsh set out to confirm that the ray had a similar power. In this he was encouraged by Benjamin Franklin, whose American colleagues were undertaking similar investigations. With his nephew Arthur Fowkes he spent the summer of 1772 at La Rochelle, where the ray was often captured. The fish could survive many hours out of water, and Walsh was able to conduct experiments ashore and successfully proved that the ray’s shocks were caused by electricity. His findings were published in the Royal Society’s Philosophical Transactions, vol. 63 (1773), pp. 461-77, and the Royal Society awarded him the Copley medal for his achievement. Held by University of Manchester Library https://archiveshub.jisc.ac.uk/data/gb133-engms724.

Fisheries and the Fishing industry

Records of Aberdeen Fish Curers and Merchants Association, 1888-1947. The association was established in May 1888, as Aberdeen Fish Trade Association, and was incorporated with its present title in 1944. It began in response to the introduction of sales by auction in the late nineteenth century, its first achievement being an agreement amongst fish sellers to provide discounts for cash sales to accredited buyers. Membership was open to wholesale fish merchants and fish curers carrying on a business in Aberdeen, and in 1980 stood at more than 200. Held by University of Aberdeen Special Collections https://archiveshub.jisc.ac.uk/data/gb231-ms3054.

Records of the Berwick Salmon Fisheries Co Ltd, salmon fishers, Berwick upon Tweed, England, 1562-1964 (predominant 1860-1964). The Old Shipping Co, shipping traders and salmon fishers, Berwick-upon-Tweed, Northumberland, England, was established at some point prior to 1766 by a group of local men, mainly coopers, who held shares in a small sailing fleet engaged in the London, coastal and foreign trade. As commodities included salmon, the company leased fishing rights on the river Tweed. The shipping vessels were sold off in 1869 as business had become unprofitable and the company’s name changed to Berwick Salmon Fisheries Co Ltd in 1872. Held by University of Glasgow Archive Services  https://archiveshub.jisc.ac.uk/data/gb248-ugd245.

Volume containing two copies of a printed register relating to Netherlands herring fisheries, 1749: entitled Naamlyst der boekhouders, schepen, en stuurluiden van de haring-shepen, in’t Yaar 1749, van Enchisen en de Ryp, ter haring-shepen uitgevaren (Jan von Guissen, Enkhuisen, 1749), giving details of the ships, owners and captains of the fleets of Enkhuisen and De Rijp. Added in manuscript are details of the total catch for 1749, and the catch for individual ships on various voyages. Held by Senate House Library Archives, University of London https://archiveshub.jisc.ac.uk/data/gb96-ms115.

Women Fish Sellers – from Hamilton, Robert (1866) British Fishes, Part II, Naturalist’s Library, vol. 37, London: Chatto and Windus. Image in the public domain (photograph from the Freshwater and Marine Image Bank at the University of Washington).

Grimsby Steam and Diesel Fishing Vessels’ Engineers’ and Firemen’s Union, 1897-1987. The Grimsby Steam Fishing Vessels’ Engineers’ and Firemen’s Union was founded in 1896. It changed its name to the Grimsby Steam and Diesel Fishing Vessels’ Engineers’ and Firemen’s Union in 1961. In 1976 it transferred engagements to the Transport and General Workers’ Union, becoming 10/3c Branch. Held by Modern Records Centre, University of Warwick https://archiveshub.jisc.ac.uk/data/gb152-gsf.

The business records of Shippam’s Ltd, 1853-1995. The Shippam’s business first started in 1786, when Charles Shippam established a grocery store in Westgate, Chichester. In 1886 they began food manufacturing and in 1894 launched a wide range of potted meat and fish pastes, for which Shippam’s was to become internationally famous. Held by West Sussex Record Office https://archiveshub.jisc.ac.uk/data/gb182-shippam’s.

Fish and Chips

Fish and chips on the seafront at Hunstanton, Norfolk UK (in this instance the fish is deep fried plaice). © Andrew Dunn, http://www.andrewdunnphoto.com/. Creative Commons 2.0 license https://creativecommons.org/licenses/by-sa/2.0/deed.en.

Records of Pesci Bros Fish and Chip Shop, 1920-1994. The Pesci family, originally from Bardi in Italy, came to Barking from Wales in 1934, and went on to open a fish and chip shop at 15 Broadway. Only a few years later the shop was compulsorily purchased by Barking Borough Council so that the site could be used for the building of the new Town Hall. After a long search for a new premises, the family finally re-opened at 26 Ripple Road in 1939. The business flourished for nearly 60 years. Held by Barking and Dagenham Archive and Local Studies Centre https://archiveshub.jisc.ac.uk/data/gb350-bd76.

River authorities

Records of the Centre for Environment, Fisheries and Aquaculture Science, Benarth Road, Conwy, 1916-1994. In December 1999 the Conwy Laboratory closed after approximately ninety years of pioneering research and development into fish and shellfish aquaculture. The laboratory’s foundation came about following the building of mussel purification tanks by Conwy Corporation in 1913, in an attempt to improve the quality of Conwy mussels, which had been at the centre of several serious infections. The collection is of scientific importance in documenting experiments of international significance. Additionally, it reflects the traditional activities of the mussel fishermen themselves. Held by Gwasanaeth Archifau Conwy / Conwy Archive Service https://archiveshub.jisc.ac.uk/data/gb2008-cd3.

Environment Agency Collection, 1786-2010. The collection consists of reports, surveys, data records, maps, administrative records and other material relating to the work of the Environment Agency (and of its predecessor organisations the various River Boards, River Authorities, Water Authorities and the National Rivers Authority). A few documents date back to the 19th century and earlier, but the majority span the 1930s to the 1990s. Most of the collection relates to the Agency’s monitoring and management of the area’s river and lake catchments, with an emphasis on fisheries, biodiversity, constructions such as fish passes, weirs and fish traps, fish diseases, water quality and pollution. Included are papers relating to the Agency’s corporate, strategy and public affairs, as well as information on regional and national byelaws, net limitation orders and historic fishery rights. Held by Freshwater Biological Association Archives https://archiveshub.jisc.ac.uk/data/gb986-enva.

A Different Kettle of Fish

Records relating to Ada Fish, First World War munitions worker at Pembrey, 1918-1919. Held by West Glamorgan Archive Service https://archiveshub.jisc.ac.uk/data/gb216-d/dz969.

Fisher Theatre, Bungay, 1790-1886. The Fisher theatre at Bungay, Suffolk, opened in February 1828. Built by David Fisher I, the theatre was one of a dozen serving the circuit of Fisher’s company, The Norfolk and Suffolk Company of Comedians and seasons of performances were produced on a two-year cycle. The theatre was sold by the Fishers in 1844 and was used subsequently as a corn hall, furniture store, steam laundry, cinema, and textile warehouse. In 2000 the building was acquired by the Bungay Arts Trust. After extensive renovations the building was re-opened in 2006 as a community theatre and arts centre which is also licensed for wedding and civil ceremonies. Held by the University of East Anglia Archives https://archiveshub.jisc.ac.uk/data/gb1187-ftb.

Papers of Robert Salmon Hutton, 1897-1970. Hutton was born in 1876 in London. His family owned a silversmiths in Sheffield. Hutton pursued his research interests in electro-metallurgy with Professor Arthur Schuster at Manchester and Henri Moissan in Paris. From 1900-1908 he was a lecturer in electro-chemistry at the University of Manchester, where he carried out pioneering work on electric furnace technology, seeing its value for commercial metallurgy. In 1903 he perfected a method for the mass production of fused silica. Hutton had a great interest in research and development, and he was aware of failings in this area by British metallurgical industries. A great believer in the value of technical libraries, he was a founder of the Association of Special Libraries and Information Bureaux (ASLIB) in 1924. Held by University of Manchester Library https://archiveshub.jisc.ac.uk/data/gb133-hut.

Engraving of Anthias Anthias at that time called Anthias Sacer. The Author ran out of resources while issuing this book and therefore every engraving had its own sponsor. This one has been sponsored by Sigmund Zois Freiherr von Edelstein. Author: Bloch, Marcus Elieser, 1723-1799. Item/Page/Plate: Pl. 315, opp. p. 86. Image in the Public Domain (https://creativecommons.org/publicdomain/mark/1.0/deed.en; PD-US, courtesy of The New York Public Library).

Herring, Thomas (1693-1757). Papers of Thomas Herring, Archbishop of Canterbury 1747-57. 4 volumes, held by Lambeth Palace Library https://archiveshub.jisc.ac.uk/data/gb109-herring.

Papers of George Gordon Hake, 1891-1904. Hake was born in 1847. He spent thirteen years from 1891 working in South Africa, initially with the British South Africa Company and later with the Tanganyika Telegraph Service during 1889 and 1903 in the Mashonaland area. He died in 1903 and was buried at Port Herald. Hake was closely connected to the Rossetti family in their later years, acting as a ‘minder’ to Dante Gabriel Rossetti during one of their family holidays. Christina Rossetti was also godmother to his daughter Ursula. Held by School of Oriental and African Studies (SOAS) Archives, University of London https://archiveshub.jisc.ac.uk/data/gb102-ppms40.

Henry Guppy (1861-1948) was librarian of the John Rylands Library from 1900-1948. Held by University of Manchester Library https://archiveshub.jisc.ac.uk/data/gb133-tft/tft/1/459.

Declaration of Trust of Leasehold Property in Breams Buildings, Chancery Lane, London, 1888. Lease for the Breams Building, which was the main Birkbeck site from 1888-1952. The lease is in the form of a soft cover book, written over several vellum pages, with wax seals on the last page. Held by Birkbeck Library Archives and Special Collections, University of London https://archiveshub.jisc.ac.uk/data/gb1832-bbk/bbk/6/1.

John Whiting Archive, 1917-1963. Whiting, a playwright and actor, was born in 1917 in Salisbury, UK. He received his education at Taunton School and later trained as an actor at the Royal Academy of Dramatic Art. After his time in the army Whiting had some success as an actor and then went on to write numerous plays, short stories and plays for radio. Whiting also took up theatre criticism during the last few years of his life for ‘London Magazine’; some of his work can be found in ‘The Art of the Dramatist’ (1970). Held by V&A Theatre and Performance Collections https://archiveshub.jisc.ac.uk/data/gb71-thm/222.

Roe Manuscripts, 10th-17th century. Sir Thomas Roe was born in 1580 or 1581, and matriculated at Magdalen College, Oxford, in 1593, but took no degree. In 1605 he was knighted, and in 1614 began his official journeys to the East which made him famous. From that year to 1618 he was Ambassador to Jahangir, the Mogul emperor of Hindustan, and from 1621 to 1628 to the Turkish Court. In 1640 Roe was elected a burgess of the University in Parliament, and died in 1644. The manuscript collection comprises:  27 Greek, one Hebrew, one Arabic, and one Latin. Held by the Bodleian Library, University of Oxford https://archiveshub.jisc.ac.uk/data/gb161-mss.roe1-17,18a-b,19-29.

A "tornado" of schooling barracudas at Sanganeb Reef, Sudan. Copyright: Robin Hughes (https://commons.wikimedia.org/wiki/File:Barracuda_Tornado.jpg). Creative Commons 2.0 license: https://creativecommons.org/licenses/by-sa/2.0/deed.en.

Rocket assisted take off by a Barracuda, 1945 – on HMS Trumpeter. 2 photos, held by Gwasanaeth Archifau Conwy / Conwy Archive Service https://archiveshub.jisc.ac.uk/data/gb2008-cp1727/cp1727/4/1/40.

Previous features relating to Fish

Silt, sluices and smelt fishing – The Eau Brink Cut and the Bedford Level Corporation Archive

William Speirs Bruce Archive in the National Museums Scotland Library

1. George Gershwin – Summertime lyrics: https://www.stlyrics.com/songs/g/georgegershwin8836/summertime299720.html

Names (4): Ethics and identity

As archivists, we deal with ethical issues a good deal.  But the ability to link disparate and diverse data sources opens up new challenges in this area, and I wanted to explore this a bit.

If you do a general search for ethics and data, top of the list comes health. An interesting example of data join-up is the move to link health data to census data, which could potentially highlight where health needs are not being met:

“Health services are required to demonstrate that they are meeting the needs of ethnic minority populations. This is difficult, because routine data on health rarely include reliable data on ethnicity. But data on ethnicity are included in census returns, and if health and census data for the same individuals can be linked, the problem might be solved.” (Ethnicity and the ethics of data linkage)

However, individuals who stated their ethnicity in census returns were not told that this might subsequently be linked with their health data. Should explicit informed consent be given? Given the potential benefits, is this a reasonable ask? It is certainly getting into hazardous terrain to ignore the principle of informed consent. In their book ‘Rethinking Informed Consent in Bioethics’, Manson and O’Neill argue that informed consent cannot be fully specific or fully explicit. They argue for a distinctive approach where rights can be waived or set aside in controlled and specific ways.

This leads to a wider question: is fully explicit and specific informed consent actually achievable within the joined-up online world? A world where data travels across connections, is blended, re-mixed, re-purposed. A world where APIs allow data to be accessed and utilised for all sorts of purposes, and ‘open data’ has become a rallying cry.  Is there a need to engage the public more fully in order to gain public confidence in what open data really means, and in order to debate what ‘informed consent’ is, and where it is really required?

I am working on a project to create name records, and I am looking at bringing data sources together. Of course, this is hardly new. Wikipedia is the most well-known hub for biographical data. Anyone can add anything to a Wikipedia page (within some limits, and with some policing and editing by Wikipedia, but in essence it is an open database).  Wikidata, which underlies Wikipedia, is about bringing sources together in an automated way.  Projects within cultural heritage are also working on linked data approaches to create rich sources of information on people. SNAC has taken archival data from many different archive repositories and brought it together. A page for one person, such as Martin Luther King, provides a whole host of associations and links. These sources are not all individually checked and verified, because this kind of work has to be done algorithmically. However, there is a great deal of provenance information, so that all sources used are clear.

The Face of White Australia

There are some amazing projects working to reveal hidden histories. Tim Sherratt has done some brilliant work with Australian records, including projects such as Invisible Australians, which aims to reveal hidden lives using biographical information found in the records. He has helped to create some wonderful sites that reveal histories that have been marginalised.  Tim talks about ‘hacking heritage’ and says: ‘By manipulating the contexts of cultural heritage collections we can start to see their limits and biases. By hacking heritage we can move beyond search interfaces and image galleries to develop an understanding of what’s missing.’ (Hacking heritage, blog post)  He emphasises that access to indigenous cultural collections should be subject to community consultation and control.  But what does community consultation and control really mean?

I have always been keen to work with the names in archival descriptions – archival creators and all the other people who are associated with a collection. They are listed in the catalogue (leastways the names that we can work with are listed – many names obviously aren’t included, but that’s another story), so they are already publicly declared. It is not a case of whether the name should be made public at all, or, at least, that decision has been made already by the cataloguer.   But our plan is to take the names and bring them to the fore – to give them their own existence within our service.  We are taking them out of the context of a single archive collection and putting them into a broader one. In so doing, we want to give the archive collections themselves more social context, we want to give more effective access to distributed historical records, and we also want to enable researchers to travel through connections to create their own narratives.

This may help to reveal things about our history and highlight the roles that people have played. It may bring to the fore people who have been marginalised.  Of course, it does not address the problem of biases and subjective approaches to accessions and cataloguing. But a joined-up approach may help us to see those biases and gaps; to understand more about the silent spaces.

Creating persistent identifiers and linking data reveals knowledge. It is tempting to see that in simple terms as a good thing.  But what about privacy and ethics?  Even if someone is no longer living, there are still privacy issues, and many people represented in archives are alive.

Do individuals want to be persistently identified? What about if they change their identity? Do they want a pseudonym associated with their real name? They might have very good reasons for keeping their identity private. Persistent identification encourages openness and transparency, which can have real benefits, but it is not always benign.  It is like any information – it can be used for good and bad purposes, and who is to say what is good and what is not? Obviously we have GDPR and the Data Protection Act, and these have a good deal to say about obligations, the value of historical research and the right to be forgotten. This is something we’ll need to take into account. But linked data principles are not so much about working with personal data as working with data that may not seem personal, but that can help to reveal things when linked with other sources of data.

GDPR supports the principle of transparency and the importance of people’s awareness and control over what happens to their personal data. Even if we are not creating and storing personal data, it seems important to engage with data protection and what this means. The challenge of how to think about data when it is part of an ever shifting and growing  global data environment seems to me to be a huge one.

Certainly the horse has bolted to some degree with regard to joining up data. The Web lowered barriers considerably, and now we increasingly have structured data, so it is somewhat like one gigantic database. Finding things out about individuals is entirely feasible with or without something like a Names service created by the Archives Hub. We are not creating any new content, but creating this interface means we are consciously bringing data together, and obviously we want to be responsible, and respect people’s right to privacy. Clearly it is entirely impractical to try to get permission from all those living people who might be included. So, in the end, we are taking a degree of risk with privacy.  Of course, we will un-publish on request, and engage with any feedback and concerns. But at present we are taking the view that the advantages and benefits outweigh the risks.

 

“Imagine being a sibling in a family that continually removes you from photos; tries its best to erase you…As you go through [the scrapbook] you see events where you know you were there, but you are still missing.”  Lae’l Hughes-Watkins (University of Maryland) gave an impassioned and inspiring talk at DCDC 2019 about her experiences.  She argued that archivists need to interrogate the reality that has been presented, and accept that our ideas of neutrality are misplaced. She wants a history that actively represents her – her history and culture, and experiences as a black woman in the USA. She related moving stories of people with amazing stories (and amazing archives) who distrust cultural institutions because they don’t feel included or represented.

This may seem a long way away from our small project to create name records, but in reality our project could be seen as one very small part of a move towards what Lae’l is talking about.  Bringing descriptions together from across the UK maybe helps us to play a small role in this – aiming to move towards documenting the full breadth of human experience. The archives that we cover may retain the biases and gaps for some time to come (probably for ever, given that documentary evidence tends to represent the powerful and the elite much more strongly), but by aggregating and creating connections with other sources, we help to paint a bigger picture.  By creating name records we help to contextualise people, making it much easier to bring other lives and events into the picture. It is a move towards recognising the limitation of what is actually in the archive, and reaching out to take advantage of what is on the Web.  In doing this through explicitly identifying people we do leave ourselves more open to the dangers of not respecting privacy or anonymity. When we plug fully into the Web, we become a part of its infinite possibilities, which is always going to be a revealing, exciting, uncontrollable and risky business. By allowing others to use this data in different ways, we open it up to diverse perspectives and uses.