Names (5): The Problem of anonymity

cartoon of person asking 'who am I?'

It is easy to focus on names that represent fairly well known people. But one of the challenges for archives is to work with little known people: names that represent someone who is referenced in a catalogue, perhaps indexed because they are a correspondent who appears in one of a series of letters, but with no more information about them than their name. They may be referenced in other sources, but we have little to go on in order to discover that. Often they won’t be represented elsewhere at all; it may be that this is the only written source that includes them.

In a names service, we can add a name – let’s say ‘Louisa Jane Justamond’ – a name from https://archiveshub.jisc.ac.uk/data/gb12-ms.add.8556 (‘The Garland continued’, a collection of poems addressed to her). We only have that one instance of that name. It is not in VIAF, it is not in Wikidata. There is an instance listed in ‘A genealogical and heraldic dictionary of the landed gentry of Great Britain’ (a precursor to Burke’s peerage). But unless we decide to use that as an external source, write a name matching algorithm and decide, on levels of confidence, that it is indeed a match, that is not going to help us. We are left with a name attached to one archive collection and nothing else.

We can create a name record for Justamond, but if we display it on the Archives Hub it will simply show her name and a link back to the related description.  It will be extremely minimal.

However, what we don’t know is whether new collections will be added to the Archives Hub, or new information added to Wikidata or another source that we use, such that this person becomes more identifiable.   We simply don’t know what the value of a name might be.  In the future, having a record of this person could prove to be immensely useful in making a connection.

Archives have what you might call a long tail of names. It is something that characterises our holdings. It is something that sets us apart from libraries and museums, at least to a degree.  Most names represented in library holdings (or names they represent in their catalogues and other finding aids) represent identifiable people.

Graph showing the long tail of names
The long tail of names

In archives, we have collections that represent ordinary people, not published, not celebrated, not notorious, with no documented place in history. We also have collections that include people where it is hard to know whether an individual is more widely known, because the archive collection does not entirely identify them.

Either way, it leaves us with a question about how to deal with a name that has nothing else attached to it other than ‘this name is in this letter’.

Building an index of all names means that we have a store of data that can be used for further exploration. It could sit behind the scenes, but it can be used to try out tools, data manipulation and matching.  In other words, the data is a separate thing from what you decide to display.

Having a name (maybe not knowing exactly who the name represents) and knowing that the name is in three different archives has value. We can say ‘in the absence of any other information, we assume these names represent the same person’, or we can simply present the information and not draw any conclusions (although that raises the question of how you present it without encouraging assumptions). It is then up to researchers to explore further. We might find new data sources that help to clarify names. We might get new descriptions that help to do this.

Many archival descriptions include subjects and, to a lesser extent, places. If you have Stephen Merryweather in one, with an index term of botany, and S. Merryweather in another, with the same index term, then you could say it is more likely to be a match. There is a question of how you might then present that information. The use of algorithms raises the issue of how to convey levels of confidence. It feels as if we need to have a more sophisticated – and recognised – means of presenting levels of confidence.
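To make the Merryweather example a little more concrete, here is a minimal sketch of how a shared index term might nudge a possible match towards a more likely one. It is only an illustration: the `NameEntry` structure, the function names and the three confidence labels are all assumptions made for this example, not something the Archives Hub has built.

```python
from dataclasses import dataclass, field


@dataclass
class NameEntry:
    """A name as it appears in one archival description (hypothetical structure)."""
    forename: str
    surname: str
    subjects: set[str] = field(default_factory=set)


def forenames_compatible(a: str, b: str) -> bool:
    """True if the forenames are equal, or one is a single initial of the other."""
    a, b = a.strip(". ").lower(), b.strip(". ").lower()
    if not a or not b:
        return False
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) == 1 and longer.startswith(shorter)


def match_confidence(x: NameEntry, y: NameEntry) -> str:
    """Return a coarse, purely illustrative confidence label for two entries."""
    if x.surname.lower() != y.surname.lower():
        return "no match"
    if not forenames_compatible(x.forename, y.forename):
        return "no match"
    # A shared index term (compared case-insensitively) raises the confidence.
    shared = {s.lower() for s in x.subjects} & {s.lower() for s in y.subjects}
    return "more likely" if shared else "possible"


a = NameEntry("Stephen", "Merryweather", {"botany"})
b = NameEntry("S.", "Merryweather", {"Botany"})
c = NameEntry("S.", "Merryweather", {"shipping"})
print(match_confidence(a, b))  # more likely
print(match_confidence(a, c))  # possible
```

Even in a toy like this, the labels are a presentation decision as much as a technical one: ‘more likely’ still leaves the final judgement with the researcher.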

This whole issue of confidence levels is more of a focus for archives, because of the anonymity I’ve talked about.

Diagram showing Relationships of data involved in creating name records
Relationships of data involved in creating name records

The ‘Name’ records shown above are the names within archival descriptions (EAD records on the Hub). These names can be pulled out from ‘origination’ (creator) and from ‘persname’ (usually in the controlaccess index section, but potentially elsewhere in the description). These names may represent ‘unknown’ people; the EAD may not even indicate whether they are personal or corporate or family names. They may not include dates; they may just be ‘Mary Fleming’ or ‘Mary Fleming fl 1717’. They may also be ‘unknown’, ‘[unknown]’, or even ‘unknown unknown’ (keeping the surname, forename structure!). They may be ‘Name of author (various)’ or ‘Various health authority bodies’ or ‘Possibly Miss M. Lindsay’. All these are examples from our data. They illustrate the conflict between human readable data – where ‘unknown’ is useful – and machine processable data – where semantics are important, and a name is ideally just a name.
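As a rough sketch of what pulling names out of ‘origination’ and ‘persname’ could look like, the snippet below uses Python's standard XML library. The sample record is invented for illustration, it assumes EAD without an XML namespace, and the placeholder pattern only covers the ‘unknown’ variants mentioned above; real data would need a much longer list.

```python
import re
import xml.etree.ElementTree as ET

# Invented EAD fragment for illustration; not a real Hub record.
# It assumes the elements carry no XML namespace.
SAMPLE_EAD = """
<ead>
  <archdesc>
    <did>
      <origination>Mary Fleming fl 1717</origination>
    </did>
    <controlaccess>
      <persname>unknown unknown</persname>
      <persname>Possibly Miss M. Lindsay</persname>
    </controlaccess>
  </archdesc>
</ead>
"""

# Strings that read as placeholders rather than names (assumed, incomplete).
PLACEHOLDER = re.compile(r"^\[?unknown\]?(\s+unknown)?$", re.IGNORECASE)


def extract_names(ead_xml: str) -> list[dict]:
    """Collect name strings from origination and persname elements."""
    root = ET.fromstring(ead_xml)
    # A persname nested inside origination would otherwise be counted twice.
    nested = {p for o in root.iter("origination") for p in o.iter("persname")}
    names = []
    for element in root.iter():
        if element.tag not in ("origination", "persname") or element in nested:
            continue
        # itertext() also picks up text held in child elements.
        text = " ".join("".join(element.itertext()).split())
        if not text:
            continue
        names.append({
            "source": element.tag,
            "text": text,
            "placeholder": bool(PLACEHOLDER.match(text)),
        })
    return names


for name in extract_names(SAMPLE_EAD):
    print(name)
```

Flagging ‘unknown unknown’ as a placeholder, rather than discarding it, keeps the human readable value in the store while letting a machine process treat it differently from a real name.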

If we create ‘Name’ entries for all of these then we have a store of data to work with, something I’ve mentioned before in my Names Project blogs.  We can then find out how many ‘Mary Fleming’ entries there are, or  how many ‘M Fleming’ entries. How we then choose to display that information to end users is a separate question.  But with the advances in machine learning, it is becoming an increasingly pertinent question.
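Counting entries is the simplest thing we can do with such a store. The sketch below assumes a deliberately crude normalisation (lower-casing, dropping full stops, collapsing spaces) and a made-up list of name strings; it is a starting point for experiments, not a matching method.

```python
from collections import Counter


def normalise(name: str) -> str:
    """Lower-case, drop full stops and collapse whitespace (a crude grouping key)."""
    return " ".join(name.replace(".", " ").lower().split())


# Hypothetical name strings, as they might be pulled from several descriptions.
entries = [
    "Mary Fleming",
    "mary fleming",
    "M Fleming",
    "M. Fleming",
    "Mary Fleming fl 1717",
]

counts = Counter(normalise(entry) for entry in entries)
print(counts["mary fleming"])  # 2
print(counts["m fleming"])     # 2
```

Note that ‘Mary Fleming fl 1717’ stays separate here, which is exactly the kind of decision (strip the date, or keep it as evidence?) that belongs in the data layer rather than the display.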

We have an opportunity with archival metadata, with the way that archives represent ‘ordinary life’. But it is a challenge. Catalogues are still not really set up to identify entities (in a way that works for machine processing). We create what we refer to as ‘name authorities’ but we do not usually consider the importance of matching names outside of individual organisations. The Archives Hub has an opportunity to work on behalf of UK archives to try to draw out people and, in a sense, identify them, or at least, enable them to be more contextualised. But it will require a good deal of experimentation and expertise in working with disparate data. However, if we create a pool of names and provide an API, that would enable others to work with the data, and try different approaches. This is a big challenge, and it needs a concerted and collaborative approach.

picture of anonymous crowd

Fish are jumpin’ in the Archives

Archives Hub feature for August 2020

“Summertime and the livin’ is easy...” ¹. Well, it’s a rather wet summer in the UK but all the better for exploring collections on the theme of fish!

Plotosus lineatus (Catfish). Copyright: Alain Feulvarch (https://commons.wikimedia.org/wiki/File:Catfish_Plotosus_lineatus.jpg). Creative Commons 2.0 license: https://creativecommons.org/licenses/by/2.0/deed.en

We’ve trawled the Archives Hub (sorry, couldn’t resist!) to bring you a selection of the wonderful, and sometimes surprising, collections relating to fish, ranging across research, expeditions, fisheries, the fishing industry and river authorities – not forgetting a fish and chip shop, a theatre and several appropriately named individuals.

Research and Expeditions

Fishes Collected by Darwin, 1842. 300 pages of notes on the fish collected by Darwin on the Beagle, compiled by Leonard Jenyns (1800-1893), a clergyman and naturalist; Jenyns changed his name to Leonard Blomefield in 1871. Held by the Museum of Zoology Archives, University of Cambridge https://archiveshub.jisc.ac.uk/data/gb433-jenynsdarwin.

C Tate Regan collection, 1912-1913. Charles Tate Regan (born in 1878) was keeper of zoology at the British Museum. He worked on the scientific results of the Scottish National Antarctic Expedition, 1902-1904 (leader William Speirs Bruce) and the British Antarctic Expedition, 1910-1913 (leader Robert Falcon Scott). He died in 1948. Published work includes ‘Antarctic fishes of the Scottish National Antarctic Expedition’ in the Reports of the scientific results of the voyage of the steam yacht Scotia and ‘Fishes’ and ‘Larval and post larval fishes’ published in the zoology reports of the British Antarctic Expedition, 1910-1913. Held by the Scott Polar Research Institute Archives, University of Cambridge https://archiveshub.jisc.ac.uk/data/gb15-charlestateregan.

Cuthbertson drawing of an Atlantic lizardfish. Copyright the National Museums Scotland Library (adapted from the full image included in the William Speirs Bruce Archive feature, August 2017).

Winifred E. Frost collection, 1930s-1960s. Frost was an authority on the natural history of fish in the Lake District. Her research included work on euphausiids with Professor James Johnstone at Liverpool University, and she worked for the fisheries branch at Dublin investigating trout in the River Liffey. She was appointed to the Freshwater Biological Association in 1938 and was awarded a D.Sc. by Liverpool University for her published papers. With Margaret E. Brown (Varley) she wrote The Trout, published in 1967, which took 21 years to prepare. She was a member of the Council of the Salmon and Trout Association and president of the Windermere and District Angling Association; she also travelled to international scientific meetings and undertook an investigation of eels in Africa. Held by the Freshwater Biological Association Archives https://archiveshub.jisc.ac.uk/data/gb986-frow.

Notes towards a dictionary of fish names, by Paul Barbier (C20th). Barbier was Professor of French Language and Literature at the University of Leeds, 1903-1938. The collection comprises 8 boxes of notes prepared in the course of research for an unpublished dictionary of names of fishes. Held by University of Leeds Special Collections https://archiveshub.jisc.ac.uk/data/gb206-ms125.

Solenostomus paradoxus - Harlequin Ghost Pipefish. © Steve Childs (https://commons.wikimedia.org/wiki/File:Solenostomus_paradoxus_-_Harlequin_Ghost_Pipefish.jpg). Creative Commons 2.0 license https://creativecommons.org/licenses/by-sa/2.0/deed.en.

Rosemary Lowe-McConnell Collection, 1934-1947. Lowe-McConnell was a pioneer in tropical fish ecology. She was born in Liverpool, and graduated from the university. She worked at the Freshwater Biological Association studying the migration of silver eels. In 1993 Michael N. Bruton interviewed Lowe-McConnell on the personal reasons behind her choice of work, her personal influences, and her experiences of being a woman in a male dominated world. Initially she wanted to be an explorer/naturalist, and the reply was ‘never mind dear, perhaps you can teach’. When she applied to the colonial services in 1945 to be an entomologist, they would not employ a woman in that role, but the tropical fisheries department was new and not considered as important. Although she was forced to resign in 1954, when the marriage bar was in place, she was more interested in pursuing her findings than concerned with job status, and she believed that being offered the directorship of the Joint Fisheries Research Organisation in central Africa (which she rejected) showed that she was accepted despite being female. Held by Freshwater Biological Association Archives https://archiveshub.jisc.ac.uk/data/gb986-lowr.

Journal of John Walsh’s Visit to France in 1772. John Walsh (1726-1795) was elected to the Royal Society in 1770, and became known for his work on the electric ray, Torpedo marmorata. In 1769 Edward Bancroft proved that the electric eel emitted electric shocks, and Walsh set out to confirm that the ray had a similar power. In this he was encouraged by Benjamin Franklin, whose American colleagues were undertaking similar investigations. With his nephew Arthur Fowkes he spent the summer of 1772 at La Rochelle, where the ray was often captured. The fish could survive many hours out of water, and Walsh was able to conduct experiments ashore and successfully proved that the ray’s shocks were caused by electricity. His findings were published in the Royal Society’s Philosophical Transactions, vol. 63 (1773), pp. 461-77, and the Royal Society awarded him the Copley medal for his achievement. Held by University of Manchester Library https://archiveshub.jisc.ac.uk/data/gb133-engms724.

Fisheries and the Fishing industry

Records of Aberdeen Fish Curers and Merchants Association, 1888-1947. The association was established in May 1888, as Aberdeen Fish Trade Association, and was incorporated with its present title in 1944. It began in response to the introduction of sales by auction in the late nineteenth century, its first achievement being an agreement amongst fish sellers to provide discounts for cash sales to accredited buyers. Membership was open to wholesale fish merchants and fish curers carrying on a business in Aberdeen, and in 1980 stood at more than 200. Held by University of Aberdeen Special Collections https://archiveshub.jisc.ac.uk/data/gb231-ms3054.

Records of the Berwick Salmon Fisheries Co Ltd, salmon fishers, Berwick upon Tweed, England, 1562-1964 (predominant 1860-1964). The Old Shipping Co, shipping traders and salmon fishers, Berwick-upon-Tweed, Northumberland, England, was established at some point prior to 1766 by a group of local men, mainly coopers, who held shares in a small sailing fleet engaged in the London, coastal and foreign trade. As commodities included salmon, the company leased fishing rights on the river Tweed. The shipping vessels were sold off in 1869 as business had become unprofitable and the company’s name changed to Berwick Salmon Fisheries Co Ltd in 1872. Held by University of Glasgow Archive Services  https://archiveshub.jisc.ac.uk/data/gb248-ugd245.

Volume containing two copies of a printed register relating to Netherlands herring fisheries, 1749: entitled Naamlyst der boekhouders, schepen, en stuurluiden van de haring-shepen, in’t Yaar 1749, van Enchisen en de Ryp, ter haring-shepen uitgevaren (Jan von Guissen, Enkhuisen, 1749), giving details of the ships, owners and captains of the fleets of Enkhuisen and De Rijp. Added in manuscript are details of the total catch for 1749, and the catch for individual ships on various voyages. Held by Senate House Library Archives, University of London https://archiveshub.jisc.ac.uk/data/gb96-ms115.

Women Fish Sellers – from Hamilton, Robert (1866) British Fishes, Part II, Naturalist’s Library, vol. 37, London: Chatto and Windus. Image in the public domain (photograph from the Freshwater and Marine Image Bank at the University of Washington).

Grimsby Steam and Diesel Fishing Vessels’ Engineers’ and Firemen’s Union, 1897-1987. The Grimsby Steam Fishing Vessels’ Engineers’ and Firemen’s Union was founded in 1896. It changed its name to the Grimsby Steam and Diesel Fishing Vessels’ Engineers’ and Firemen’s Union in 1961. In 1976 it transferred engagements to the Transport and General Workers’ Union, becoming 10/3c Branch. Held by Modern Records Centre, University of Warwick https://archiveshub.jisc.ac.uk/data/gb152-gsf.

The business records of Shippam’s Ltd, 1853-1995. The Shippam’s business first started in 1786, when Charles Shippam established a grocery store in Westgate, Chichester. In 1886 they began food manufacturing and in 1894 launched a wide range of potted meat and fish pastes, for which Shippam’s was to become internationally famous. Held by West Sussex Record Office https://archiveshub.jisc.ac.uk/data/gb182-shippam’s.

Fish and Chips

Fish and chips on the seafront at Hunstanton, Norfolk UK (in this instance the fish is deep fried plaice). © Andrew Dunn, http://www.andrewdunnphoto.com/. Creative Commons 2.0 license https://creativecommons.org/licenses/by-sa/2.0/deed.en.

Records of Pesci Bros Fish and Chip Shop, 1920-1994. The Pesci family, originally from Bardi in Italy, came to Barking from Wales in 1934, and went on to open a fish and chip shop at 15 Broadway. Only a few years later the shop was compulsorily purchased by Barking Borough Council so that the site could be used for the building of the new Town Hall. After a long search for new premises, the family finally re-opened at 26 Ripple Road in 1939. The business flourished for nearly 60 years. Held by Barking and Dagenham Archive and Local Studies Centre https://archiveshub.jisc.ac.uk/data/gb350-bd76.

River authorities

Records of the Centre for Environment, Fisheries and Aquaculture Science, Benarth Road, Conwy, 1916-1994. In December 1999 the Conwy Laboratory closed after approximately ninety years of pioneering research and development into fish and shellfish aquaculture. The laboratory’s foundation came about following the building of mussel purification tanks by Conwy Corporation in 1913, in an attempt to improve the quality of Conwy mussels, which had been at the centre of several serious infections. The collection is of scientific importance in documenting experiments of international significance. Additionally, it reflects the traditional activities of the mussel fishermen themselves. Held by Gwasanaeth Archifau Conwy / Conwy Archive Service https://archiveshub.jisc.ac.uk/data/gb2008-cd3.

Environment Agency Collection, 1786-2010. The collection consists of reports, surveys, data records, maps, administrative records and other material relating to the work of the Environment Agency (and of its predecessor organisations the various River Boards, River Authorities, Water Authorities and the National Rivers Authority). A few documents date back to the 19th century and earlier, but the majority span the 1930s to the 1990s. Most of the collection relates to the Agency’s monitoring and management of the area’s river and lake catchments, with an emphasis on fisheries, biodiversity, constructions such as fish passes, weirs and fish traps, fish diseases, water quality and pollution. Included are papers relating to the Agency’s corporate, strategy and public affairs, as well as information on regional and national byelaws, net limitation orders and historic fishery rights. Held by Freshwater Biological Association Archives https://archiveshub.jisc.ac.uk/data/gb986-enva.

A Different Kettle of Fish

Records relating to Ada Fish, First World War munitions worker at Pembrey, 1918-1919. Held by West Glamorgan Archive Service https://archiveshub.jisc.ac.uk/data/gb216-d/dz969.

Fisher Theatre, Bungay, 1790-1886. The Fisher theatre at Bungay, Suffolk, opened in February 1828. Built by David Fisher I, the theatre was one of a dozen serving the circuit of Fisher’s company, The Norfolk and Suffolk Company of Comedians, and seasons of performances were produced on a two-year cycle. The theatre was sold by the Fishers in 1844 and was used subsequently as a corn hall, furniture store, steam laundry, cinema, and textile warehouse. In 2000 the building was acquired by the Bungay Arts Trust. After extensive renovations the building was re-opened in 2006 as a community theatre and arts centre which is also licensed for wedding and civil ceremonies. Held by the University of East Anglia Archives https://archiveshub.jisc.ac.uk/data/gb1187-ftb.

Papers of Robert Salmon Hutton, 1897-1970. Hutton was born in 1876 in London. His family owned a silversmiths in Sheffield. Hutton pursued his research interests in electro-metallurgy with Professor Arthur Schuster at Manchester and Henri Moissan in Paris. From 1900-1908 he was a lecturer in electro-chemistry at the University of Manchester, where he carried out pioneering work on electric furnace technology, seeing its value for commercial metallurgy. In 1903 he perfected a method for the mass production of fused silica. Hutton had a great interest in research and development, and he was aware of failings in this area by British metallurgical industries. A great believer in the value of technical libraries, he was a founder of the Association of Special Libraries and Information Bureaux (ASLIB) in 1924. Held by University of Manchester Library https://archiveshub.jisc.ac.uk/data/gb133-hut.

Engraving of Anthias anthias, at that time called Anthias sacer. The author ran out of resources while issuing this book and therefore every engraving had its own sponsor. This one was sponsored by Sigmund Zois Freiherr von Edelstein. Author: Bloch, Marcus Elieser, 1723-1799. Item/Page/Plate: Pl. 315, opp. p. 86. Image in the Public Domain (https://creativecommons.org/publicdomain/mark/1.0/deed.en; PD-US), courtesy of The New York Public Library, www.nypl.org.

Herring, Thomas (1693-1757). Papers of Thomas Herring, Archbishop of Canterbury 1747-57. 4 volumes, held by Lambeth Palace Library https://archiveshub.jisc.ac.uk/data/gb109-herring.

Papers of George Gordon Hake, 1891-1904. Hake was born in 1847. He spent thirteen years from 1891 working in South Africa, initially with the British South Africa Company and later with the Tanganyika Telegraph Service during 1889 and 1903 in the Mashonaland area. He died in 1903 and was buried at Port Herald. Hake was closely connected to the Rossetti family in their later years, acting as a ‘minder’ to Dante Gabriel Rossetti during one of their family holidays. Christina Rossetti was also godmother to his daughter Ursula. Held by School of Oriental and African Studies (SOAS) Archives, University of London https://archiveshub.jisc.ac.uk/data/gb102-ppms40.

Henry Guppy (1861-1948) was librarian of the John Rylands Library from 1900-1948. Held by University of Manchester Library https://archiveshub.jisc.ac.uk/data/gb133-tft/tft/1/459.

Declaration of Trust of Leasehold Property in Breams Buildings, Chancery Lane, London, 1888. Lease for the Breams Building, which was the main Birkbeck site from 1888-1952. The lease is in the form of a soft cover book, written over several vellum pages, with wax seals on the last page. Held by Birkbeck Library Archives and Special Collections, University of London https://archiveshub.jisc.ac.uk/data/gb1832-bbk/bbk/6/1.

John Whiting Archive, 1917-1963. Whiting, a playwright and actor, was born in 1917 in Salisbury, UK. He received his education at Taunton School and later trained as an actor at the Royal Academy of Dramatic Art. After his time in the army Whiting had some success as an actor and then went on to write numerous plays, short stories and plays for radio. Whiting also took up theatre criticism for ‘London Magazine’ during the last few years of his life; some of his work can be found in ‘The Art of the Dramatist’ (1970). Held by V&A Theatre and Performance Collections https://archiveshub.jisc.ac.uk/data/gb71-thm/222.

Roe Manuscripts, 10th-17th century. Sir Thomas Roe was born in 1580 or 1581, and matriculated at Magdalen College, Oxford, in 1593, but took no degree. In 1605 he was knighted, and in 1614 began his official journeys to the East which made him famous. From that year to 1618 he was Ambassador to Jehangir, the Mogul emperor of Hindustan, and from 1621 to 1628 to the Turkish Court. In 1640 Roe was elected a burgess of the University in Parliament, and died in 1644. The manuscript collection comprises: 27 Greek, one Hebrew, one Arabic, and one Latin. Held by the Bodleian Library, University of Oxford https://archiveshub.jisc.ac.uk/data/gb161-mss.roe1-17,18a-b,19-29.

A "tornado" of schooling barracudas at Sanganeb Reef, Sudan. Copyright: Robin Hughes (https://commons.wikimedia.org/wiki/File:Barracuda_Tornado.jpg). Creative Commons 2.0 license: https://creativecommons.org/licenses/by-sa/2.0/deed.en.

Rocket assisted take off by a Barracuda, 1945 – on HMS Trumpeter. 2 photos, held by Gwasanaeth Archifau Conwy / Conwy Archive Service https://archiveshub.jisc.ac.uk/data/gb2008-cp1727/cp1727/4/1/40.

Previous features relating to Fish

Silt, sluices and smelt fishing – The Eau Brink Cut and the Bedford Level Corporation Archive

William Speirs Bruce Archive in the National Museums Scotland Library

1. George Gershwin – Summertime lyrics: https://www.stlyrics.com/songs/g/georgegershwin8836/summertime299720.html

Names (4): Ethics and identity

As archivists, we deal with ethical issues a good deal.  But the ability to link disparate and diverse data sources opens up new challenges in this area, and I wanted to explore this a bit.

If you do a general search for ethics and data, top of the list comes health. An interesting example of data join-up is the move to link health data to census data, which could potentially highlight where health needs are not being met:

“Health services are required to demonstrate that they are meeting the needs of ethnic minority populations. This is difficult, because routine data on health rarely include reliable data on ethnicity. But data on ethnicity are included in census returns, and if health and census data for the same individuals can be linked, the problem might be solved.” (Ethnicity and the ethics of data linkage)

However, individuals who stated their ethnicity in census returns were not told that this might subsequently be linked with their health data. Should explicit informed consent be given? Given the potential benefits, is this a reasonable ask? It is certainly getting into hazardous terrain to ignore the principle of informed consent. In their book ‘Rethinking Informed Consent in Bioethics’, Manson and O’Neill argue that informed consent cannot be fully specific or fully explicit. They argue for a distinctive approach where rights can be waived or set aside in controlled and specific ways.

This leads to a wider question: is fully explicit and specific informed consent actually achievable within the joined-up online world? A world where data travels across connections, is blended, re-mixed, re-purposed. A world where APIs allow data to be accessed and utilised for all sorts of purposes, and ‘open data’ has become a rallying cry. Is there a need to engage the public more fully in order to gain public confidence in what open data really means, and in order to debate what ‘informed consent’ is, and where it is really required?

I am working on a project to create name records, and I am looking at bringing data sources together. Of course, this is hardly new. Wikipedia is the most well-known hub for biographical data. Anyone can add anything to a Wikipedia page (within some limits, and with some policing and editing by Wikipedia, but in essence it is an open database). Wikidata, which underlies Wikipedia, is about bringing sources together in an automated way. Projects within cultural heritage are also working on linked data approaches to create rich sources of information on people. SNAC has taken archival data from many different archive repositories and brought it together. A page for one person, such as Martin Luther King, provides a whole host of associations and links. These sources are not all individually checked and verified, because this kind of work has to be done algorithmically. However, there is a great deal of provenance information, so that all sources used are clear.

image of page from the face of white australia website
The Face of White Australia

There are some amazing projects working to reveal hidden histories. Tim Sherratt has done some brilliant work with Australian records, in projects such as Invisible Australians, which aims to reveal hidden lives using biographical information found in the records. He has helped to create some wonderful sites that reveal histories that have been marginalised. Tim talks about ‘hacking heritage’ and says: ‘By manipulating the contexts of cultural heritage collections we can start to see their limits and biases. By hacking heritage we can move beyond search interfaces and image galleries to develop an understanding of what’s missing.’ (Hacking heritage, blog post) He emphasises that access to indigenous cultural collections should be subject to community consultation and control. But what does community consultation and control really mean?

I have always been keen to work with the names in archival descriptions – archival creators and all the other people who are associated with a collection. They are listed in the catalogue (leastways the names that we can work with are listed – many names obviously aren’t included, but that’s another story), so they are already publicly declared. It is not a case of whether the name should be made public at all, or, at least, that decision has been made already by the cataloguer.   But our plan is to take the names and bring them to the fore – to give them their own existence within our service.  We are taking them out of the context of a single archive collection and putting them into a broader one. In so doing, we want to give the archive collections themselves more social context, we want to give more effective access to distributed historical records, and we also want to enable researchers to travel through connections to create their own narratives.

This may help to reveal things about our history and highlight the roles that people have played. It may bring to the fore people who have been marginalised. Of course, it does not address the problem of biases and subjective approaches to accessions and cataloguing. But a joined-up approach may help us to see those biases and gaps; to understand more about the silent spaces.

Creating persistent identifiers and linking data reveals knowledge. It is tempting to see that in simple terms as a good thing. But what about privacy and ethics? Even if someone is no longer living, there are still privacy issues, and many people represented in archives are alive.

Do individuals want to be persistently identified? What about if they change their identity? Do they want a pseudonym associated with their real name? They might have very good reasons for keeping their identity private. Persistent identification encourages openness and transparency, which can have real benefits, but it is not always benign.  It is like any information – it can be used for good and bad purposes, and who is to say what is good and what is not? Obviously we have GDPR and the Data Protection Act, and these have a good deal to say about obligations, the value of historical research and the right to be forgotten. This is something we’ll need to take into account. But linked data principles are not so much about working with personal data as working with data that may not seem personal, but that can help to reveal things when linked with other sources of data.

GDPR supports the principle of transparency and the importance of people’s awareness and control over what happens to their personal data. Even if we are not creating and storing personal data, it seems important to engage with data protection and what this means. The challenge of how to think about data when it is part of an ever shifting and growing  global data environment seems to me to be a huge one.

Certainly the horse has bolted to some degree with regards to joining up data. The Web lowered barriers considerably, and now we increasingly have structured data, so it is somewhat like one gigantic database. Finding things out about individuals is entirely feasible with or without something like a Names service created by the Archives Hub. We are not creating any new content, but creating this interface means we are consciously bringing data together, and obviously we want to be responsible, and respect people’s right to privacy. Clearly it is entirely impractical to try to get permission from all those living people who might be included. So, in the end, we are taking a degree of risk with privacy.  Of course, we will un-publish on request, and engage with any feedback and concerns. But at present we are taking the view that the advantages and benefits outweigh the risks.

 

Image of exhibition photograph of black rights march

“Imagine being a sibling in a family that continually removes you from photos; tries its best to erase you…As you go through [the scrapbook] you see events where you know you were there, but you are still missing.”  Lae’l Hughes-Watkins (University of Maryland) gave an impassioned and inspiring talk at DCDC 2019 about her experiences.  She argued that archivists need to interrogate the reality that has been presented, and accept that our ideas of neutrality are misplaced. She wants a history that actively represents her – her history and culture, and experiences as a black woman in the USA. She related moving stories of people with amazing stories (and amazing archives) who distrust cultural institutions because they don’t feel included or represented.

This may seem a long way away from our small project to create name records, but in reality our project could be seen as one very small part of a move towards what Lae’l is talking about.  Bringing descriptions together from across the UK maybe helps us to play a small role in this – aiming to move towards documenting the full breadth of human experience. The archives that we cover may retain the biases and gaps for some time to come (probably for ever, given that documentary evidence tends to represent the powerful and the elite much more strongly), but by aggregating and creating connections with other sources, we help to paint a bigger picture.  By creating name records we help to contextualise people, making it much easier to bring other lives and events into the picture. It is a move towards recognising the limitation of what is actually in the archive, and reaching out to take advantage of what is on the Web.  In doing this through explicitly identifying people we do leave ourselves more open to the dangers of not respecting privacy or anonymity. When we plug fully into the Web, we become a part of its infinite possibilities, which is always going to be a revealing, exciting, uncontrollable and risky business. By allowing others to use this data in different ways, we open it up to diverse perspectives and uses.

Here’s a riddle: how can you work in an Archive Centre when you can’t work in an Archive Centre?

Archives Hub feature for July 2020

It’s a dilemma in this strange and worrying time. The collections are there, you know this. You know they are safe. For the time being, for you to remain safe, for all of us to remain safe, you can’t go near them. But this is your job, and much more than that – a passion. We know that archives are stories, solidified memories of individuals, groups, institutions. Many have been around a lot longer than us, and will be there after we’re gone. But at this point of their long, interesting history, we are their gatekeepers, their tenders. Donors from all walks of life have entrusted us with their stories, letting go of the physical, holding only to the ephemeral, and yet now…now we too are distanced from the physical. So, again, how do we work in an Archive Centre when we can’t work in an Archive Centre?

Blythe Duff is a Scottish actress born in East Kilbride on 25 November 1962.  She has worked continuously since her debut as part of the Scottish Youth Festival in 1984. Though she has gone on to ply her trade mainly in theatre, she is perhaps best known for her role as Detective Sergeant Jackie Reid in the long-running Glasgow-based crime series Taggart. In 2011 she was awarded an honorary doctorate from Glasgow Caledonian University for services to the performing arts and in 2012 was made a cultural fellow of GCU.

It was in this guise that in 2018 she generously donated her decades-worth of accumulated Taggart artefacts to GCU Archive Centre. It is a rich, fascinating and rewarding resource for fans of the show both die-hard and casual, for aspiring scriptwriters, those with an interest in television production, and indeed for anyone with even a passing interest in Glasgow through the lens of British popular culture.

I’ve been thinking about this collection in these fast and slow days, weeks, and months of lockdown, as I adjust to this new, remote set-up. Once the working day is done, the laptop shut for the evening, I find myself, like so many, at a loose end. With so much temporarily closed, the question has become not so much what do I do, as what do I watch?

Blythe Duff and John Michie standing side by side between shelves of archive boxes and materials. Each is looking into camera and holding several scripts.
Blythe Duff and fellow Taggart star John Michie in GCU Archive Centre at the launch of her papers on 24th October 2018.

With this in mind the Blythe Duff Taggart papers are a fascinating insight into the televisual process of the late 20th century. As a scriptwriting graduate, I am particularly enthralled by the variety of artefacts on offer. There are 138 individual scripts contained in the collection, spanning from Blythe’s debut on the show in 1990 all the way to 2010. Researchers will find a mixture of rehearsal scripts and shooting scripts, a fantastic insight into the malleable nature of the production process. Particularly poignant are the two versions of 1994’s two-parter ‘Legends’. Mark McManus, the titular Taggart, tragically died before production had finished. The two versions, one featuring Detective Chief Inspector Jim Taggart, and the other re-written without, offer a glimpse into what could have been, as well as the embryonic steps of the show that Taggart was to become.

It is the little details in the collection that draw me back to it – the scribbled notes on the pages, the inside jokes of the cast. Though the collection is currently uncatalogued, researchers will find Blythe’s personalised chair cover, a monogrammed Taggart jacket, along with a photo of Blythe in character in full police uniform. There are books as well; 25 Years of Taggart and Taggart’s Glasgow.  Other artefacts include Taggart wrap party flyers, postcards of different actors from the show – one signed by cast members. There’s even a Taggart Mystery Jigsaw Puzzle game!

Selection of photographs, artefacts, all from television show Taggart, artfully laid on black backdrop.
Selected Taggart treasures from the Blythe Duff papers.

Since becoming available to researchers, it is one of the collections at GCU Archive Centre that has proved most popular with a wide range of visitors. Almost as soon as it was publicised with a visit to the Archive Centre by Blythe and fellow cast member John Michie, we’ve had members of the public – some of whom had never been in an archive before – pop their head into the reading room and ask if they could read an episode. We’ve had a family of fanatics all the way from Australia, a couple from England where the husband surprised his super-fan wife for a special birthday, and many more besides.

It’s also a particularly relevant resource for the University’s learning and teaching as GCU has offered a Masters course in Television Fiction Writing since 2010, the first of its kind in the UK. One of the course leaders, Chris Dolan, was previously a writer for Taggart. Students of the course have examined the scripts, seeing how they’re structured, potentially being inspired in their own work.

Close up photo of cover page of script for episode of Taggart. ‘Blythe’ handwritten in top corner.
Cover page of one of Blythe’s scripts.

The frustration of not being able to go into the Archive Centre each day, not being able to see collections, or chat to team members with ease, is very real. Nonetheless, we have all adjusted to working from home. Team meetings still occur through the magic of MS Teams, projects are still ongoing, new challenges arise and are met. And in the thick of the unprecedented time we are in, if I think back to my initial question, I realise it is possible to work in an Archive Centre even if you can’t work there. For it is the collective knowledge we have, and our willingness to ensure collections are protected and available to as many people as possible, that is the lifeblood of archival work. Archives are indeed stories, and at this juncture we’ve reached a twist worthy of Taggart himself. But the path we’re on, though long and difficult, will lead us all back to where we want to be. It’s too tragic a time to call it a happy ending, but we’ve certainly had enough of cliff-hangers and will take a bittersweet conclusion.

David Ward
Archive Assistant
Glasgow Caledonian University Archive Centre – Sir Alex Ferguson Library

Related

Browse all Glasgow Caledonian University Archives and Special Collections descriptions available to date on the Archives Hub

All images copyright Glasgow Caledonian University. Reproduced with the kind permission of the copyright holders.

Names (3): One name record to bind them

It has been great to get comments and feedback around names, and I wanted to expand upon something that a few people have commented on: the ideal of one ‘authority record’ for one person or organisation.

model showing relationships of catalogues and name records
Model showing potential relationships between catalogues and name records

 

The above diagram is a proposal for the relationships we might have – note that it is a working model, and may well change over time. You can see the catalogues (the descriptions of archives) include people, some with biographical histories, and these people are either creators of archive collections or referenced in them.  Each of these people then gets a name record (bottom left box), so we might have e.g. three name records for the same name (and the same name may potentially be the same person…or may not). We will work with the store of records that we have with the aim of creating matches, and ending up with a generic or main name record (green box, top left).

The ‘main record’ or ‘master record’ or whatever we might call it, for each individual person or organisation, is not an ‘archival record’. It is not intended simply to be a reflection of what is in our own data. It is intended to be a page dedicated to that person or organisation.  Our current feeling is that this should not be seen as domain specific; in fact, we want to get away from the idea that data is domain specific.  It is about an entity (a person or organisation), and what we know of that entity.

Keeping in mind the green box, and looking at the person page for Robin Day from Exploring British Design, a previous AHRC project we ran with Brighton Design Archive, you get a sense of the type of thing we mean.

Page for Robin Day, from the Exploring British Design website
Exploring British Design: Robin Day

This page presents as a general information page about a designer. It is not branded as a page about archives. It takes information in from different sources. Is it an ‘authority’ record?  I’m really not sure; I wouldn’t call it that. The point is really that it enables researchers to put Robin Day into the context of other people, organisations, places and events, or at least it demonstrates how that can be done. It creates a network, and it intends to show the value of including archives in a network, rather than standing apart, in their ‘own world’.

Screenshot of an entity relationship diagram for Robin Day
Visualised relationships

 

The network can easily be visualised. There are tools out there to do this. The challenge is to create the data to feed into these visualisers. Again, this visualisation is not about archival name authority records, it is not domain specific.

 

 

In the Robin Day page, we have a section for related archives and museum resources.

screenshot showing archives related to Robin Day
Related archive and museum resources

 

This lists archives Robin Day is the ‘creator of’ or archives he is ‘associated with’.  It links to the Archives Hub, but also to other sources. One of the options for end users is to go and find out more about the archival sources, but it is not prioritised above other options.

So, this is essentially the idea – a page for a person, a page for an organisation. An information resource that focuses on creating a network of connections.  We think this is a good approach, but creating something along these lines that is automated, sustainable and effective within an ongoing national service is much harder.

Why not just use this one record, link to the archive catalogues, and dispense with the individual name records that we have created? There are three reasons to consider providing access to the individual name records: biographical history, uncertainty around matching, and ingesting name authority records.

I have already written about biographical and administrative history in a separate post.

In this phase of the Names Project the individual records for a name such as Beatrice Webb will be created from either the creator name or the index terms that we have in the Archives Hub catalogues.

The main problem is the wide variation in name entries.

Webb, Beatrice
Webb, Beatrice, née Potter
Webb (Martha) Beatrice, 1858-1943
Webb, Martha Beatrice, 1858-1943
Webb;[Martha] Beatrice [nee Potter] 1858-1943

These are all entries in the Archives Hub.  We can match them all up, but can we say they are all the same?   Names without dates should not be matched with certainty, but quite often they will be the same person. (Beatrix Potter also often ends up being linked with Beatrice Webb, née Potter).
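To make the point about certainty a little more concrete, here is a minimal sketch in Python of how the entries above might be compared. It is not how our matching actually works; the function names, the three confidence labels and the rule of dropping ‘née’ phrases are all assumptions made for illustration. The idea is simply that a shared surname and forename is only ever a ‘possible’ match, and that matching life dates are needed before we go any further than that.

```python
import re
import unicodedata


def parse(entry):
    """Split a catalogue name string into (surname, forenames, dates)."""
    found = re.search(r"(\d{4})-(\d{4})", entry)
    dates = found.group(0) if found else None
    text = entry[:found.start()] if found else entry
    # Fold accents so that "née" and "nee" are treated alike.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Drop maiden-name phrases such as "nee Potter" (a deliberate simplification).
    text = re.sub(r"\bnee\s+\w+", " ", text)
    tokens = re.findall(r"[a-z]+", text)
    # Assumption: the first token is the surname, the rest are forenames.
    return tokens[0], frozenset(tokens[1:]), dates


def compare(a, b):
    """Return a hypothetical confidence label for two name strings."""
    surname_a, forenames_a, dates_a = parse(a)
    surname_b, forenames_b, dates_b = parse(b)
    if surname_a != surname_b or not (forenames_a & forenames_b):
        return "no match"
    if dates_a and dates_b:
        return "probable" if dates_a == dates_b else "no match"
    # At least one entry has no life dates, so we cannot be certain.
    return "possible"


entries = [
    "Webb, Beatrice",
    "Webb, Beatrice, née Potter",
    "Webb (Martha) Beatrice, 1858-1943",
    "Webb, Martha Beatrice, 1858-1943",
    "Webb;[Martha] Beatrice [nee Potter] 1858-1943",
]
for entry in entries[1:]:
    print(entry, "->", compare(entries[0], entry))
```

Even in this toy version the label is kept alongside the match, so that whatever we display can indicate the level of uncertainty rather than hiding it.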

The decision we need to make is whether to provide links to these individual name records that we will have, or only use them as a source of data.  It seems valuable to enable end users to see these names as a group, but it is another thing to risk integrating information from them all into one name record.  There is no perfect answer to this, but it does seem important to clearly indicate the level of uncertainty.  So many names that we have don’t have life dates, or have variations in structure.  What we are looking to achieve is a clear provenance, giving end users the best understanding of what they are seeing.

What about name records that have been created by our contributors?  The name records we create ourselves from catalogue descriptions will generally be no more than the name, dates, and biographical history.  But, going forwards, we will want to work with much more detailed name records.

For Exploring British Design we created rich name records with an entity-relationship structure (essentially using the EAC-CPF structure and working in RDF),  to demonstrate the power of connecting entities.  For this purpose, we partially hand-crafted the name records, as well as carrying out some very complex processing to create various connections.

screenshot of part of the timeline for Robin Day
Part of the timeline for Robin Day

The example above shows events from the Robin Day timeline, with linked connections to related organisations.  If we ingest EAC-CPF records we might get timelines like this.

Name records may also include relationships. The Borthwick Institute has good examples of name records with plenty of rich relationship information. e.g. Charles Lindley Wood, Viscount Halifax.

screenshot of part of the Viscount Wood record showing relationships to other people
An excerpt from a Borthwick entry for Charles Lindley Wood

If we took this record into the Archives Hub it might seem to make sense for it to become the main person record for Wood.  But that would involve a process of making choices, preferencing one name record over another.  Possible, but tricky to do in an automated way. Another record office might also have a splendid example of a name entry for this person, with some different data. Furthermore, this record has links to the Borthwick catalogue. We would potentially have to remove these links.

It would be very challenging to create one record from several source EAC-CPF records for the same person –  to blend timelines, or sort out relationships listed in different records, bearing in mind that it needs to be done in an automated way, keeping version control and dealing with revisions and new data coming in that might add to the name record.  How could we compare and blend two lists of relationships? Or two chronologies? We’d probably end up having to keep them all, and then potentially have similar but different relationships and chronologies, giving a slightly confused user experience.

If we do ingest records like the one above, we will have to figure out how these  more detailed records will relate to what we have already created.  If, as planned, we have one generic name record for a person, it makes the job easier, as we won’t be looking to make any one EAC-CPF record into the main name record, we will simply link to it from the main record. Bear in mind, our main record is intended to be a domain-neutral entry – linking to other sources beyond archives.  EAC-CPF records might do this to some extent, but they are unlikely to link to the Jisc Library Hub, and probably won’t link to Wikidata, or other external sources.   They are far more likely to provide internal links to the archive catalogue they relate to.

Arguably, it might be easier to forget about creating name records ourselves (from the catalogue entries) and just work with name records that have been created by our contributors (which are likely to be well-structured and include life dates). But if we do that, the pot of names will grow slowly, as only a small proportion of repositories create name records. We can’t realistically give the end user a few thousand name records covering maybe 1-2% of our names – they might search for ‘Winston Churchill’ as a name, and find that we don’t have him!  It would not remove the problem of name matching, and it would make the whole idea of reaching out beyond the archive domain, by linking into other resources using our names as the hook, rather ineffectual.

Therefore, we propose to keep the separate name records in our system. We propose to create a ‘generic record’, which is what would be prominent in the Archives Hub display. We would then have the potential to link the records together, to blend them,  to try some text mining and analysis techniques. It gives us options.  It would not be sensible to make those decisions now. It is better to lay the groundwork that enables us to be flexible.   This approach allows us to link to an individual name record where we don’t feel able to confirm a ‘same as’ relationship. It presents the option to the end user – here is a name – we think this is the same person, so we’ve provided a link.
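As a rough illustration of what this could look like as a data structure, the Python sketch below keeps each individual name record intact and lets the generic record link to it with either a confirmed or an unconfirmed relationship. The class names, field names, relation labels and identifiers are all hypothetical; this is a way of thinking about the model, not a description of our system.

```python
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    """An individual name record created from one catalogue description."""
    record_id: str
    name_string: str
    source_description: str  # hypothetical identifier of the description
    biographical_history: str = ""


@dataclass
class Link:
    record: NameRecord
    relation: str  # "same_as" or "possibly_same_as" (labels are assumptions)


@dataclass
class GenericRecord:
    """The domain-neutral record that links out to individual records."""
    record_id: str
    display_name: str
    links: list = field(default_factory=list)

    def attach(self, record, confirmed):
        relation = "same_as" if confirmed else "possibly_same_as"
        self.links.append(Link(record, relation))

    def confirmed(self):
        return [link.record for link in self.links if link.relation == "same_as"]

    def uncertain(self):
        return [link.record for link in self.links if link.relation != "same_as"]


generic = GenericRecord("generic-1", "Beatrice Webb")
generic.attach(NameRecord("name-1", "Webb, Martha Beatrice, 1858-1943", "description-a"), confirmed=True)
generic.attach(NameRecord("name-2", "Webb, Beatrice", "description-b"), confirmed=False)

for link in generic.links:
    print(link.relation, "->", link.record.name_string)
```

Because nothing is merged, a decision to blend records later (or to undo a link) does not lose any of the source data.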

The end user experience needs to make sense and not mislead or provide false information. Links to brief name records could seem confusing, but, as I have said, trying to bring together in one record all the information from several name records, with  their biographies, relationships, aliases, events, related resources, is likely to be a nightmare.  In the end, it will take a good deal more testing and working with researchers to work out what is best.

 

Archives Hub Names Project (2): Biographical History

It is a somewhat vexed question how to treat biographical and administrative history (in this post I’ll focus on biographical history).  This is an ISAD(G) field and an EAD field. ISAD defines it as providing “an administrative history of, or biographical details on, the creator (or creators) of the unit of description to place the material in context and make it better understood”.  It advises for personal names to include “full names and titles, dates of birth and death, place of birth, successive places of domicile, activities, occupation or offices, original and any other names, significant accomplishments, and place of death”.

On the Archives Hub we have a whole range of biographical histories – from very short to very comprehensive.  I have had conversations with archivists who believe that ‘putting the collection in context’ means giving information that is particularly relevant for that archive rather than giving a general history. Conversely, many biographical history entries do give a very full biography, even if the collection only relates to one aspect of a person’s life and work. They may also include information that is not readily available elsewhere, as it may have been discovered as part of the cataloguing process.

The question is, if we create a generic name record for a person, how do we treat this biographical information? There are a number of alternatives.

(1) Add all biographical history entries to the record

If you look at a SNAC example:  https://snaccooperative.org/view/54801840 you can see that this is the approach. It has merits – all of the biographical information is brought together. But it can mean a great deal of repetition, and the ordering of the entries can seem rather illogical, with short entries first and then longer comprehensive entries at the end.

Whilst most biographical history entries are pretty good, it also means a few not very helpful entries may be included, and may be top of the order. In addition, putting all the entries in together doesn’t always seem to make much sense. In the example below there are just three short entries for a major figure in women’s liberation. They are automatically brought in from the catalogue entry for individual collections. Sometimes the biographical entries in individual catalogues suffer from system migration and various data processing issues that mean you end up with field contents that are not ideal.

Millicent Garrett Fawcett biographical histories in SNAC

The question is whether this approach provides a useful and effective end user experience.

Where there is one entry for a creator, with one biographical history, there is no issue other than whether the entry makes sense as an overall biographical entry for that person or organisation. But we have to consider the common situation where there will be a dozen or more entries. Even if we start with one entry, others may be added over time.  Generally, there will be repetition and information gaps, but in many cases this approach will provide a good deal of relevant information.

(2) Keep the biographical history entries with the individual name records

At the moment our plan is to create individual name records for each person, as well as a generic master record.  We haven’t yet worked out the way this might be presented to the end user.  But we could keep the biographical histories with the individual entries we have for names. The generic record would link to these entries, and to the information they contain.  This makes sense, as it keeps the biographical histories separate, and within the entries they were written to accompany. Repetition is not an issue as it is clear why that might happen.  But the end user has to go to each entry in turn to read this information.

(3) Keep biographical history entries with individual name records, but enable the information to be viewed in the generic master record

We have been thinking about giving the end user the option to ‘click to see all biographical histories created for this person’. That would help with expectations. Simply presenting a page with a dozen similar biographical histories is likely to confuse people, but  enabling them to make a decision to view entries gives us more opportunity for explanation – the link could include a brief explanatory note.

(4) Select one biographical history to be in the generic record

We have discussed this idea, but it is really a non-starter. How do you select one entry? What would the criteria be if it is automated? The longest?

(5) Link to a generic biography if available

This is the idea of drawing in the Wikipedia entry for that person or organisation, or potentially using another source.  There is a certain risk to pulling in data from an external source as the ‘definitive’ biographical information, but the source would always be cited, and it does start to move towards the principle of bringing different sources of information together. If we want to create a more generic resource, we are going to have to take risks with using external sources.

 

I would be interested in any comments on this.

Names Project (1): Creation of name records

The Archives Hub Names Project

The Archives Hub team and Knowledge Integration, our system suppliers, are embarking upon a short four-month project to start to lay the groundwork, define the challenges and test the approaches to presenting end users with a name-based means to search, and connect to a broad range of resources related to people and organisations.  I will be blogging about the project as we go along.

Our key aims in the long-term are:

  • To provide the end user with a way to search for people and organisations and find a range of material relevant to their research
  • To enable connections to be made between resources within and external to Jisc, using names as the main focus
  • To bring archive collections together in an intellectual sense and provide different contexts to collections by creating networks across our data

This first project will not create an end-user interface, but will concentrate on processing,  matching names and linking resources. We want to explore how this can be administered in order to be sustainable over time.  In the end, the most challenging part of working with the names we have is identification, disambiguation and matching.  The aim is to explore the space and start to formulate a longer-term plan for the full implementation of names as entities within the Archives Hub.

Creation of name records from EAD description records

NB: This blog often refers to personal names for convenience, but names include personal, family and corporate entities.

EAD includes names

EAD descriptions include personal, family and corporate names.  These ‘entities’ may be listed as archival creators and also associated with the collection as index terms. Archival creators may optionally be given biographical or administrative histories.  The relationship of the collection with names in the index is not made explicit in the description (in a structural way), though it may often be gleaned from the descriptive information within the EAD record.
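For anyone who wants to see what this means in practice, the sketch below uses Python’s standard library to pull names out of a small EAD fragment. The fragment is invented for illustration (including the ‘Example Society’), and it assumes EAD 2002 element names without a namespace: creators sit in ‘origination’, index terms in ‘controlaccess’, and the entity type is only known when the name is wrapped in ‘persname’, ‘famname’ or ‘corpname’.

```python
import xml.etree.ElementTree as ET

# Invented sample; real descriptions are far larger and more varied.
SAMPLE_EAD = """
<archdesc level="fonds">
  <did>
    <origination><persname>Lingard, Joan</persname></origination>
    <origination>Grote, Arthur</origination>
  </did>
  <controlaccess>
    <persname>Lingard, Joan Amelia, 1932-</persname>
    <corpname>Example Society</corpname>
    <subject>Literature</subject>
  </controlaccess>
</archdesc>
"""

ENTITY_TYPES = {"persname": "person", "famname": "family", "corpname": "organisation"}


def extract_names(ead_xml):
    """Return (role, entity type, name string) for each name found."""
    root = ET.fromstring(ead_xml.strip())
    names = []
    for role, container in (("creator", "origination"), ("index", "controlaccess")):
        for parent in root.iter(container):
            tagged = [c for c in parent if c.tag in ENTITY_TYPES or c.tag == "name"]
            for child in tagged:
                entity_type = ENTITY_TYPES.get(child.tag, "unknown")
                names.append((role, entity_type, (child.text or "").strip()))
            # A creator entered as plain text carries no entity type at all.
            if not tagged and parent.text and parent.text.strip():
                names.append((role, "unknown", parent.text.strip()))
    return names


for role, entity_type, name in extract_names(SAMPLE_EAD):
    print(role, entity_type, name)
```

Note that nothing in the markup says how an indexed name relates to the collection; the sketch can only report that the name is there.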

Creating name records for all names

We are proposing to begin by creating name records for all of these entries, no matter how thin the information for each entry may be.

Here is a random selection of names that are included in Archives Hub records:

Grote, Arthur
Gaskell, Arthur
Wilson, John
Thatcher, J. Wells, Barrister at Law
Barron, Margaret
Stanley, Catherine, 1792-1862
Roe, Alfred Charles
Rowlatt, Mary, b 1908
Milligan, Spike, 1918-2002
Fawcett, Margaret, d. 1987
Rolfe, Alan, 1908-2002 actor
Mayers, Frederick J (fl 1896-1937 : designer : Kidderminster, England)
Joan

Only a percentage of names have life dates. Some have birth or death dates, some floruit dates.

Of course, the life dates, occupations and outputs of many people are not known, or may be very difficult to find.  Also, life dates will change when a birth date is joined by a death date. Epithets may also change over time (and they are not controlled vocabulary anyway).

In addition, we have inverted and non-inverted names on the Archives Hub, names with punctuation in different places, names with and without brackets, etc.  These issues create identification challenges.
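To give a feel for the variety, here is a small Python sketch that tries to recognise the kinds of dates seen in the list above: full life dates, ‘b’ and ‘d.’ dates, and floruit dates. The patterns and the returned labels are my own assumptions, and they only cover the forms shown here; real data will throw up plenty more.

```python
import re

# Order matters: floruit must be tried before the plain year range.
PATTERNS = [
    ("floruit", re.compile(r"\bfl\.?\s*(\d{4})(?:\s*-\s*(\d{4}))?")),
    ("born", re.compile(r"\bb\.?\s*(\d{4})")),
    ("died", re.compile(r"\bd\.?\s*(\d{4})")),
    ("life", re.compile(r"\b(\d{4})\s*-\s*(\d{4})?")),
]


def parse_dates(entry):
    """Return (kind, start year, end year), or None if no dates are found."""
    for kind, pattern in PATTERNS:
        found = pattern.search(entry)
        if not found:
            continue
        years = [int(year) if year else None for year in found.groups()]
        if kind == "born":
            return kind, years[0], None
        if kind == "died":
            return kind, None, years[0]
        return kind, years[0], years[1]
    return None


for entry in [
    "Grote, Arthur",
    "Stanley, Catherine, 1792-1862",
    "Rowlatt, Mary, b 1908",
    "Fawcett, Margaret, d. 1987",
    "Rolfe, Alan, 1908-2002 actor",
    "Mayers, Frederick J (fl 1896-1937 : designer : Kidderminster, England)",
]:
    print(entry, "->", parse_dates(entry))
```

Keeping the kind of date, rather than just the years, matters because a floruit range should not be compared with life dates as though they were the same thing.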

Even taking names as creators and names as index terms within one single description, the match is often not exact:

Millicent Garrett Fawcett (creator name)
Fawcett, Dame Millicent. (1847-1929) nee Garrett, Feminist and Suffragist (index term)

Lingard, Joan (creator name)
Lingard, Joan Amelia, 1932- (index term)

The archival descriptions on the Archives Hub vary a great deal in terms of the structure, and different repositories have different approaches to cataloguing.  Some do not add name of creator, some do not add index terms, some add them intermittently, and often the same name is added differently for different collections within the same repository.  In many cases the cataloguer does not add life dates, even when they are known, or they are added to the name as creator but not in the index list, or vice versa. This sounds like a criticism, but the reality is that there are many reasons why catalogues have ended up as they are.

There has not been a strong tradition amongst archivists of adding names as unique identifiable entities, but of course, it has only been in the last few decades that we have had the potential, which is becoming increasingly sophisticated, of linking data through entity relationships, and creating so much more than stand-alone catalogue records. Many archivists still think primarily in terms of human readable descriptions.  Some people feel that with the advent of Google and sophisticated text analysis, there is no need to add names in this structured way, and there is no need for index terms at all.  But in reality search engines generally recommend structured data, and they are using it in sophisticated ways.  Schema.org is for structured data on the web, an initiative started by Google, Microsoft, Yahoo, and Yandex. Explicit markup helps search engines understand content and it potentially helps with search engine optimisation (ensuring your content surfaces on search engines).  Also, if we want to move down the Linked Data road, even if we are not thinking in terms of creating strict RDF Linked Data, we need to identify entities and provide unique identifiers for them (URLs on the web). Going back to Tim Berners-Lee’s seminal Linked Data article from 2006:

“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data.  With linked data, when you have some of it, you can find other, related, data.”

So, including names explicitly provides huge potential (as well as subjects, places and other entities) and it has become more important, not less important. Indeed, I would go so far as to say that structured data is more important than standards compliant data, especially as, in my experience, standards are often not strictly adhered to, and also, they need constant updating in order to be relevant and useful.
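As an example of the kind of explicit markup I mean, the Python sketch below builds a small Schema.org description of a person as JSON-LD. The property names (name, familyName, givenName, birthDate, deathDate, sameAs) are taken from Schema.org’s Person type, but the identifier and ‘same as’ URLs are placeholders on example.org, not real Archives Hub or external identifiers, and the function itself is purely illustrative.

```python
import json


def person_jsonld(identifier, family_name, given_name,
                  birth_year=None, death_year=None, same_as=()):
    """Build a Schema.org Person description as a JSON-LD dictionary."""
    record = {
        "@context": "https://schema.org",
        "@type": "Person",
        "@id": identifier,  # the unique identifier (a URL) for this entity
        "name": f"{given_name} {family_name}",
        "familyName": family_name,
        "givenName": given_name,
    }
    # Only state what we actually know; many names have no dates at all.
    if birth_year:
        record["birthDate"] = str(birth_year)
    if death_year:
        record["deathDate"] = str(death_year)
    if same_as:
        record["sameAs"] = list(same_as)
    return record


record = person_jsonld(
    "https://example.org/names/milligan-spike",  # placeholder URL
    "Milligan",
    "Spike",
    birth_year=1918,
    death_year=2002,
    same_as=["https://example.org/other-source/spike-milligan"],  # placeholder
)
print(json.dumps(record, indent=2))
```

The important part is the identifier and the ‘sameAs’ links: they are what allow a person or a machine to move from our data to other, related, data.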

The idea with our project is that we start with name records for every entity – a pot of data we can work with. We may create Encoded Archival Context (Corporate Bodies, Persons and Families), otherwise known as EAC-CPF…but that is not important at this stage.  EAC is important for data ingest and output, and we intend to use it for that purpose, so it will come into the picture at some point.

The power of the anonymous

There are benefits in creating name records for people who are essentially anonymous or not easily identifiable.  Firstly, these records have unknown potential; they may become key to making a particular connection at some point, bearing in mind that the Archives Hub continually takes new records in. Secondly, we can use these records to help with identification, and the matching work that we undertake may help to put more flesh on the bones of a basic name record.  If we have ‘Grote, Arthur’ and then we come across ‘Grote, Arthur, 1840-1912’, we can potentially use this information and create a match. Of course, the whole business of inference is a tricky thing – you need more than a matching surname and forename to create a ‘same as’ relationship (I won’t get into that now). But the point is that a seemingly ‘orphan’ name may turn out to have utility. It may, indeed, provide the key to unlocking our understanding of particular events – the relationships and connections between people and other entities are what enable us to understand more about our history.
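To make the ‘Grote, Arthur’ example a little more concrete, here is a minimal sketch in Python of how an index entry might be split into its parts and two entries compared. The function names, and the assumption that entries follow a ‘Surname, Forenames, YYYY-YYYY’ pattern, are purely illustrative and are not how our system works. Note that the check only says two entries are candidates for a match; as mentioned above, more evidence is needed before asserting ‘same as’.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedName:
    surname: str
    forenames: str
    birth: Optional[int] = None
    death: Optional[int] = None


# Assumption: life dates, when present, appear as their own
# comma-separated part in the form YYYY-YYYY.
DATES = re.compile(r"^(\d{4})-(\d{4})$")


def parse_name(entry: str) -> ParsedName:
    """Split an index entry such as 'Grote, Arthur, 1840-1912'."""
    parts = [p.strip() for p in entry.split(",")]
    surname = parts[0]
    forenames = parts[1] if len(parts) > 1 else ""
    birth = death = None
    for part in parts[2:]:
        m = DATES.match(part)
        if m:
            birth, death = int(m.group(1)), int(m.group(2))
    return ParsedName(surname, forenames, birth, death)


def dates_conflict(x: Optional[int], y: Optional[int]) -> bool:
    """Two years conflict only if both are known and they differ."""
    return x is not None and y is not None and x != y


def is_candidate(a: ParsedName, b: ParsedName) -> bool:
    """True if the names agree and no known dates contradict each other."""
    same_name = (a.surname.lower(), a.forenames.lower()) == (
        b.surname.lower(),
        b.forenames.lower(),
    )
    return same_name and not (
        dates_conflict(a.birth, b.birth) or dates_conflict(a.death, b.death)
    )


bare = parse_name("Grote, Arthur")
dated = parse_name("Grote, Arthur, 1840-1912")
print(is_candidate(bare, dated))  # True
```

If the two entries are accepted as the same person, the dated entry can lend its life dates to the bare one, which is the ‘flesh on the bones’ effect described above.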

Components of a name record

So, all names will have name records, some with just a name, some with life dates of different sorts, some with biographical or administrative histories. The exception to this may be names that are not identifiable as people or organisations.  It is potentially possible to discover the type of entity from the context, but that is a whole separate piece of work.  Hundreds of names on the Archives Hub are simply labelled as ‘creator’ or ‘name’. This is down to historical circumstance: partly errors the Archives Hub made in the past (our old cataloguing tool entered creators simply as EAD ‘origination’), and partly the other systems we ingest data from.  At the moment, for example, we are taking in descriptions from Axiell’s AdLib system, but the system does not mark up creator names as people or organisations (unless the cataloguer explicitly adds this), so we cannot get that information. This is probably a reflection of a time when semantically structured data was simply less important. If a human reads ‘Elizabeth Gaskell’ in a catalogue entry they are likely to understand what that string means; to large-scale automated processing, it is just a string of characters, unless it includes semantic information.
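The difference between a typed name and a bare string can be illustrated with a short Python sketch. It assumes EAD 2002 markup without an XML namespace, where a creator inside ‘origination’ may or may not be wrapped in ‘persname’, ‘corpname’ or ‘famname’; the sample fragment and the function name are made up for illustration and do not reflect our actual processing.

```python
import xml.etree.ElementTree as ET

# Map the EAD elements that carry entity type to a plain label.
ENTITY_TYPES = {
    "persname": "person",
    "corpname": "organisation",
    "famname": "family",
}

# Hypothetical fragment: one creator marked up as a person,
# one entered simply as text inside <origination>.
SAMPLE = """<did>
  <origination><persname>Gaskell, Elizabeth</persname></origination>
  <origination>Elizabeth Gaskell</origination>
</did>"""


def creators(ead_fragment: str) -> list:
    """Return (entity type, name) pairs for each creator found."""
    root = ET.fromstring(ead_fragment)
    found = []
    for origination in root.iter("origination"):
        typed = [child for child in origination if child.tag in ENTITY_TYPES]
        if typed:
            for child in typed:
                found.append((ENTITY_TYPES[child.tag], (child.text or "").strip()))
        else:
            # No semantic markup: all we have is a string of characters.
            text = "".join(origination.itertext()).strip()
            found.append(("unknown", text))
    return found


for entity_type, name in creators(SAMPLE):
    print(entity_type, "|", name)
```

The second creator comes back as ‘unknown’: nothing in the data says whether it is a person or an organisation, even though a human reader would know at once.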

From the name records that we create, we intend to develop and run algorithms to match names. In many cases, we should be able to draw several names together, with a ‘same-as’ relationship. Some may be more doubtful, others more certain. I will talk about that as we get into the work.

At the moment, we have some ideas about how we will work with these individual records in terms of the workflow and the end user experience, but we have not made any final decisions, and we think that what is most important at this stage is creating and experimenting with algorithms to see what we can get.

Master name records

We intend to create master records for people and organisations. The principle is to see these master records not as something within the archives domain, but as stand-alone records about a person or organisation that enable a range of resources to be drawn together.

So, we might have several name records for one person:

Example of master record, with various related information included:
Webb, Martha Beatrice, 1858-1943, social reformer and historian

Examples of additional name records that should link to the master record:
Webb, Beatrice, 1858-1943 (good match)
Webb, Martha Beatrice, 1858-1943, economist and reformer (good match)
Webb, Martha Beatrice, nee Potter, 1858-1943 (good match)
Webb, M.B. b. 1858 (possible match)
but…
Potter, Martha Beatrice, b 1858
…might well not be a match, in which case it would stand separately, and the archive connected to it would not benefit from the links being made.
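As a rough illustration of how the Beatrice Webb examples above might be graded automatically, here is a Python sketch. The rules (same surname required, no conflicting dates, full forenames plus both life dates for a ‘good’ match, initials or partial dates for a ‘possible’ match) are assumptions made up for this example, not the algorithm we will end up with, and the function names are hypothetical. A real matcher would also need to handle cases this one ignores, such as a mix of initials and full forenames, or maiden names recorded with ‘nee’.

```python
import re
from typing import NamedTuple, Optional


class Name(NamedTuple):
    surname: str
    forenames: list
    birth: Optional[int]
    death: Optional[int]


LIFE_DATES = re.compile(r"(\d{4})-(\d{4})")   # e.g. 1858-1943
BIRTH_ONLY = re.compile(r"\bb\.?\s*(\d{4})")  # e.g. 'b. 1858' or 'b 1858'


def parse(entry: str) -> Name:
    parts = [p.strip() for p in entry.split(",")]
    birth = death = None
    m = LIFE_DATES.search(entry)
    if m:
        birth, death = int(m.group(1)), int(m.group(2))
    else:
        m = BIRTH_ONLY.search(entry)
        if m:
            birth = int(m.group(1))
    # Forenames are the second comma-separated part, minus any birth date.
    forename_part = BIRTH_ONLY.sub("", parts[1]) if len(parts) > 1 else ""
    return Name(parts[0], re.findall(r"[A-Za-z]+", forename_part), birth, death)


def forename_fit(master: Name, cand: Name) -> Optional[str]:
    m = [t.lower() for t in master.forenames]
    c = [t.lower() for t in cand.forenames]
    if c and all(len(t) > 1 for t in c) and set(c) <= set(m):
        return "full"       # every forename given also appears in the master
    if c and all(len(t) == 1 for t in c) and c == [t[0] for t in m]:
        return "initials"   # initials line up with the master forenames
    return None


def conflict(x: Optional[int], y: Optional[int]) -> bool:
    return x is not None and y is not None and x != y


def grade(master: Name, cand: Name) -> str:
    if master.surname.lower() != cand.surname.lower():
        return "none"
    if conflict(master.birth, cand.birth) or conflict(master.death, cand.death):
        return "none"
    fit = forename_fit(master, cand)
    if fit is None:
        return "none"
    all_dates_known = None not in (master.birth, master.death, cand.birth, cand.death)
    return "good" if fit == "full" and all_dates_known else "possible"


master = parse("Webb, Martha Beatrice, 1858-1943, social reformer and historian")
for entry in [
    "Webb, Beatrice, 1858-1943",
    "Webb, M.B. b. 1858",
    "Potter, Martha Beatrice, b 1858",
]:
    print(entry, "->", grade(master, parse(entry)))
```

Under these rules the ‘Potter’ entry is graded ‘none’ simply because the surname differs, which is exactly the situation described above: the record would stand separately unless some other evidence (such as the ‘nee Potter’ form) is used to bridge the two.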

We have discussed the pros and cons of creating master records for all names.  It makes sense to bring together all of the Beatrice Webb names into one master record – there is plenty that can be said about that individual; but does it make sense to have a master record for single orphaned names with no life dates and nothing (as yet) more to say about that individual?  That is a question we have yet to answer.

diagram showing link between archive, name records and master records
The archive is described through an EAD description held on our system (the CIIM). We take all the names from this to create a huge store of individual names. From this, we aim to create and update ‘definitive’ name records.
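The flow from descriptions to individual name records to master records could be sketched in Python as follows. This is only an illustration of the shape of the data: the class names are hypothetical, the description references other than gb12-ms.add.8556 are invented, and the grouping rule (names that are identical once case and spacing are ignored) is a deliberately crude stand-in for the matching algorithms we intend to develop.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    name: str             # the name exactly as it appears in the description
    description_ref: str  # link back to the archive description


@dataclass
class MasterRecord:
    label: str
    name_records: list = field(default_factory=list)

    def description_refs(self) -> list:
        """All descriptions reachable through this master record."""
        return sorted({r.description_ref for r in self.name_records})


def extract_name_records(descriptions: dict) -> list:
    """One name record per name per description."""
    return [
        NameRecord(name, ref)
        for ref, names in descriptions.items()
        for name in names
    ]


def normalise(name: str) -> str:
    return " ".join(name.lower().split())


def build_master_records(records: list) -> list:
    """Group name records whose normalised strings are identical."""
    groups = defaultdict(list)
    for record in records:
        groups[normalise(record.name)].append(record)
    return [MasterRecord(label=g[0].name, name_records=g) for g in groups.values()]


descriptions = {
    "gb12-ms.add.8556": ["Louisa Jane Justamond"],
    "example-ref-1": ["Webb, Beatrice, 1858-1943"],   # invented reference
    "example-ref-2": ["webb,  Beatrice, 1858-1943"],  # invented reference
}
for master in build_master_records(extract_name_records(descriptions)):
    print(master.label, "->", master.description_refs())
```

Even an orphan name such as Justamond gets a name record and, under this sketch, a master record of its own; whether that is the right decision is one of the open questions discussed above.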

The principle is to have name records that enable us to create links to the Archives Hub entries and also to other Jisc services and resources beyond that – resources outside of the archives domain.  Many of these resources may also help us with our own identification and matching processes. It is important to benefit from the work that has already been done in this area.

We are looking at various name resources and assessing where our priorities will be.  This is a fairly short project, and we won’t have time to look at more than a handful of options. But we are currently thinking in terms of VIAF, ORCID and Wikidata. More on that to follow.

Personally, I’ve been thinking about working with names for several years. We have been asked about it quite a bit. But the challenge is so big and nebulous in many ways. It has not been feasible to embark upon this kind of work in the past, as our system has not supported the kind of systematic processing that is required. We are also able to benefit from the expertise K-Int can bring to data processing. It is one thing doing this as a stand-alone project; it is quite another to think about a live service, long-term sustainability, version control and revisions, ingest from different systems, etc., and also to break it down into logical phases of work.  It is exciting, but it is going to involve a great deal of hard work and hard thinking.

Interconnected archives: cataloguing the Rossetti family letters at Leeds University Special Collections

Archives Hub feature for June 2020

Special Collections holds over 700 letters written by members of the Rossetti family. The collection includes letters from nearly all members of this storied family, with the bulk written by Dante Gabriel (https://en.wikipedia.org/wiki/Dante_Gabriel_Rossetti) and William Michael (https://en.wikipedia.org/wiki/William_Michael_Rossetti), and a significant tranche from Christina Rossetti (https://en.wikipedia.org/wiki/Christina_Rossetti).  The letters are only a fraction of the full Rossetti family correspondence, which can be found in libraries and archives across the world.

The Rossetti Family by Lewis Carroll, albumen print, 7 October 1863 (Christina Georgina Rossetti, Dante Gabriel Rossetti, Frances Mary Lavinia Rossetti (née Polidori) and William Michael Rossetti). NPG P56. © National Portrait Gallery, London. Creative Commons 3.0 licence (https://creativecommons.org/licenses/by-nc-nd/3.0/).

Many of the letters have been in Special Collections since the 1930s but were not catalogued in any detail. Some were represented by very brief index records, which did not convey the scope or context of the full collection; others were entirely uncatalogued. Much of the Dante Gabriel and Christina Rossetti correspondence had been published in their respective Collected Letters (The Correspondence of Dante Gabriel Rossetti, ed. William E. Fredeman, 2015 and The Letters of Christina Rossetti, https://rotunda.upress.virginia.edu/crossetti/), but the letters themselves remained inaccessible for research.

A 2019 project funded by the Strachey Trust enabled us to repackage and create item-level records for each letter in the collection. Catalogue records included basic ISAD(G) metadata, a brief synopsis of the letter’s contents, links to authority files for both sender and addressee and a reference for the published version of the letter, where one exists. The finished catalogue now describes the full extent of the Rossetti Collection at Leeds, ensuring that material is identifiable, accessible for research and secure in our holdings.

Dante Gabriel Rossetti.

Cataloguing gave us fascinating insight into the lives of the Rossettis. The largest group of letters in the collection was written by Dante Gabriel Rossetti and covers both the beginning and end of his career. Early letters reveal a humorous correspondent. One, written from a deluged Kent, describes him sketching ‘with my umbrella tied over my head to my buttonhole – a position which you will oblige me by remembering, I expressly desired should be selected for my statue. (N.B. Trousers turned up.)’

These are in direct contrast to later letters to Theodore Watts-Dunton (https://en.wikipedia.org/wiki/Theodore_Watts-Dunton), who acted as Rossetti’s advisor. The volume and regularity of Rossetti’s letters to Watts-Dunton, their paranoia and their requests for advice show Rossetti’s great dependence on his close friends in later years.

The collection includes 30 letters written by Christina Rossetti. Project work uncovered a previously unknown letter, written to her sister-in-law, Lucy Madox Brown Rossetti (https://en.wikipedia.org/wiki/Lucy_Madox_Brown). This brief letter gives Rossetti’s assessment of an unnamed poem: ‘The fact is I think it diabolical. Its degree of serene skill and finesse intensifies to me its horror…’

William Michael Rossetti

150 letters by William Michael Rossetti were also catalogued during this project, the majority of which are unpublished. His letters include a long series addressed to John Lucas Tupper (https://sculpture.gla.ac.uk/view/person.php?id=msib7_1220373335), a close associate and contributor to ‘The Germ’, the journal of the Pre-Raphaelite Brotherhood. The letters to Tupper, whose writing and career he promoted, highlight professional opportunities and networks of editors and journals available during this period. They give an interesting glimpse of the kind of life afforded to a literary Victorian gentleman employed by the Civil Service. During certain periods of his life, Rossetti travelled abroad, visiting the continent and even Australia. Having been robbed on one occasion in Italy, he discusses with Tupper, who travelled with him in 1869, the advisability of carrying a pistol. Other letters cover wide-ranging topics, from discussions of Ruskin and Browning to the politics of the day, spiritualism, and lycanthropy.

Alongside revealing individual letters, the catalogue records now allow researchers to explore Rossetti family networks in some detail.  A good example of this is correspondence relating to the artist Frederic Shields (https://en.wikipedia.org/wiki/Frederic_Shields), who was a regular subject of Dante Gabriel Rossetti’s letters to Watts-Dunton. Later letters from William Michael Rossetti to Shields describe the hours before his brother’s death with great tenderness, passing on a last message to Shields. Subsequent letters from Christina Rossetti are concerned with Shields’ work on a memorial for Dante Gabriel Rossetti. These intertwined relationships would not be easily discoverable from published letters alone but can be usefully explored through this catalogue.

Cataloguing also gave us the chance to research the provenance of groups of letters in the collection. This revealed connections between material previously considered separate: the Swinburne manuscript collection (https://explore.library.leeds.ac.uk/special-collections-explore/8607) and substantial correspondence relating to Swinburne and Watts-Dunton (including Rossetti correspondence) were all acquired from the same source, Watts-Dunton’s estate. These letters and manuscripts had historically been treated as distinct collections, and the connections between them were not clear from catalogue records.

Image taken from one of the Rossetti family letters.

Cataloguing work on this small collection has emphasised the many levels of interconnectedness in which archives exist. Letters can show relationships between individuals, collections of letters show their wider networks, and collections themselves speak to other material both within a repository and in many other locations across the world.

The Rossetti family letters collection is now available for research (https://explore.library.leeds.ac.uk/special-collections-explore/7436).  This project would not have been possible without the support of the Strachey Trust, and Special Collections is grateful to it for its generosity in funding work on this significant collection.

Sarah Prescott
Literary Archivist
University of Leeds Special Collections

Related

Rossetti Family correspondence, 1843-1909

Browse all University of Leeds Special Collections descriptions on the Archives Hub

Explore more collections relating to the Rossetti family on the Archives Hub

Previous features on University of Leeds Special Collections:

“Gather them in” – the musical treasures of W.T. Freemantle

Sentimental Journey: a focus on travel in the archives

Recipes through the ages 

World War One

All images copyright University of Leeds Special Collections and National Portrait Gallery, London. Reproduced with the kind permission of the copyright holders.

Online Resources: Explore archives in different ways

Archives Hub feature for May 2020

The Archives Hub includes descriptions called Online Resources.  These sit alongside Archive Collection descriptions and Repository descriptions.

screenshot hit list showing online resources on health
Online Resources on health

Online Resources are collections of resources, typically digitised content. They are often created as part of a project, and usually based on a specific theme. But the definition is purposely very loose. They are essentially any websites that offer any kind of introduction, interpretation, or way into archives, other than the more traditional archival descriptions for individual collections.

All Online Resources point to a website, but that doesn’t mean that they only represent digital materials. The website may provide narrative and context for physical collections.  A good example of this is War Child. The site is a story about the Evacuee Archive – how it came into being, the man who created it, what he has experienced.

screenshot of War Child site homepage
War Child

It aims to explore and document the life of this archive. The archive is largely paper-based, and includes some recordings and artefacts.  War Child provides a wonderful, creative experience, thinking about how people engage with archives and how individuals are shaped by archives.

Many Online Resources do represent digital collections, and frequently they showcase collaborations. Windows on Genius is a project by the University of Cambridge and University of Sussex that spans two digital collections, giving access to the works of Sir Isaac Newton.  Other Online Resources are materials within one institution, brought together by topic, such as Selected Sources on Healthcare, at the University of Warwick. This is a selection of primary sources relating to British healthcare before the foundation of the National Health Service.

screenshot of map showing endangered languages
Map showing endangered languages

Some Resources are ‘artificial collections’ that have been brought together to aid researchers, such as Endangered Languages – a digital repository created by SOAS, specialising in preserving and publishing endangered language documentation materials from around the world.

Quite often Online Resources provide help with interpretation and using sources for teaching, such as the Pre-Raphaelite resource, which provides teaching materials and allows for personal collections to be created.

screenshot showing link to teaching resources on the Pre-Raphaelite website
Pre-Raphaelite Illustrations learning resource

Something like this is a wonderful introduction to a subject for a new researcher.

Some of the resources are simply digital collections. Potentially they could also be described simply as Archive Collections.

screenshot of BT digital archive website
The BT Digital Archive

For example, the BT Digital Archives Online Resource is an archive collection, and indeed, we do have this collection listed as The BT Digital Archives collection. However, the Online Resource takes the user to the full catalogue, and it provides further context and showcases highlights from the collection.

Our rationale for having Online Resources is more about servicing the end user than the strict definition of what an archive collection is and whether it can be described as an online resource. We want to make sure people find the materials, and we also want to promote any added value that they can get through narrative, context and interpretation that the holding institution provides.

We aim to increase the number of Online Resource descriptions – we create them ourselves when we find good resources, and our current contributors can also create them quickly and easily. If an Online Resource is offered by a non-contributor, we can create it for them, or provide a specific type of access to our cataloguing tool, to allow them to create the entry.  It provides another discovery channel, so for the short amount of time it takes to write an entry, a resource may be found by a researcher who would otherwise never have known about it.

These digital collections and physical archives and websites for learning, teaching, and research include a wealth of materials from many institutions across the UK. From fashion to photography, dance to Darwin, soldiers to Shakespeare, these websites represent a whole range of archival resources, often with strong visual themes that can be used for research, learning and teaching.  Explore the Online Resources, and do get in touch if you have any suggestions for additions to our catalogue!

Planes, pilots and politics: National Aerospace Library’s collections fly onto Archives Hub

Archives Hub feature for April 2020

The human race has always wanted to fly, and the National Aerospace Library’s collection shows how we have pursued those dreams to conquer and then perfect flight; from aeroplanes to hovercraft, air travel to satellites, and missiles to man-carrying kites. Our earliest book, from 1515, looks at how objects travel through the air and we are still collecting material on cutting-edge aero engineering.

The NAL is unusual for an institute collection. Rather than specialising in a single profession, the library follows its parent organisation, the Royal Aeronautical Society, by covering all the sciences and arts connected to travel above the ground: from designing aircraft to insurance and law, from flying eighteenth-century balloons to airport operations, and from aero medicine to aerial warfare.

Flying Countess before a flight in 1918.

Social historians can find a wealth of information within our four walls. For example, we have three interesting collections from women who were captivated by flight during the interwar period: the collection of The Flying Countess, Cathleen, Countess of Drogheda, and those of two pioneering women who tried to fly across Africa, Delphine Reynolds, who reached as far as Sierra Leone in early 1931, and Peggy Salaman, who reached Cape Town later that year. The collection of Wilfred Parke gives an insight into the pre-World War I world of air racing.

Flying has always captured the imagination and has been recorded in prints, posters, photographs and paintings. We care for over 100,000 images, including early balloon lithographs from the eighteenth century, the stylish design that accompanied air travel in the 1930s, glass slides explaining scientific concepts, plus tens of thousands of images showing aeroplanes. Many of these images are available via the Mary Evans Picture Library’s corporate licensing and merchandise sites.

Lithograph of George Biggin, Letitia Sage and Vincenzo Lunardi ascending from St George’s Fields, London, 29 June 1785.

Aeronautics is also a business and our collections cover how the worlds of science, government, warfare and business collide. This is best shown through the records of Britain’s aviation trade organisation – the Society of British Aircraft Constructors, also known as the SBAC. Starting during the First World War, these minute books chronicle seventy years of thinking by those high up in industry. We also have the wartime records of the British and Colonial Aeroplane Company, with its digitised minute book appearing on our heritage website, and the Broke-Smith Archive contains some interesting material on military aviation before the First World War.

Instructions on re-assembling the Wright Flyer by Orville Wright, 1928.

The Royal Aeronautical Society was created decades before the Wright Brothers became the first men to fly a powered aircraft, and the archive of the Royal Aeronautical Society is strong on how the great minds of the time worked out how to design the machines that enabled us to fly. One of our main treasures is the scientific papers of Sir George Cayley, the man dubbed the father of aeronautics, who established many of the principles of flight, such as that gaining lift should be separated from the propulsion system, as well as working well away from aeronautics, on designing prosthetics and geared bicycles. Other early collections include the Baden-Powell ballooning cuttings collection, Percy Pilcher’s work on gliders and Lawrence Hargrave’s photograph albums. We have digitised the Cayley Notebooks, Pilcher Drawings and Hargrave albums and they can all be viewed on our heritage website.

Sir George Cayley’s notebook on www.AeroSocietyHeritage.com

We also have an extensive letters collection, which includes correspondence from the Society and its leading members. The collections are especially strong in the early days of flight, with letters from the pioneers of flight, such as the Wright Brothers, Samuel Cody, Samuel Langley, Octave Chanute, Lawrence Hargrave, J.W. Dunne, A.V. Roe, Lord Rayleigh, Sir Frederick Handley Page, Alberto Santos-Dumont, Gustav Lilienthal, F.W. Lanchester, James Glaisher and Sir Geoffrey de Havilland. Though we have not yet listed each letter on Archives Hub, a list of files can be found online and we can then use our paper indexes to find out more about each item of correspondence. Interaction with the great names in aeronautics, politics and the services between 1910 and 1953 can be found in the correspondence files of the acid-tongued editor of Aeroplane magazine, C. G. Grey.

From a publicity brochure c. 1911.

Our aero engineering archive collections move from the pioneering days to the aircraft designers and producers. The British & Colonial Aeroplane Company Collection includes design work for many post-war Bristol aircraft; Second World War propeller developments can be found in the collection of de Havilland’s A. V. Cleaver; W. O. Manning’s work at English Electric and the aeronautical papers of George William Saynor show design work at Blackburn Aircraft and Canadian Vickers, together with the designs of Saynor and his partner, which came together in the Saynor & Bell Canadian Cub & Canadian Cub II.

Last but not least, the NAL holds the records of our parent organisation, the Royal Aeronautical Society. As well as membership records of the great and the good of the industry and day-by-day administration of a learned society, it also contains audio recordings of over four hundred of its lectures and conferences, primarily from the 1960s and 1990s onwards. The NAL has digitised most of the collection and has been slowly podcasting some of the gems over the last two or three years, including from the great names in the British aero industry, such as Sir Frederick Handley Page describing the launch of Britain’s first big aircraft, Sir Geoffrey de Havilland talking about his first few years in aeronautics, military topics such as the history of the nuclear delivery aircraft, the V-bombers, and scientific lectures such as the first 50 years of aeroelasticity.

Handley Page podcast.

So far, the National Aerospace Library has placed high-level descriptions of just over thirty of our main collections on Archives Hub. We will now be working to fill in some of the lower-level information and details that are currently stored in paper index files or hidden away on our library catalogue, and to add details of some of our other collections to the site.

Zeppelin poster order.

In the meantime, we always welcome enquiries, either by phone 01252 701038/60 or email. Further to the UK Government’s guidance, the National Aerospace Library is currently closed to external visitors to ensure the health and wellbeing of staff, members, and volunteers but online services remain available.

Tony Pilmer, Librarian
National Aerospace Library

Related

Browse all National Aerospace Library collection descriptions available to date on the Archives Hub

All images copyright National Aerospace Library. Reproduced with the kind permission of the copyright holders.