Names (4): Ethics and identity

As archivists, we deal with ethical issues a good deal.  But the ability to link disparate and diverse data sources opens up new challenges in this area, and I wanted to explore this a bit.

If you do a general search for ethics and data, top of the list comes health. An interesting example of data join-up is the move to link health data to census data, which could potentially highlight where health needs are not being met:

“Health services are required to demonstrate that they are meeting the needs of ethnic minority populations. This is difficult, because routine data on health rarely include reliable data on ethnicity. But data on ethnicity are included in census returns, and if health and census data for the same individuals can be linked, the problem might be solved.” (Ethnicity and the ethics of data linkage)

However, individuals who stated their ethnicity in census returns were not told that this might subsequently be linked with their health data. Should explicit informed consent be given? Given the potential benefits, is this a reasonable ask? It is certainly getting into hazardous terrain to ignore the principle of informed consent. In their book ‘Rethinking Informed Consent in Bioethics‘, Manson and O’Neill argue that informed consent cannot be fully specific or fully explicit. They argue for a distinctive approach where rights can be waived or set aside in controlled and specific ways.

This leads to a wider question, is fully explicit and specific informed consent actually achievable within the joined-up online world? A world where data travels across connections, is blended, re-mixed, re-purposed. A world where APIs allow data to be accessed and utilised for all sorts of purposes, and ‘open data’ has become a rallying cry.  Is there a need to engage the public more fully in order to gain public confidence in what open data really means, and in order to debate what ‘informed consent’ is, and where it is really required?

I am working on a project to create name records, and I am looking at bringing data sources together. Of course, this is hardly new. Wikipedia is the most well-known hub for biographical data. Anyone can add anything to a Wikipedia page (within some limits, and with some policing and editing by Wikipedia, but in essence it is an open database).  Wikidata, which underlies Wikipedia, is about bringing sources together in an automated way.  Projects within cultural heritage are also working on linked data approaches to create rich sources of information on people. SNAC has taken archival data from many different archive repositories and brought it together. A page for one person, such as Martin Luther-King provides a whole host of associations and links. These sources are not all individually checked and verified, because this kind of work has to be done algorithmically. However, there is a great deal of provenance information, so that all sources used are clear.

image of page from the face of white australia website
The Face of White Australia

There are some amazing projects working to reveal hidden histories. Tim Sherratt has done some brilliant work with Australian records. Projects such as Invisible Australians, which aims to reveal hidden lives, using biographical information found in the records. He has helped to create some wonderful sites that reveal histories that have been marginalised.  Tim talks about ‘hacking heritage’ and says: ‘By manipulating the contexts of cultural heritage collections we can start to see their limits and biases. By hacking heritage we can move beyond search interfaces and image galleries to develop an understanding of what’s missing.’ (Hacking heritage, blog post)  He emphasises that access to indigenous cultural collections should be subject to community consultation and control.  But what does community consultation and control really mean?

I have always been keen to work with the names in archival descriptions – archival creators and all the other people who are associated with a collection. They are listed in the catalogue (leastways the names that we can work with are listed – many names obviously aren’t included, but that’s another story), so they are already publicly declared. It is not a case of whether the name should be made public at all, or, at least, that decision has been made already by the cataloguer.   But our plan is to take the names and bring them to the fore – to give them their own existence within our service.  We are taking them out of the context of a single archive collection and putting them into a broader one. In so doing, we want to give the archive collections themselves more social context, we want to give more effective access to distributed historical records, and we also want to enable researchers to travel through connections to create their own narratives.

This may help to reveal things about our history and highlight the roles that people have played. It may bring people to the fore people who have been marginalised.  Of course, it does not address the problem of biases and subjective approaches to accessions and cataloguing. But a joined-up approach may help us to see those biases and gaps; to understand more about the silent spaces.

Creating persistent identifiers and linking data reveals knowledge. It is temping to see that in simple terms as a good thing.  But what about privacy and ethics?  Even if someone is no longer living, there are still privacy issues, and many people represented in archives are alive.

Do individuals want to be persistently identified? What about if they change their identity? Do they want a pseudonym associated with their real name? They might have very good reasons for keeping their identity private. Persistent identification encourages openness and transparency, which can have real benefits, but it is not always benign.  It is like any information – it can be used for good and bad purposes, and who is to say what is good and what is not? Obviously we have GDPR and the Data Protection Act, and these have a good deal to say about obligations, the value of historical research and the right to be forgotten. This is something we’ll need to take into account. But linked data principles are not so much about working with personal data as working with data that may not seem personal, but that can help to reveal things when linked with other sources of data.

GDPR supports the principle of transparency and the importance of people’s awareness and control over what happens to their personal data. Even if we are not creating and storing personal data, it seems important to engage with data protection and what this means. The challenge of how to think about data when it is part of an ever shifting and growing  global data environment seems to me to be a huge one.

Certainly the horse has bolted to some degree with regards to joining up data. The Web lowered barriers considerably, and now we increasingly have structured data, so it is somewhat like one gigantic database. Finding things out about individuals is entirely feasible with or without something like a Names service created by the Archives Hub. We are not creating any new content, but creating this interface means we are consciously bringing data together, and obviously we want to be responsible, and respect people’s right to privacy. Clearly it is entirely impractical to try to get permission from all those living people who might be included. So, in the end, we are taking a degree of risk with privacy.  Of course, we will un-publish on request, and engage with any feedback and concerns. But at present we are taking the view that the advantages and benefits outweigh the risks.

 

Image of exhibition photograph of black rights march

“Imagine being a sibling in a family that continually removes you from photos; tries its best to erase you…As you go through [the scrapbook] you see events where you know you were there, but you are still missing.”  Lae’l Hughes-Watkins (University of Maryland) gave an impassioned and inspiring talk at DCDC 2019 about her experiences.  She argued that archivists need to interrogate the reality that has been presented, and accept that our ideas of neutrality are misplaced. She wants a history that actively represents her – her history and culture, and experiences as a black woman in the USA. She related moving stories of people with amazing stories (and amazing archives) who distrust cultural institutions because they don’t feel included or represented.

This may seem a long way away from our small project to create name records, but in reality our project could be seen as one very small part of a move towards what Lae’l is talking about.  Bringing descriptions together from across the UK together maybe helps us to play a small role in this – aiming to move towards documenting the full breadth of human experience. The archives that we cover may retain the biases and gaps for some time to come (probably for ever, given that documentary evidence tends to represent the powerful and the elite much more strongly), but by aggregating and creating connections with other sources, we help to paint a bigger picture.  By creating name records we help to contextualise people, making it much easier to bring other lives and events into the picture. It is a move towards recognising the limitation of what is actually in the archive, and reaching out to take advantage of what is on the Web.  In doing this through explicitly identifying people we do leave ourselves more open to the dangers of not respecting privacy or anonymity. When we plug fully into the Web, we become a part of its infinite possibilities, which is always going to be a revealing, exciting, uncontrollable and risky business. By allowing others to use this data in different ways, we open it up to diverse perspectives and uses.

 

 

 

Names (3): One name record to bind them

It has been great to get comments and feedback around names, and I wanted to expand upon something that a few people have commented on….the ideal of one ‘authority record’ for one person or organisation.

model showing relationships of catalogues and name records
Model showing potential relationships between catalogues and name records

 

The above diagram is a proposal for the relationships we might have – note that is it a working model, and may well change over time. You can see the catalogues (the descriptions of archives) include people, some with biographical histories, and these people are either creators of archive collections or referenced in them.  Each of these people then gets a name record (bottom left box), so we might have e.g. three name records for  the same name (and the same name may potentially the same person…or may not). We will work with the store of records that we have with the aim of creating matches, and ending up with a generic or main name record (green box, top left).

The ‘main record’ or ‘master record’ or whatever we might call it, for each individual person or organisation, is not an ‘archival record’. It is not intended simply to be a reflection of what is in our own data. It is intended to be a page dedicated to that person or organisation.  Our current feeling is that this should not be seen as domain specific; in fact, we want to get away from the idea that data is domain specific.  It is about an entity (a person or organisation), and what we know of that entity.

Keeping in mind the green box, and looking at the person page for Robin Day from Exploring British Design, a previous AHRC project we ran with Brighton Design Archive, you get a sense of the type of thing we mean.

Page for Robin Day, from the Exploring British Design website
Exploring British Design: Robin Day

This page presents as a general information page about a designer. It is not branded as a page about archives. It takes information in from different sources. Is it an ‘authority’ record?  I’m really not sure; I wouldn’t call it that. The point is really that it enables researchers to put Robin Day into the context of other people, organisations places and events, or at least it demonstrates how that can be done. It creates a network, and it intends to show the value of including archives in a network, rather than standing apart, in their ‘own world’.

Screenshot of an entity relationship diagram for Robin Day
Visualised relationships

 

The network can easily be visualised. There are tools out there to do this. The challenge is to create the data to feed into these visualisers. Again, this visualisation is not about archival name authority records, it is not domain specific.

 

 

In the Robin Day page, we have a section for related archives and museum resources.

screenshot showing archives related to Robin Day
Related archive and museum resources

 

This lists archives Robin Day is the ‘creator of’ or archives he is ‘associated with’.  It links to the Archives Hub, but also to other sources. One of the options for end users is to go and find out more about the archival sources, but it is not prioritised above other options.

 

 

 

 

 

So, this is essentially the idea – a page for a person, a page for an organisation. An information resources that focuses on creating a network of connections.  We think this is a good approach, but creating something along these lines that is automated, sustainable and effective within an ongoing national service is much harder.

Why not just use this one record, link to the archive catalogues, and dispense with the individual name records that we have created? There are three reasons to consider providing access to the individual name records:  biographical history,  uncertainty around matching and ingesting name authority records.

I have already written about biographical and administrative history in a separate post.

In this phase of the Names Project the individual records for Beatrice Webb (as a name example), will be created either from the creator name or index terms that we have in the Archives Hub catalogues.

The main problem is the wide variation in name entries.

Webb, Beatrice
Webb, Beatrice, née Potter
Webb (Martha) Beatrice, 1858-1943
Webb, Martha Beatrice, 1858-1943
Webb;[Martha] Beatrice [nee Potter] 1858-1943

These are all entries in the Archives Hub.  We can match them all up, but can we say they are all the same?   Names without dates should not be matched with certainty, but quite often they will be the same person. (Beatrix Potter also often ends up being linked with Beatrice Webb, née Potter).

The decision we need to make is whether to provide links to these individual name records that we will have, or only use them as a source of data.  It seems valuable to enable end users to see these names as a group, but it is another thing to risk integrating information from them all into one name record.  There is no perfect answer to this, but it does seem important to clearly indicate the level of uncertainty.  So many names that we have don’t have life dates, or have variations in structure.  What we are looking to achieve is a clear provenance, giving end users the best understanding of what they are seeing.

What about name records that have been created by our contributors?  The name records we create ourselves from catalogue descriptions will generally be no more than the name, dates, and biographical history.  But, going forwards, we will want to work with much more detailed name records.

For Exploring British Design we created rich name records with an entity-relationship structure (essentially using the EAC-CPF structure and working in RDF),  to demonstrate the power of connecting entities.  For this purpose, we partially hand-crafted the name records, as well as carrying out some very complex processing to create various connections.

screenshot of part of the timeline for Robin Day
Part of the timeline for Robin Day

The example above shows events from the Robin Day timeline, with linked connections to related organisations.  If we ingest EAC-CPF records we might get timelines like this.

Name records may also include relationships. The Borthwick Institute has good examples of name records with plenty of rich relationship information. e.g. Charles Lindley Wood, Viscount Halifax.

screenshot of part of the Viscount Wood record showing relationships to other people
An excerpt from a Borthwick entry for Charles Lindley Wood

If we took this record into the Archives Hub it might seem to make sense for it to become the main person record for Wood.  But that would involve a process of making choices, preferencing one name record over another.  Possible, but tricky to do in an automated way. Another record office might also have a splendid example of a name entry for this person, with some different data. Furthermore, this record has links to the Borthwick catalogue. We would potentially have to remove these links.

It would be very challenging to create one record from several source EAC-CPF records for the same person –  to blend timelines, or sort out relationships listed in different records, bearing in mind that it needs to be done in an automated way, keeping version control and dealing with revisions and new data coming in that might add to the name record.  How could we compare and blend two lists of relationships? Or two chronologies? We’d probably end up having to keep them all, and then potentially have similar but different relationships and chronologies, giving a slightly confused user experience.

If we do ingest records like the one above, we will have to figure out how these  more detailed records will relate to what we have already created.  If, as planned, we have one generic name record for a person, it makes the job easier, as we won’t be looking to make any one EAC-CPF record into the main name record, we will simply link to it from the main record. Bear in mind, our main record is intended to be a domain-neutral entry – linking to other sources beyond archives.  EAC-CPF records might do this to some extent, but they are unlikely to link to the Jisc Library Hub, and probably won’t link to Wikidata, or other external sources.   They are far more likely to provide internal links to the archive catalogue they relate to.

Arguably, it might be easier to forget about creating name records ourselves (from the catalogue entries) and just work with name records that have been created by our contributors (which are likely to be well-structured and include life dates). But if we do that, the pot of names will grow slowly, as only a small proportion of repositories create name records. We can’t realistically give the end user a few thousand name records covering maybe 1-2% of our names – they might search for ‘Winston Churchill’ as a name, and find that we don’t have him!  It would not remove the problem of name matching, and it would make the whole idea of reaching out beyond the archive domain, by linking into other resources using our names as the hook, rather ineffectual.

Therefore, we propose to keep the separate name records in our system We propose to create a ‘generic record’, which is what would be prominent in the Archives Hub display. We would then have the potential to link the records together, to blend them,  to try some text mining and analysis techniques. It gives us options.  It would not be sensible to make those decisions now. It is better to lay the groundwork that enables us to be flexible.   This approach allows us to link to an individual name record where we don’t feel able to confirm a ‘same as’ relationship. It presents the option to the end user – here is a name – we think this is the same person, so we’ve provided a link.

The end user experience needs to make sense and not mislead or provide false information. Links to brief name records could seem confusing, but, as I have said, trying to bring together in one record all the information from several name records, with  their biographies, relationships, aliases, events, related resources, is likely to be a nightmare.  In the end, it will take a good deal more testing and working with researchers to work out what is best.

 

Archives Hub Names Project (2): Biographical History

It is a somewhat vexed question how to treat biographical and administrative history (in this post I’ll focus on biographical history).  This is an ISAD(G) field and an EAD field. ISAD defines it as providing “an administrative history of, or biographical details on, the creator (or creators) of the unit of description to place the material in context and make it better understood”.  It advises for personal names to include “full names and titles, dates of birth and death, place of birth, successive places of domicile, activities, occupation or offices, original and any other names, significant accomplishments, and place of death”.

On the Archives Hub we have a whole range of biographical histories – from very short to very comprehensive.  I have had conversations with archivists who believe that ‘putting the collection in context’ means giving information that is particularly relevant for that archive rather than giving a general history. Conversely, many biographical history entries do give a very full biography, even if the collection only relates to one aspect of a person’s life and work. They may also include information that is not readily available elsewhere, as it may have been discovered as part of the cataloguing process.

The question is, if we create a generic name record for a person, how do we treat this biographical information? There are a number of alternatives.

(1) Add all biographical history entries to the record

If you look at a SNAC example:  https://snaccooperative.org/view/54801840 you can see that this is the approach. It has merits – all of the biographical information is brought together. But it can mean a great deal of repetition, and the ordering of the entries can seem rather illogical, with short entries first and then longer comprehensive entries at the end.

Whilst most biographical history entries are pretty good, it also means a few not very helpful entries may be included, and may be top of the order. In addition, putting all the entries in together doesn’t always seem to make much sense. In the example below there are just three short entries for a major figure in women’s liberation. They are automatically brought in from the catalogue entry for individual collections. Sometimes the biographical entries in individual catalogues suffer from system migration and various data processing issues that mean you end up with field contents that are not ideal.

Millicent Garrett Fawcett biographical histories in SNAC

The question is whether this approach provides a useful and effective end user experience.

Where there is one entry for a creator, with one biographical history, there is no issue other than whether the entry makes sense as an overall biographical entry for that person or organisation. But we have to consider the common situation where there will be a dozen or more entries. Even if we start with one entry, others may be added over time.  Generally, there will be repetition and information gaps, but in many cases this approach will provide a good deal of relevant information.

(2) Keep the biographical history entries with the individual name records

At the moment our plan is to create individual name records for each person, as well as a generic master record.  We haven’t yet worked out the way this might be presented to the end user.  But we could keep the biographical histories with the individual entries we have for names. The generic record would link to these entries, and to the information they contain.  This makes sense, as it keeps the biographical histories separate, and within the entries they were written to accompany. Repetition is not an issue as it is clear why that might happen.  But the end user has to go to each entry in turn to read this information.

(3) Keep biographical history entries with individual name records, but enable the information to be viewed in the generic master record

We have been thinking about giving the end user the option to ‘click to see all biographical histories created for this person’. That would help with expectations. Simply presenting a page with a dozen similar biographical histories is likely to confuse people, but  enabling them to make a decision to view entries gives us more opportunity for explanation – the link could include a brief explanatory note.

(4) Select one biographical history to be in the generic record

We have discussed this idea, but it is really a non-starter. How do you select one entry? What would the criteria be if it is automated? The longest?

(5) Link to a generic biography if available

This is the idea of drawing in the wikipedia entry for that person or organisation, or potentially using another source.  There is a certain risk to pulling in data from an external source as the ‘definitive’ biographical information, but it the source would always be cited, and it does start to move towards the principle of bringing different sources of information together. If we want to create a more generic resource, we are going to have to take risks with using external sources.

 

I would be interested in any comments on this.

Names Project (1): Creation of name records

The Archives Hub Names Project

The Archives Hub team and Knowledge Integration, our system suppliers, are embarking upon a short four month project to start to lay the groundwork, define the challenges and test the approaches to presenting end users with a name-based means to search, and connect to a broad range of resources related to people and organisations.  I will be blogging about the project as we go along.

Our key aims in the long-term are:

  • To provide the end user with a way to search for people and organisations and find a range of material relevant to their research
  • To enable connections to be made between resources within and external to Jisc, using names as the main focus
  • To bring archive collections together in an intellectual sense and provide different contexts to collections by creating networks across our data

This first project will not create an end-user interface, but will concentrate on processing,  matching names and linking resources. We want to explore how this can be administered in order to be sustainable over time.  In the end, the most challenging part of working with the names we have is identification, disambiguation and matching.  The aim is to explore the space and start to formulate a longer-term plan for the full implementation of names as entities within the Archives Hub.

Creation of name records from EAD description records

NB: This blog often refers to personal names for convenience, but names include personal, family and corporate entities.

EAD includes namesEAD descriptions include personal, family and corporate names.  These ‘entities’ may be listed as archival creators and also associated with the collection as index terms. Archival creators may optionally be given biographical or administrative histories.  The relationship of the collection with names in the index is not made explicit in the description (in a structural way), though it may often be gleaned from the descriptive information within the EAD record.

Creating name records for all names

We are proposing to begin by creating name records for all of these entries, no matter how thin the information for each entry may be.

Here is a random selection of names that are included in Archives Hub records:

Grote, Arthur
Gaskell, Arthur
Wilson, John
Thatcher, J. Wells, Barrister at Law
Barron, Margaret
Stanley, Catherine, 1792-1862
Roe, Alfred Charles
Rowlatt, Mary, b 1908
Milligan, Spike, 1918-2002
Fawcett, Margaret, d. 1987
Rolfe, Alan, 1908-2002 actor
Mayers, Frederick J (fl 1896-1937 : designer : Kidderminster, England)
Joan

Only a percentage of names have life dates. Some have born or death dates, some floruit dates.

Of course, the life dates, occupations and outputs of many people are not known, or may be very difficult to find.  Also, life dates will change when a birth date is joined by a death date. Epithets may also change over time (and they are not controlled vocabulary anyway).

In addition, we have inverted and non-inverted names on the Archive Hub, names with punctuation in different places, names with and without brackets, etc.  These issues create identification challenges.

Even taking names as creators and names as index terms within one single description, the match is often not exact:

Millicent Garrett Fawcett (creator name)
Fawcett, Dame Millicent. (1847-1929) nee Garrett, Feminist and Suffragist (index term)

Lingard, Joan (creator name)
Lingard, Joan Amelia, 1932- (index term)

The archival descriptions on the Archives Hub vary a great deal in terms of the structure, and different repositories have different approaches to cataloguing.  Some do not add name of creator, some do not add index terms, some add them intermittently, and often the same name is added differently for different collections within the same repository.  In many cases the cataloguer does not add life dates, even when they are known, or they are added to the name as creator but not in the index list, or vice versa. This sounds like a criticism, but the reality is that there are many reasons why catalogues have ended up as they are.

There has not been a strong tradition amongst archivists of adding names as unique identifiable entities, but of course, it has only been in the last few decades that we have had the potential, which is becoming increasingly sophisticated, of linking data through entity relationships, and creating so much more than stand-alone catalogue records. Many archivists still think primarily in terms of human readable descriptions.  Some people feel that with the advent of Google and sophisticated text analysis, there is no need to add names in this structured way, and there is no need for index terms at all.  But in reality search engines generally recommend structured data, and they are using it in sophisticated ways.  Schema.org is for structured data on the web, an initiative started by Google, Microsoft, Yahoo, and Yandex. Explicit markup helps search engines understand content and it potentially helps with search engine optimisation (ensuring your content surfaces on search engines).  Also, if we want to move down the Linked Data road, even if we are not thinking in terms of creating strict RDF Linked Data, we need to identify entities and provide unique identifiers for them (URLs on the web). Going back to Tim Berners-Lee’s seminal Linked Data article from 2006:

“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data.  With linked data, when you have some of it, you can find other, related, data.”

So, including names explicitly provides huge potential (as well as subjects, places and other entities) and it has become more important, not less important. Indeed, I would go so far as to say that structured data is more important than standards compliant data, especially as, in my experience, standards are often not strictly adhered to, and also, they need constant updating in order to be relevant and useful.

The idea with our project is that we start with name records for every entity – a pot of data we can work with. We may create Encoded Archival Context (Corporate Bodies, Persons and Families), otherwise known as EAC-CPF…but that is not important at this stage.  EAC is important for data ingest and output, and we intend to use it for that purpose, so it will come into the picture at some point.

The power of the anonymous

There are benefits in creating name records for people who are essentially anonymous or not easily identifiable.  Firstly, these records have unknown potential; they may become key to making a particular connection at some point, bearing in mind that the Archives Hub continually takes new records in. Secondly, we can use these records to help with identification, and the matching work that we undertake may help to put more flesh on the bones of a basic name record.  If we have ‘Grote, Arthur’ and then we come across ‘Grote, Arthur, 1840-1912’, we can potentially use this information and create a match. Of course, the whole business of inference is a tricky thing – you need more than a matching surname and forename to create a ‘same as’ relationship (I won’t get into that now). But the point is that a seemingly ‘orphan’ name may turn out to have utility. It may, indeed, provide the key to unlocking our understanding of particular events – the relationships and connections between people and other entities are what enable us to understand more about our history.

Components of a name record

So, all names will have name records, some with just a name, some with life dates of different sorts, some with biographical or administrative histories. The exception to this may be names that are not identifiable as people or organisations.  It is potentially possible to discover the type of entity from the context, but that is a whole separate piece of work.  Hundreds of names on the Archives Hub are simply labelled as ‘creator’ or ‘name’. This is down to historical circumstance – partly the Archives Hub made errors in the past (our old cataloguing tool which entered creators as simply EAD ‘origination’), partly other systems we ingest data from.  At the moment, for example, we are taking in descriptions from Axiell’s AdLib system, but the system does not mark up creator names as people or organisations (unless the cataloguer explicitly adds this), so we cannot get that information. This is probably a reflection of a time when semantically structured data was simply less important. If a human reads ‘Elizabeth Gaskell’ in a catalogue entry they are likely to understand what that string means; if undertaking large-scale automated processing, it is just a string of characters, unless it includes semantic information.

From the name records that we create, we intend to develop and run algorithms to match names. In many cases, we should be able to draw several names together, with a ‘same-as’ relationship. Some may be more doubtful, others more certain. I will talk about that as we get into the work.

At the moment, we have some ideas about how we will work with these individual records in terms of the workflow and the end user experience, but we have not made any final decisions, and we think that what is most important at this stage is the creation and experimentation with algorithms to see what we can get.

Master name records

We intend to create master records for people and organisations. The principle is to see these master records not as something within the archives domain, but as stand-alone records about a person or organisation that enable a range of resources to be drawn together.

So, we might have several name records for one person:

Example of master record, with various related information included:
Webb, Martha Beatrice, 1858-1943, social reformer and historian

Examples of additional name records that should link to the master record:
Webb, Beatrice, 1858-1943 (good match)
Webb, Martha Beatrice, 1858-1943, economist and reformer (good match)
Webb, Martha Beatrice, nee Potter, 1858-1943 (good match)
Webb, M.B. b. 1858 (possible match)
but…
Potter, Martha Beatrice, b 1858
…might well not be a match, in which case it would stand separately, and the archive connected to it would not benefit from the links being made.

We have discussed the pros and cons of creating master records for all names.  It makes sense to bring together all of the Beatrice Webb names into one master record – there is plenty that can be said about that individual; but does it make sense to have a master record for single orphaned names with no life dates and nothing (as yet) more to say about that individual?  That is a question we have yet to answer.

diagram showing link between archive, name records and master records
The archive is described though an EAD description held on our system (the CIIM). We take all the names from this to create a huge store of individual names. From this, we aim to create and update ‘definitive’ name records.

The principle is to have name records that enables us to create links to the Archives Hub entries and also to other Jisc services and resources beyond that – resources outside of the archives domain.  Many of these resources may also help us with our own identification and matching processes. It is important to benefit from the work that has already been done in this area.

We are looking at various name resources and assessing where our priorities will be.  This is a fairly short project, and we won’t have time to look at more than a handful of options. But we are currently thinking in terms of VIAF, ORCID and Wikidata. More on that to follow.

Personally, I’ve been thinking about working with names for several years. We have been asked about it quite a bit. But the challenge is so big and nebulous in many ways. It has not been feasible to embark upon this kind of work in the past, as our system has not supported the kind of systematic processing that is required. We are also able to benefit from the expertise K-Int can bring to data processing. It is one thing doing this as a stand-alone project; it is quite another to think about a live service, long term sustainability, version control and revisions, ingest from different systems, etc.  And also, to break it down into logical phases of work.  It is exciting, but it is going to involve a great deal of hard work and hard thinking.