Archival Context: entities and multiple identities

I recently took part in a Webinar (Web seminar) on the new EAC-CPF standard. This is a standard for the encoding of information about record creators: corporate bodies, persons and families. This information can add a great deal to the context of archives, supporting a more complete understanding of the records and their provenance.

We were given a brief overview of the standard by Kathy Wisser, one of the Working Group members, and then the session was open to questions and discussion.

The standard is very new, and archivists are still working out how it fits into the landscape and how it relates to various other standards. It was interesting to note how many questions essentially involved the implementation of EAC-CPF: Who creates the records? Where are they kept? How are they searched? Who decides what?
These questions are clearly very important, but the standard only covers the encoding of ISAAR(CPF) information. It will not help us to figure out how to work together to create and use EAC-CPF records effectively.
In general, archivists use EAD to include a biographical history of the record creator, and do not necessarily create or link to a full authority record for them. The idea behind the standard is that providing separate descriptions for different entities is more logical and efficient. The principle of separation of entities is well put: “Because relations occur between the descriptive nodes [i.e. between archive collections, creators, functions, activities], they are most efficiently created and maintained outside of each node.” So if you have a collection description and a creator description, the relationship between the two is maintained separately from the descriptions themselves. If only EAD itself were a little more data-centric (database friendly, you might say), this would facilitate a relational approach.
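To make the idea of relations held outside each node a little more concrete, here is a minimal sketch in Python. It is not part of EAC-CPF or EAD; the class names, identifiers and the "createdBy" relation type are all made up for illustration. The point is only that neither description knows about the other, and the link lives in its own list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """A descriptive node: a collection, a creator, a function, etc."""
    identifier: str   # hypothetical local identifier
    kind: str         # e.g. "collection" or "creator"
    title: str


@dataclass(frozen=True)
class Relation:
    """A link between two nodes, stored outside both of them."""
    source_id: str
    target_id: str
    relation_type: str  # hypothetical label, not taken from any standard


# Illustrative data only.
descriptions = {
    "coll-1": Description("coll-1", "collection", "Papers of Jane Example"),
    "creator-1": Description("creator-1", "creator", "Example, Jane"),
}
relations = [Relation("coll-1", "creator-1", "createdBy")]


def related_to(identifier, relations):
    """Return the identifiers of every node linked from the given node."""
    return [r.target_id for r in relations if r.source_id == identifier]


creators = related_to("coll-1", relations)
```

Because the relation is a separate record, either description can be edited, replaced or hosted elsewhere without touching the other.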
I am interested in how we will effectively link descriptions of the same person, because I cannot see us managing to create one single authoritative record for each creator. The standard provides for this through ‘identities’: a record creator can have two or more identities, each represented by a distinct EAC-CPF instance. I think the variety of identity relationships that the standard provides for is important, although it inevitably adds a level of complexity. It is something we have implemented in our use of the tag to link to related descriptions. Whilst this kind of semantic markup is a good thing, there is a danger that the complexity will put people off.
I’m quite hung up on the whole issue of identifiers at the moment. This may be because I’ve been looking at Linked Data and the importance of persistent URLs for identifying entities (e.g. I have a URL, you have a URL, places have a URL, things have a URL, and that way we can define all these things and then provide links between them). The Archives Hub is going to be providing persistent URLs for all our descriptions, using a unique identifier made up of the country code, repository code and local reference for the collection (e.g., where 100 is the repository code and MSS is the local reference).
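As a rough illustration of building a URL from those three parts, here is a small Python sketch. The base URL, the hyphen separator, the lowercasing and the "gb" country code are all assumptions made for the example; they are not the Archives Hub's actual URL pattern.

```python
from urllib.parse import quote

# Hypothetical base URL, used only for this example.
BASE_URL = "https://example.org/data/"


def persistent_url(country_code, repository_code, local_reference,
                   base_url=BASE_URL):
    """Join the three identifier parts into one URL (assumed pattern)."""
    parts = [country_code, repository_code, local_reference]
    if any(not part.strip() for part in parts):
        raise ValueError("all three identifier parts are required")
    # Lowercase and percent-encode each part so the result is URL-safe.
    cleaned = [quote(part.strip().lower(), safe="") for part in parts]
    return base_url + "-".join(cleaned)


url = persistent_url("GB", "100", "MSS")
```

Whatever the real pattern turns out to be, the useful property is that the same three values always produce the same URL, so the link does not depend on where the description happens to be stored.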
I feel that it will be important for ISAAR(CPF) records to have persistent URLs, and these will come from the recordID and the agencyCode. Part of me thinks the agency responsible for the EAC-CPF instance should not be part of the identifier, because the record should exist apart from the institution that created it, but then realistically, we’re not going to get consensus on some kind of independent stand-alone ISAAR(CPF) record. One of the questions I’m currently asking myself is: if two different bodies have EAC-CPF records, does it matter what the identifiers/URLs are for those records, even if they are for the same person? Is the important thing to relate them as representing the same thing? I’m sure it’s very important to have a persistent URL for all EAC-CPF instances, because that is how they will be discoverable; that is their online identity. But the question of providing one unique identifier for one person, or one corporate body, is not something I have quite made my mind up about.
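One way to picture the "relate them rather than merge them" option is the sketch below, again in Python. It is only an assumption about how this might be modelled: the class, the agency codes, the record IDs and the way the identifier is joined with a slash are all hypothetical, and nothing here is prescribed by EAC-CPF.

```python
from dataclasses import dataclass, field


@dataclass
class CpfRecord:
    """One agency's record for a person, family or corporate body."""
    agency_code: str   # hypothetical agency code
    record_id: str     # hypothetical record ID
    name: str
    same_as: set = field(default_factory=set)

    @property
    def identifier(self):
        # Assumed pattern: agency code plus record ID.
        return f"{self.agency_code}/{self.record_id}"


def link_same_entity(first, second):
    """Record, on both sides, that two records describe the same entity."""
    first.same_as.add(second.identifier)
    second.same_as.add(first.identifier)


# Two agencies each hold their own record for the same person.
record_a = CpfRecord("AGENCY-A", "0001", "Example, Jane")
record_b = CpfRecord("AGENCY-B", "p-42", "Jane Example")
link_same_entity(record_a, record_b)
```

Each record keeps its own identifier and its own form of the name; only the link says they are the same person. That avoids needing agreement on a single authoritative record, at the cost of having to maintain the links.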
It will be interesting to see how archivists assess the standard, and to see more examples of implementation. The Archives Hub would be very interested to hear from anyone using it.

A few thoughts on context and content

I have been reading with interest the post and comments on Mark Matienzo’s blog, where he asks ‘Must contextual description be bound to records description?’

I tend to agree with his point of view that binding the two together is not a good thing. The Archives Hub uses EAD, and our contributors happily add excellent biographical and administrative history information to their descriptions, via the tag, information that I am sure is very valuable for researchers. But should our descriptions leave out this sort of information and be just descriptions of the collection and no more? Wouldn’t it be much more sensible to link to contextual information that is stored separately?
Possibly, on the other side of the argument, if archivists created separate biographical/administrative history records, would they still want to contextualise them for specific collection descriptions anyway? It makes perfect sense to have the information separate to the collection description if it is going to be shared, but will archivists want to modify it to make it relevant to particular collections? Is it sensible to link to a comprehensive biographical record for someone when you are describing a very small collection that only refers to a year in their life?
Of course, we don’t have the issue with EAD at the moment, in so far as we can’t include an EAC-CPF record in an EAD record anyway, because EAD doesn’t allow elements from other XML schemas to be included (no components from other namespaces can be used in EAD). But I can’t help thinking that an attractive model for something like the Archives Hub would be collection descriptions (including sub-fonds, series, items) that can link to whatever contextual information is appropriate, whether that information is stored by us or elsewhere. This brings me back to my current interest: Linked Data. If the Web is truly moving towards the Linked Data model, then maybe EAD should be revised in line with this? By breaking information down into logical components, it can be recombined in more imaginative ways: open and flexible data!