Who is the creator?

I am currently working on an exciting new Linked Data project, looking at exposing the Archives Hub metadata in a different way that could provide great potential for new uses of the data. More on that in future posts. But it has got me thinking about the thorny issue of ‘Name of creator(s)’, as ISAD(G) says: the ‘creator’ of the archive. In RDF modelling (required for Linked Data output) we need to think about how data elements relate to each other and be explicit about the data elements and the relationships between concepts.

Dublin Core has a widely used ‘creator’ element – it would be nice and easy to use that to define the relationship between the person and the archive. The ‘Sir Ernest Shackleton Collection’ was created by Sir Ernest Shackleton. There is our statement. For RDF we’ll want to identify the names of things with URIs, but leaving that for now, what I’m interested in here is the predicate – the collection was created by Sir Ernest Shackleton, an Antarctic explorer whose papers are represented on the Hub.
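As a rough sketch of what that statement looks like as a subject, predicate, object triple: the snippet below assumes the Dublin Core Terms creator property as the predicate and uses invented example.org URIs for the collection and the person (they are placeholders, not real Archives Hub identifiers). It simply writes the triple out as one N-Triples line.

```python
# The predicate is the Dublin Core Terms "creator" property.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Placeholder URIs, invented purely for illustration.
COLLECTION = "http://example.org/id/collection/shackleton"
PERSON = "http://example.org/id/person/shackleton-ernest"


def to_ntriples(subject, predicate, obj):
    """Serialise one triple of URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# The statement: collection -> creator -> person.
statement = (COLLECTION, DCTERMS_CREATOR, PERSON)
line = to_ntriples(*statement)
print(line)
```

The triple itself says nothing about what kind of ‘creating’ is meant; that is carried entirely by how the predicate is defined and understood.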

The only trouble with this is that the collection was not created by him. Well, it was and it wasn’t. The ‘collection’ as a group of things was created by him. That particular group of things would not exist otherwise. But people will usually take ‘created by’ to mean ‘authored by’. It is quite possible that none of the items in the collection were authored by Sir Ernest Shackleton. ISAD(G) refers to the ‘creation, accumulation and maintenance’ and uses ‘creator’ as shorthand for these three different activities. EAD uses ‘origination’ for the ‘individual or organisation responsible for the creation, accumulation or assembly of the described materials’. Maybe that definition is more accurate because it says ‘or assembly’. The idea of an originator appears to get nimbly around the fact that the person or organisation we attribute the archive to is not necessarily the author – they did not necessarily create any of the records. But the OED defines the originator as the person who originates something, the creator.

It all seems to hang upon whether the creator can reasonably mean the creator of this archive collection – they are responsible for this collection of materials coming together. The trouble is, even if we go with that, it might work within an archival context – we all agree that this is what we mean – but it doesn’t work so well in a general context. If our Linked Data statement is that the Sir Ernest Shackleton collection ‘was created by’ Sir Ernest Shackleton then this is going to be seen, semantically, as the bog-standard meaning of creator, especially if we use a vocabulary that usually defines creator as author. Dublin Core has dc:creator. Dublin Core does not really have the concept of an archival originator, and I suspect that there are no other vocabularies that have addressed this need.
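To make the worry concrete, here is a small sketch of how a general-purpose consumer might read such data. It assumes the same Dublin Core Terms creator property is used both for the collection and for an individual item; every example.org URI, including the letter and its correspondent, is hypothetical.

```python
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical data: one collection-level statement, one item-level statement.
triples = [
    ("http://example.org/id/collection/shackleton", DCTERMS_CREATOR,
     "http://example.org/id/person/shackleton-ernest"),
    ("http://example.org/id/item/letter-1", DCTERMS_CREATOR,
     "http://example.org/id/person/some-correspondent"),
]


def created_by(person, data):
    """Return every subject linked to `person` by the creator property."""
    return [s for s, p, o in data if p == DCTERMS_CREATOR and o == person]


shackleton_things = created_by(
    "http://example.org/id/person/shackleton-ernest", triples
)
print(shackleton_things)
```

Both statements look identical to the consumer: nothing in the data says that the first means ‘accumulated’ while the second means ‘wrote’.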

I would like to end this post with an insightful solution…but none such is coming to me at present. I suppose the most accurate one-word description of the role of this person or organisation is ‘accumulator’ or ‘gatherer’. But something doesn’t sound quite right when you start talking about the accumulator. It sounds a bit like a Hollywood movie. Maybe that gives it a certain air of mystery, but for representing data in RDF we need clarity and consistency in the use of terms.


  1. This is a very interesting topic – got me thinking and apologies for such a long post!

    This is one of the many questions that illustrate that archivists have done much less serious modelling of the ‘archival’ domain than our colleagues in other sectors, which is something (I hope) the ICA may well put right in the medium term.

    In lieu of such work, we struggled with these issues of creation and how it relates to other roles actors play in relation to material when designing the data model and cataloguing guidelines for the Library’s new cataloguing system for archives and manuscripts. Our data model describes entities (archives, manuscripts, collections and their parts, persons, corporate bodies etc) and the relationships between them – so while we have yet to export as RDF in a linked data context, I’m hoping it will be relatively easy to do so and (where supported by the data) we may be able to represent more granular relationships than ‘flat’ cataloguing based on ISAD(G) alone.

    Our thinking is still evolving but we still ask for the creator(s) to be defined for each fonds/collection (that is a relationship to be made from the highest level description to a cpf entity with the role of ‘creator’). When asked by non-archivally trained colleagues what ‘creator’ means in this context I fall back on ISAD(G)’s fudge (‘created, accumulated, and/or maintained’) and often qualify it with the word ‘archival’ to try and get at the ‘archivyness’ of this relationship (that is that the relationship is central to what makes this group of stuff archival rather than any old group of ‘collected’ stuff).

    We ask that relationships to creators should be captured at the ‘highest level that applies’ and the nature of the descriptive hierarchy (as suggested by others) may provide some help here. So generally series are covered by some or all of the creators given at collection level and if only a sub-set apply then we ask that these alone are repeated at the series level. This is logical and also allows for those who see the primacy of the series and regard the fonds as an unhelpful concept – but that’s another issue. At the object levels though we give our cataloguers free rein to describe all those roles associated with the ‘making’ of things (author, illuminator, scribe etc) which pleases our manuscript librarians. We specify the MARC relators (http://www.loc.gov/marc/relators/relaterm.html) as our vocabulary for the role terms, although perhaps not conforming strictly to the definition of ‘creator’ (which is confined to the intellectual and artistic content of the work). In effect then we are saying that our different descriptive entities (collection/fonds, series, object) will tend to gather relationships to (possibly the same) actors with different roles and in a linked data world I guess that’s fine.

    We are still fudging the word ‘creator’ though and (as again suggested) this will remain the case as this is all our legacy data will support. Indeed, while we have our flexible model at the Library, it may still be a struggle to get those creating new catalogues to use it effectively, especially as there is a feeling (one I try hard to dispel) that thinking more precisely about roles played at different levels means more work! With ‘born-digital’ material though, it may be that we do have the data about whether and by whom material was ‘created’ or ‘accumulated’ or ‘maintained’ etc, in which case we may be able to lose the ISAD(G) fudge of ‘(archival) creator’.

    Given that LOCAH will be dealing with legacy data, I agree that using DC Creator is fine. Indeed, picking up on Pete’s point, I’m not sure we need a particular ‘archival’ creator to be defined. The point here, surely, is that in order to understand the ‘archival’ (evidential) nature of material, users need to understand the context of its creation and use, and, while broad brush, the capture and display of a relationship in a linked world will in itself do just that!

  2. Hi Kathy,
    Yes, ‘collected by’ works in some ways but may not reflect the full truth about the archive. I also feel like it might move us further away from the fact that the archive is very often largely about the ‘creator’.

    Owen has a good point – I’m starting to think that the problems of clarifying meaning within an archive description will just create more challenges, when it might be best to stick with imperfect concepts but just get on with creating Linked Data. I guess you have to really think about what is beneficial to researchers. ‘Creator’ probably does have the right weight to it. It certainly suggests active involvement!

  3. I don’t think that (necessarily) just because we are moving to express this as linked data we have to be any more semantically correct (although I can understand the temptation).

    We’ve got data created in a particular way – ISAD(G). It doesn’t seem to me that ISAD(G)’s concept of a ‘creator’ is really that precise, and so DC Creator would seem to do. My argument would be that it will be at least as intelligible within context as the current ISAD(G) Creator. For me the fuzziness of DC Creator is a strength not a weakness.

    If it turns out that some future use of the data requires a more precise expression of the relationship between the person/corporation who created/accumulated the material in the archive, this can be expressed later. I can’t see we lose data or context by using DC Creator?

  4. Hi John,
    Yes, I agree that we need to address this issue even more as we move towards more machine-processing within a broader environment. To my mind, RDF data modelling requires us to do so. Leastways it does if we want to aim for something that is as semantically correct as it can be. When I think about the archives I’ve accessioned, rescued from destruction and taken from relations and friends of the ‘creator’, it is all a little messy most of the time. I absolutely take your point about the question of who is the creator – there are often archives from offices that are ‘attributed’ to individuals and we tend to like to have one creator, which in itself is problematic.

    I think it will be impossible to have a right and wrong, but I hope that we (archivists) can make a definite choice and define the choice that we make.

  5. Hi Jane (and Pete),
    I’m pleased to see that the issue of creator as collector/gatherer vs creator as author has been raised. (Another shade of “creator” (in relation to series in particular) could be recipient.)

    This question of teasing out different meanings of a particular attribute is an important result of applying more rigorous modelling to enable the machine-processing and inferencing associated with linked data. The problem for most archives is that our existing descriptive records are full of ambiguities and mixed meanings.

    On a related note, this line of inquiry also raises the question of relationships between different creators, especially around levels of aggregation, separation of roles and functions, and the like. Depending on collecting policies and other factors, different institutions may choose different entities to be the creator – all of them valid, and most times with relationships between them (individual v family; office v enterprise; elected office v individual office holder etc).

    One of my key messages here: modelling and description is about choices and perspective, not right and wrong.


  6. The other point I meant to add is that there’s no requirement to represent a “creator” relationship if it isn’t useful, or if the actual relationship represented in EAD is a different one.

    We should make use of existing RDF vocabularies where we can, but just because the Dublin Core vocabulary has a creator property and that is widely used, it doesn’t follow that we should try to “squeeze” relationships into that if they don’t really fit, if you see what I mean (which I worry happens quite a lot when people use Dublin Core terms!) :)

  7. Hi Pete,
    Thanks for the comments. I’m interested to know that in Dublin Core ‘creator’ is used more broadly. I think an important point here is, as you say, whether you can talk about the collection, and therefore happily talk about the creator (maker) of the collection and it is understood that this is different from the creator of the letters, or the creator of the research notes, or whatever.

    Hmmm. I find myself thinking about the creator when you apply it to a series – what about the ‘creator’ of a series of correspondence? Can you still apply the idea of a maker/accumulator of the correspondence? Or does it start to feel more like the author?
