Linked Data enthusiasts like to talk about making concepts within data into first-class citizens. This should appeal to archivists. The idea that the concepts within our data are equal sounds very democratic, and is very appealing for rich data such as archival descriptions. But where does that leave the notion of the all-important top-level archival collection description? Archivists do tend to treat the collection description as superior; the series, sub-series, file, item, etc., are important, but subservient to the collection. You may argue that actually they are not less important, but they must be seen in the context of the collection. But I would still propose that (certainly within the UK) the collection-level description generally tends to be the focus and is considered to be the ‘right’ way into the collection, or at least, because of the way we catalogue, it becomes the main way into the collection.
Linked Data uses the data graph as its basis. This is different from the relational model and the tree structure model. In a graph, entities are all linked together in such a way that none has special status. All concepts are linked and the links are specified; that is to say, the relationships are made explicit. In a tree structure, everything filters down, so it is inevitable that the top of the tree does seem like the most important part of the data. A data graph can be thought of as a tree structure where links go both ways, and nothing is top or bottom. You could still talk about the collection description being the ‘parent’ of the series description, but the series description is represented equally in RDF. But, maybe more fundamentally than this, Linked Data really moves away from the idea of the record as being at the heart of things and replaces this with the idea of concepts being paramount. The record simply becomes one other piece of data, one other concept.
This type of modelling accords with the idea that users want to access the data from all sorts of starting points, and that they are usually interested in finding out about something real (a subject, a person) rather than an archive per se. When you model your data into RDF, what you are trying to think about is exactly that: how will people want to access this data? In Australia, the record series is the preferred descriptive entry, and a huge amount has been written about the merits of this approach. It seems to me, with RDF, we don’t need to start with the collection or start with the series. We don’t need to start with anything.
This diagram, courtesy of Talis, shows part of a data graph for modelling information about spacecraft. You can see how the subjects (which are always represented by URLs) have values that may be literal (in rectangular boxes) or may point to other resources (URLs). Some of this data may come from other datasets (use of the same URL for a spacecraft enables you to link to a different resource and use the values within that resource).
The emphasis here is on the data – the concepts – not on the carrier of the data – the ‘record’.
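To make the idea of a graph with no privileged starting point a little more concrete, here is a minimal sketch in plain Python. It is not the Talis diagram and not real RDF tooling: the URLs, the property names and the spacecraft itself are all invented for illustration. Each statement is a (subject, property, value) triple, where the value is either a literal or the URL of another resource.

```python
from collections import defaultdict

# Hypothetical namespace and identifiers, made up for this sketch.
EX = "http://example.org/"

triples = [
    (EX + "spacecraft/1", "name", "Example Probe"),              # literal value
    (EX + "spacecraft/1", "launchedBy", EX + "agency/1"),         # link to another resource
    (EX + "spacecraft/1", "launchSite", EX + "place/1"),          # link to another resource
    (EX + "agency/1", "name", "Example Space Agency"),
    (EX + "place/1", "name", "Example Launch Site"),
]

def is_resource(value):
    """Treat anything that looks like a URL as a resource, everything else as a literal."""
    return value.startswith("http://") or value.startswith("https://")

def describe(subject):
    """Gather every property stated about one subject, wherever the statement came from."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            description[p].append(o)
    return dict(description)

def incoming(resource):
    """List the subjects that point at a resource, i.e. follow the links backwards."""
    return sorted(s for s, p, o in triples if o == resource)
```

Calling `describe` on the agency and `incoming` on the same URL both work equally well, which is the point: nothing in the list of triples marks the spacecraft, the agency or the place as the 'top'.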
In our LOCAH project we will need to look at the issue of hierarchy of multi-level descriptions. In truth, I am not yet familiar enough with Linked Data to really understand how this is going to work, and we have not yet really started to tackle this work. I think I’m still struggling to move away from thinking of the record as the basis of things, because, to coin a rather tiresome phrase, RDF modelling is a paradigm shift. RDF is all about relationships between concepts and I will be interested to see where this leaves relationships between hierarchical parts of an archive description. But I am heartened by Rob Styles’ (of Talis) assertion that RDF allows anyone to say anything about anything.
In terms of
x -> belongs to -> y
implying
y -> contains -> x
I think you would do this by making ‘belongs to’ the ‘inverse’ of ‘contains’ – I think the ‘Relationship’ ontology that Ian Davis and Eric Vitiello did shows how this might be done with (e.g.) childOf, parentOf – http://vocab.org/relationship/.html
I don’t think you’d need to add the triple
x -> belongs to -> z
as this is a relationship you can deduce – if x is part of y and y is part of z, then x must be part of z.
I played around with the relationship ontology and made some notes here http://www.meanboyfriend.com/overdue_ideas/2009/10/middlemash-middlemarch-middlemap/
This includes some comments on deduced relationships and some of the problems of relying on them in the example I was experimenting with.
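As a rough sketch of the two deductions described above, the following plain Python treats 'contains' as the inverse of 'belongs to' and treats both as transitive. Both rules are assumptions made for this example (in OWL terms they would correspond to owl:inverseOf and owl:TransitiveProperty); the property names and the helper `infer` are invented, and nothing here is taken from the Relationship vocabulary itself.

```python
# Only two triples are actually stated.
stated = {("x", "belongsTo", "y"), ("y", "belongsTo", "z")}

# Assumed rules: 'contains' is the inverse of 'belongsTo',
# and both properties are transitive.
INVERSE = {"belongsTo": "contains"}
TRANSITIVE = {"belongsTo", "contains"}

def infer(triples):
    """Return the stated triples plus everything the two rules allow us to deduce."""
    known = set(triples)
    while True:
        new = set()
        # Inverse rule: s belongsTo o  =>  o contains s
        for s, p, o in known:
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
        # Transitive rule: a p b and b p c  =>  a p c
        for s1, p1, o1 in known:
            if p1 not in TRANSITIVE:
                continue
            for s2, p2, o2 in known:
                if p2 == p1 and s2 == o1:
                    new.add((s1, p1, o2))
        if new <= known:  # nothing further can be deduced
            return known
        known |= new

inferred = infer(stated)
```

With these rules the two stated triples grow to six, including `("x", "belongsTo", "z")` and `("z", "contains", "x")`, neither of which was written down. Whether 'belongs to' really should be transitive for a given dataset is a modelling decision, which is where the problems with relying on deduced relationships tend to surface.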
I think that this is fine. As I’m blogging whilst I’m going along, I think I’m still trying to grasp the rather different idea of a data graph approach, where it doesn’t matter what you ‘start’ with – I’m so used to a more linear hierarchy.
We’d have to add something to your example if we wanted to complete the relationships:
x -> is an -> item
y -> is a -> series
z -> is a -> sub-fonds
x -> belongs to -> y
y -> belongs to -> z
x -> belongs to -> z
and how about the other way around?
x -> is part of -> y
y -> is part of -> z
x -> is part of -> z
I assume we can say an item belongs to a sub-series, series, sub-fonds and fonds. But I’m still not sure whether all those relationships are necessary to add as triples.
PS I’ve been trying to make this comments box bigger – sorry it’s so small!
I suspect this is where my current knowledge of RDF lets me down.
What is the problem with saying:
x -> is an -> item
y -> is a -> series
z -> is a -> sub-fonds
x -> belongs to -> y
y -> belongs to -> z
and so on?
Do ambiguities start to creep in?
Hi Owen,
Yes, it does make sense. We can model the data in such a way as to present hierarchies to the user. It doesn’t mean the model itself is hierarchical, but it does mean that the relationships you create allow for this type of perspective on the data – I assume that’s what you’re saying?
I still haven’t worked out how this applies to the hierarchy of an archive description. If the display needs to be able to show that ‘This is an item, part of a series, part of a sub-fonds, part of a collection’, the modelling does need to enable this. I’m not convinced that the ORE Aggregation is going to work for this. What do you think?
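One possible way to get that display sentence out of the data, sketched in plain Python and leaving ORE Aggregation aside entirely: store a type for each description and only the direct 'belongs to' link, then walk upwards at display time. The identifiers x, y, z and c are placeholders, and the sketch assumes each description has exactly one parent, which may not hold for real archival data.

```python
# Placeholder identifiers; one type and at most one direct parent per description.
types = {"x": "item", "y": "series", "z": "sub-fonds", "c": "collection"}
belongs_to = {"x": "y", "y": "z", "z": "c"}

def ancestors(node):
    """Walk the direct belongs-to links upwards, nearest parent first."""
    chain = []
    seen = {node}
    while node in belongs_to:
        node = belongs_to[node]
        if node in seen:  # guard against an accidental cycle in the data
            break
        seen.add(node)
        chain.append(node)
    return chain

def article(word):
    """Pick 'a' or 'an' for the display sentence."""
    return "an" if word[0] in "aeiou" else "a"

def breadcrumb(node):
    """Build the sentence shown to the user from the type and the chain of parents."""
    parts = [f"This is {article(types[node])} {types[node]}"]
    parts += [f"part of {article(types[a])} {types[a]}" for a in ancestors(node)]
    return ", ".join(parts)
```

Here `breadcrumb("x")` gives "This is an item, part of a series, part of a sub-fonds, part of a collection", and `breadcrumb("y")` starts from the series instead. The hierarchy lives in the presentation code, while the stored data is just types and direct links.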
I’m also feeling my way with linked data to some extent, but I wonder if we need to make sure we don’t confuse how data is presented with how it is modelled.
In the example that Rob Styles uses of the spacecraft, we could approach the data from a geographical perspective, with an implied hierarchy:
The United States is a country
in which Florida is a state
in which Cape Canaveral is a town
in which NASA has a base
from which Apollo 11 was launched
Or we could consider it from the Space Mission/Organisation perspective
NASA is an organisation
which has Space Programmes
of which Apollo is one
in which Apollo 11 was a mission
or something like that (as I worked through the example I realised neither my knowledge of US geography nor of NASA space missions was up to it!)
Anyway – both of these hierarchical representations could (or should) be able to be constructed from the RDF – but the model doesn’t say one or the other. It would be down to how you presented the data to the consumer – that is in software, as opposed to in the model.
Does this make sense? As I said, I’m feeling my way here, so happy to be corrected.
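The two readings above can be sketched from a single set of triples in plain Python. The statements are taken as-is from the example in this comment and have not been checked against the real geography or the real space programme; the property names and the `tree` helper are invented for illustration.

```python
# One flat set of statements; nothing here says which hierarchy is 'the' hierarchy.
triples = [
    ("Florida", "locatedIn", "United States"),
    ("Cape Canaveral", "locatedIn", "Florida"),
    ("NASA base", "locatedIn", "Cape Canaveral"),
    ("Apollo 11", "launchedFrom", "NASA base"),
    ("Apollo", "programmeOf", "NASA"),
    ("Apollo 11", "missionIn", "Apollo"),
]

def tree(root, predicates, depth=0):
    """Render an indented hierarchy under root, following only the chosen properties."""
    lines = ["  " * depth + root]
    for s, p, o in triples:
        if o == root and p in predicates:
            lines.extend(tree(s, predicates, depth + 1))
    return lines

# The same data, presented from two different perspectives.
geographic = tree("United States", {"locatedIn", "launchedFrom"})
organisational = tree("NASA", {"programmeOf", "missionIn"})
```

Printing `"\n".join(geographic)` gives the country-to-launch view, and `"\n".join(organisational)` gives the organisation-to-mission view, with Apollo 11 appearing as a leaf in both. The choice of root and of which properties to follow is made in the software, not in the model.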