How the Exploring British Design project informed the development of the Archives Hub

Back in 2014 the Archives Hub joined forces with The University of Brighton Design Archives for an exciting new project, funded by the Arts and Humanities Research Council, ‘Exploring British Design’ (EBD).

The project explored Britain’s design history by connecting design-related content in different archives, with the aim of giving researchers the freedom to explore around and within archives.

You can read a number of blog posts on the project, and there is also a video introducing the EBD website on You Tube, but in this post I wanted to set out how we have learned from the project and how it has informed the development of the new Archives Hub.

Unfortunately, we may not be able to maintain the website longer term, and so it seemed timely to reflect on how the principles used in this project are being taken forward.

Modelling the Data

A key component of EBD was our move away from the traditional approach of putting the archive collection at the centre of the user experience. Instead, we wanted to reflect the richness of the content – the people, organisations, places, subjects, events that a collection represents.

We had many discussions and filled many pieces of paper with ideas about how this might work.

rough ideas for data connectivity
Coming up with ideas for how EBD should work

We then took these ideas and translated them into our basic model.

model of data for EBD
Relationships between entities in the EBD data

Archives are represented on our model as one aspect of the whole. They are a resource to be referenced, as are bibliographic resources and objects. They relate to the whole – to agents, time periods, places and events. This essentially puts them into a whole range of contexts, which can expand as the data grows.

Screenshot of EBD homepage
Homepage of Exploring British Design: People are foremost.

The Exploring British Design website was one way to reflect the inter-connected model that we created.

We have taken the principles of this approach with the new Archives Hub architecture and website, which was launched back in December 2016. Whilst the archive collection description stays very much in the forefront of the users’ experience, we have introduced additional tabs to represent themed collections and repositories. All three of these sources of information are, in a data and processing sense, treated equally. The user searches the Hub and the search runs across these three data sources. The model allows us to be flexible with how we present the data, so we could also try different interfaces in future, maybe foregrounding images, or events.

screenshot of Archives Hub search results
Search for ‘design industry’ gives results across Archive Collections, Themed Collections and Repositories

Names

The EBD project had a particular focus on people. We opted to combine machine methods of data extraction – data taken partly from our already existent archive descriptions as well as from other external sources – with manual methods, to create rich records about designers. This manual approach is not sustainable for a large-scale service like the Archives Hub, but it shows what is possible in terms of creating more context and connectivity.

screenshot of a person page from the EBD website
EBD website showing a person page

We wanted to indicate that well-structured data allows a great deal more flexibility in presentation. In this case the ‘Archive and Museum Resources’ are one link in the list of resources about or related to the individual. We could have come up with other ways to present the information, given how it was structured.

We are intending to introduce names pages to the Archives Hub, which will then more clearly echo the EBD approach. They will largely have been created through automated processes, as we needed to create them at scale. They will generally be quite brief, without the ideal structure or depth, but the principle remains that we can then link from a person page to a host of related resources. The Hub website will have a new tab for ‘Names’ and end users will be able to run searches that take in collections, themes, repositories, people and organisations.

The EBD project allowed us to explore standards used for the creation of names data. It was our first experience of using Encoded Archival Context (Corporate Bodies, Persons and Families) (EAC-CPF), so we could start to see what we could do with it, as well as discover some of the shortcomings of the standard, as our data went beyond what is supported. For example, we wanted to link images to people and events but this was not covered by the standard. It was useful to have this preliminary exploration of it, and what it can – and can’t – do, as we look to adopt it for names within the Archives Hub.

Structured Data

One of the things the project did reinforce for me was the importance of indexing. On the Archives Hub we have always recommended indexing, but we have had mixed reactions from archivists, some feeling that it is less useful than detailed narrative, some saying that it is not needed ‘now we have Google’, some simply saying they don’t have time.

Indexing has many advantages, some of which I’ve touched on in various blog posts – and one at the top of the list, is that it brings the advantages of structured data. A name in a narrative can, in theory, be pulled out and utilised as a point of connectivity, but a name as an index term tends to be a great deal easier to work with: it is identified as a name, it usually has structured surname, forename content, it usually includes life dates and may include titles and epithets to help unambiguously identify an individual.

EBD was all about structured data, and we gave ourselves the luxury of adding to the data by hand, creating rich structured records about designers. This was partly to demonstrate what could be done in an interface, but we were well aware that it would be problematic to create records of that level of detail at scale. However, as we start to grapple with expanding name records in the Archives Hub, we have EBD as a reference point. It has helped us to think more about approaches and priorities when creating name records. If we were to create an EAC Editor (similar to our EAD Editor) we would think carefully about how to facilitate creating relationships. For example, the type of relationship – should there be a controlled list of relationship types? e.g. ‘worked with, collaborated with, had professional connection with, influenced by,  spouse of’ – these are some of the relationships we used in EBD, after much discussion about how best to approach this. Or would it be more practical to stick to ‘associated with’ (i.e. not defined), which is easier, but far less useful to a researcher. Could we have both? How would one combine them in an interface?  Another example – the potential to create timelines. If we wanted to provide end users with timelines, we would need to focus on time-bound events. There are many issues to consider here, not least of which is how comprehensive the timeline would be.

The vexed question of how to combine data from name descriptions created by several institutions is not something we really dealt with in EBD, but that will be one of the biggest challenges for us in aiming to implement name data on the Archives Hub.

The level of granularity that you decide upon has massive implications for complexity, resources and benefits. The more granular the data, the more potential for researchers to be able to drill down into lives, events, locations, etc. So including life dates allows for a search for designers from 1946; including places of education allows for exploring possible connections through education, but adding dates of education allows for a more specific focus still.

Explaining our approach

One thing that struck me about this project was that it was harder than I had anticipated to convey to people what we were trying to achieve and what we could achieve. I tended to find that showing the website raised a number of expectations that I knew would be difficult to fulfill, and if I’m being honest, I sometimes felt rather frustrated at the lack of recognition of what we had achieved – it’s really not easy to combine, process and present different data sources!  It is ironic that the more we press forwards with new functionality, and try to push the boundaries of what we do, the more it seems that people ask for developments that are beyond that!  You can try to modify expectations by getting deep down and technical with the challenges involved in aggregating and enhancing data created over time, by different people, in different environments (we worked with CSV data, EAC-CPF data, RDF and geodata for example), with different perspectives and priorities.  But detailed explanations of technical challenges are not going to work for most audiences. End users see and make an assessment of the website; they shouldn’t really need to be aware of what is going on behind the scenes.

Originally, in our project specification, we asked the question: “How can we encourage researchers, archive and museum professionals, and the public, to apprehend an integrated and extended rather than collection-specific sense of Britain’s design history?”  Whilst we did not go as far to answer this question as we had hoped, the work that we did made me feel that it might be harder than I had envisaged. People are very used to the traditional catalogues and other finding aids that are out there, and it creates a certain (possibly unconscious) mindset. I know this too well, because, as an archivist, I have had to adjust my own thinking to see data in a different way and appreciate that traditional approaches to cataloguing and discoverability are not always suited to the digital online age.

Data Model

The hierarchical approach to data is very embedded among archivists, and this is what people are used to being presented with.  Unless archivists catalogue in a different way, providing more structured information about entities (names, places, etc) then actually presenting things in a more connected way is hard.

image of hierarchical folders
A folder structure is often used to represent archival hierarchy

A more inter-connected model, which eschews linear hierarchy in favour of fluid entity relationships, and allows for a more flexible approach with the front-end interface to the data relies upon the quality, structure and consistency of the data. If we don’t have place names at all we can’t provide a search by place. If we don’t have place names that are unambiguously identified (i.e. not just ‘Cambridge’) then we can provide a search by place, but a researcher will be presented with all places called Cambridge, anywhere in the world (including the US, Australia and Jamaica).

A diagram showing archives and other entities connected
An example of connected entities

The new Archives Hub was designed on the basis of a model that allows for entities to be introduced and new connections made.

Archives Hub Entity Relationship diagram
Entities within the Archives Hub system

So, the tabs that the end user sees in the interface can be modified and extended over time. Searches can be run across all entities; it is not solely about retrieving descriptions of archives. This approach allows for researchers to find e.g. repositories that are significantly about ‘design’ or repositories that are located in London. It allows us to introduce Themed Collections as a separate type of description, so a student doing a project on ‘plastics’ would discover the Museum of Design in Plastics as a resource alongside archive collections at repositories including Brighton Design Archives, the V&A and the Paul Mellon Centre.

screenshot of Archives Hub search results
Search for ‘plastics and design’ shows archives and themed resources

Website Maintenance

One of the things I’ve learnt from this project is that you need to factor in the ongoing costs and effort of maintaining a project website. The EBD website is quite sophisticated, which means there are substantial technical dependencies, and we ended up running into issues with security, upgrades and compatibility of software, issues that are par for the course for a website but nonetheless need dealing with promptly. Maybe we should have factored this in more than we did, as we know the systems administration required for the Archives Hub is no small thing, but when you are in the throws of a project your focus is on the objectives and final output more than the ongoing issues. We cannot maintain a site long-term that is not being regularly used. EBD does not get the level of use that would justify the resources we would have to put into it on an ongoing basis.

Conclusion

When we were creating the model for the Archives Hub, we thought as much about flexibility and future potential as anything else. This is one thing that we have learnt from running the Hub for 25 years and from projects like Exploring British Design. You need to plan for potential developments in order to start to work with cataloguers, to get the data into the shape that you need it to be. We wanted to be able to introduce additional entities, so that we could have names, places, languages, images, or any other entities as ‘first class citizens‘ of the Hub. We wanted to be able to enhance the end user’s ability to take different paths, and locate relevant archives through different avenues of exploration.

We need to temper our ambitions for the Hub with the realities of cataloguing, aggregation and resources available, and we need as much information as we can get about what researchers really want; but this is why it is so important to encompass potential as well as current functionality. We may not be able to introduce everything we have envisioned or that users ask for right now; but it is important to understand the vital link between approaches to cataloguing, adherence to data standards, and front end functionality. We created visualisations for EBD and we would love to do this for the Hub, but it was not an easy thing to do, and so we would need to consider what the data allows, the software options available, whether the technical requirements are sustainable over time, and the effectiveness of the end result for the researcher.

Visualisation showing connections to Elizabeth Denby
Visualisation for Elizabeth Denby

When we demonstrated the visualisations in EBD, they had the wow factor that was arguably lacking in the main text-based site, but for serious researchers the wow factor is a great deal less important that the breadth and depth of the content, and that requires a model that is fundamentally rigorous, sustainable over time and realistic in terms of the data that you have to work with.

 

Exploring British Design: New Routes through Content

At the moment, the Archives Hub takes a largely traditional approach to the navigation and display of archive collections. The approach is predicated on hundreds of years of archival theory, expanded upon in numerous books, articles, conferences and standards. It is built upon “respect des fonds” and original order. Archival provenance tells us that it is essential to provide the context of a single item within the whole archive collection; this is required in order to  understand and interpret said item.

ISAD(G) reinforces the ‘top down’ approach. The hierarchy of an archive collection is usually visualised as a tree structure, often using folders. The connections show a top-down or bottom-up approach, linking each parent to its child(ren).

image of hierarchical folders
A folder structure is often used to represent archival hierarchy

This principle of archival hierarchy makes very good sense. The importance of this sort of context is clear: one individual letter, one photograph, one drawing, can only reveal so much on its own. But being able to see that it forms part of a series, and part of a larger collection, gives it a fuller story.

However, I wonder if our strong focus on this type of context has meant that archivists have sometimes forgotten that there are other types of context, other routes through content. With the digital environment that we now have, and the tools at our disposal, we can broaden out our ambitions with regards to how to display and navigate through archives, and how we think of them alongside other sources of information. This is not an ‘either or’ scenario; we can maintain the archival context whilst enabling other ways to explore, via other interfaces and applications. This is the beauty of machine processable data – the data remains unchanged, but there can be numerous interfaces to the data, for different audiences and different purposes.

Providing different routes into archives, showing different contexts, and enabling researchers to create their own narratives, can potentially be achieved through a focus on the ‘real things’ within an archive description; the people, organisations and places, and also the events surrounding them.

image of entities and links
Very simplified model of entities within archive descriptions and links between them

This is a very simplified image, intended to convey the idea of extracting people, organisations and places from the data within archive descriptions (at all levels of description). Ideally, these entities and connections can be brought together within events, which can be built upon the principle of relationships between entities (i.e. a person was at a place at a particular time).

Exploring British Design is a project seeking to probe this kind of approach. By treating these entities as an important part of the ‘networks of things’, and by finding connections between the entities, we give researchers new routes through the content and the potential to tell new stories and make new discoveries. The idea is to explore ways to help us become more fully a part of the Web, to ensure that archives are not resources in isolation, but a part of the story.

A diagram showing archives and other entities connected
An example of connected entities

 

For this project, we are focussing on a small selection of data, around British design, extracting entities from the Archives Hub data, and considering how the content within the descriptions can be opened up to help us put it into new contexts.

We are creating biographical records that can be used to include structured data around relationships, places and events.  We aim to extract people from the archive descriptions in which they are ‘embedded’ so that we can treat them as entities – they can connect not only to archive collections they created or are associated with, but they can also connect to other people, to organisations, to events, to places and subjects. For example, Joseph Emberton designed Simpsons in Piccadilly, London, in 1936. There, we have the person, the building, the location and the time.

With this paradigm, the archive becomes one of the ‘nodes’ of the network,  with the other entities equally to the fore, and the ability to connect them together shows how we can start to make connections between different archive collections. The idea is that a researcher could come into an archive from any type of starting point. The above diagram (created just as an example) includes ‘1970’s TV comedy’ through to the use of portland stone, and it links the Brighton Design Archive, the V&A Theatre and Performance Archive and the University of the Arts London Archive. The long term aim is that our endeavours to open up our data will ensure that it can be connected to other data sources (that have also been made open); sources outside of our own sphere (the Archives Hub data). The traditional interface has its merits; certainly we need to continue to provide archival context and navigation through collections; but we can be more imaginative in how we think about displaying content. We don’t need to just have one interface onto our data. We need to ensure that archives are part of the bigger story, that they can be seen in all sorts of contexts, and they are not relegated to being a bit part, isolated from everything else.

 

Linking Cultural Heritage Data

Last week I attended a meeting at the British Museum to talk with some museum folk about ways forward with Linked Data. It was a follow up to a meeting I organised on Archives and Linked Data,  and it was held under the auspices of the CIDOC Documentation Standards Working Group. The group consisted of me, Richard Light (Museum consultant), Rory McIllroy (ULCC), Jeremy Ottevanger (Imperial War Museum), Jonathan Whitson Cloud (British Museum), Julia Stribblehill (British Museum), and briefly able to join us was Pete Johnston (worked on the Locah project and now on the Linking Lives Linked Data project).

It proved to be a very pleasant day, with lots of really useful discussion. We spent some time simply talking about the differences – and similarities – in our perspectives and in our data. One of our aims was to start to create more links and dialogue between our sectors, and in this regard I think that the day was undoubtedly successful.

To start with our conversation ranged around various issues that our domains deal with. For example, we talked a bit about definitions, and how important they are within a museum context. For example, if you think about a collection of coins, defining types is key, and agreeing what those types are and what they should be called could be a very significant job in itself.  We were thinking about this in the context of providing authoritative identifiers for these types, so that different data sources can use the same terms.  Effectively identifying entities such as names and places are vital for museums, libraries and archives, of course, and then within the archive community we could also provide authoritative identifiers for things like levels of description. Workign together to provide authoritative and persistent URIs for these kinds of things could be really useful for our communities.

We talked about the value of promoting ‘storytelling’ and the limitations that may inhibit a more event-based approach. DBPedia (Wikipedia as Linked Data) may be at the centre of the Linked Data Cloud, but it may not be so useful in this context because it cannot chart data over time. For example, it can give you the population of Berlin, but it cannot give you the changing population over time. We agreed that it is important to have an emphasis on this kind of timeline approach.

We spent a little while looking at the British Museum’s departmental database, which includes some archives, but treats them more as objects (although the series they form a part of is provided, this contextual information is not at the fore – there is not a series description as such). The proposal is to find a way to join this system up with the central archive, maybe through the use of Linked Data.

We touched upon the whole issue of what a ‘collection’ is within the museum context, which is often more about single objects, and reflected on the challenge of how to define a collection, because even something like a cup and saucer could be seen as a collection…or it is one object?…or is a full tea set a collection?

For archivists, quite detailed biographical information is often part of the description of a collection. We do this in order to place the collection within a context. These biographical histories often add significant value to our descriptions, and sometimes the information in them may be taken from the archive collection, so new information may be revealed. Museums don’t tend to provide this kind of detail, and are more likely to reference other sources for the researcher to use to find out about individuals or organisations. In fact, referencing external sources is something archives are doing more frequently, and Linked Data will encourage this kind of approach, and may save us time in duplicating effort creating new biographical entries for the same person. (There is also the move towards creating separate name authorities, but this also brings with it big challenges around sharing data and using the same authorities).

We moved on to talk about Linked Data more specifically, and thought a bit about whether the emphasis should be on discovery or the quality and utility of what you get when you are presented with the results. We generally felt that discovery was key because Linked Data is primarily about linking things together in order to make new discoveries and take new directions in research.

book showing design patterns
Wellcome Library, London

One of the main aims of the day was to discuss the idea of the use of design patterns to help the cultural heritage community create and use Linked Data. It would facilitate the process of querying different graphs and getting reasonably predictable information back, if we could do things in common where possible. Richard has written up some thoughts about design patterns from a museum perspective and there is a very useful Linked Data Patterns book by Leigh Dodds and Richard Davis. We felt this could form a template for the sort of thing that we want to do. We were well aware that this work would really benefit from a cross-domain approach involving museums, archives, libraries and galleries, and this is what we hope to achieve.

We spoke briefly about the value of something like OpenCalais and wondered whether a cultural heritage version of this kind of extraction tool would be useful. If it was more tailored for our own sectors, it may be more useful in creating authorities, so that we can refer to things in a common way, as a persistent URL would be provided for the people, subjects, concepts, that we need to describe. We considered the scenario that people may go back to writing free text and then intelligent tools will extract concepts for them.

We concluded that it would be worth setting up a Wiki to encourage the community to get involved in exploring the idea of Linked Data Patterns. We thought it would be a good idea to ask people to tell us what they want to know – we need the real life questions and then we can think about how our data can join up to answer those questions. Just a short set of typical real-life questions would enable us to look at ways to link up data that fit a need, because a key question is whether existing practices are a good fit for what researchers really want to know.

Creating name authorities

We are currently evaluating ICA-AtoM, and it is throwing up some really useful ideas and some quite difficult issues because it supports the creation of name authorities, and it also automatically creates authority records for any creator name and any names within the access points when you upload an archive description.

As far as I am concerned, the idea of name authorities is to create a biographical entry for a person or organisation, something that gives the researcher useful information about the entity, different names they use, significant events in their life, people they know, etc (all set out within the ISAAR(CPF) standard (PDF)). You can then link to the archive collections that relate to them, thus giving the archives more context. You can distinguish between archives that they ‘created’ (were responsible for writing or accumulating), and archives that they are a subject of (where they are referenced – maybe as an access point in the index terms).

The ideal vision is for one name authority record for an entity, and that authority record intellectually brings together the archives relating to the entity, which is a great advantage for researchers. However, this is not practically achievable. Just the creation and effective use of name authorities to bring archives together at any level is quite challenging. For the Hub we have a number of issues….

Often we have more than one archive collection for the same entity (let’s say person, although it could be a corporate body or family). I’ll take Martha Beatrice Webb as an example, because we have 9 collections on the Archives Hub where she is a creator. She’s also topical, as the LSE have just launched a new Digital Library with the Beatrice Webb diaries!

beatrice webb collections

There is nothing wrong with any of these entries. In fact, I pick Beatrice Webb as an example because we do have some really good, detailed biographical history entries for her – exactly what we want in a collection description, so that researcher can get as much useful information as possible. We have always recommended to our contributors that they do provide a biographical history (we know from researchers that this is often useful to them; although conversely it is true to say that some researchers see it as unnecessary).

We have nine collections for Beatrice Webb, and three repositories holding these collections.

If we were to try to create one authority record for these nine collections:

1. Which biographical history would we use? Would we use them all, which means a lot of information and quite a bit of duplication? Would we pick the longest one? What criteria could we use?

2. Two of these collections are attributed to Beatrice and Sydney Webb. Ideally, we would want to link the collections to two authority records – one for Beatrice and one for Sydney, but the biographical history is for both of them, so we would probably end up with an authority record for ‘Beatrice and Sydney Webb’ as an entity.

3. The name of creator is entered differently in the descriptions – ‘Beatrice Webb’ and ‘Webb, Martha Beatrice, 1858-1943, wife of 1st Baron Passfield, social reformer and historian’. This is very typical for the Hub. Sometimes the name of creator is entered simply as forename and surname, but sometimes the same elements as are used for the index term entry are used – sometimes in the same (inverted) order and sometimes not. This makes it harder to automatically create one authority record, because you have to find ways to match the names and confirm they are the same person.

4. Maybe it would be easier to try to create one authority record per repository, so for Beatrice Webb we would have three records, but this would go against the idea of using name authorities to help intellectually bring archives together and still leaves us with some of the same challenges.

5. How would we deal with new descriptions? Could we get contributors to link to the authority records that have been created? Would we ask them to stop creating biographical histories within descriptions?

6. The name as an access point is often even more variable than the name as creator, although it does tend to have more structure. How can we tell that ‘Martha Beatrice Webb’ is the same person as ‘Beatrice Webb’ is the same person as ‘Beatrice Potter Webb’? For this we’ll have to carry out analysis of the data and pattern matching. For some names this isn’t so difficult, such as for Clough Williams-Ellis and Clough Williams Ellis.  But what about ‘Edward Coles’ (author of commonplace book c. 1730-40) and ‘Edward Coles fl. 1741’? With archives, there is a good deal of ‘fl’ and ‘c’ due to the relative obscurity of many people within archive collections. If we don’t identify matches, then we will have an authority record for every variation in a name.

7. We will have issues where a biographical history is cross-referenced, as in one example here. It makes perfect sense within the current context, and indeed, it follows the principle of using one biographical history for the same person, but it requires a distinct solution when introducing authority records.

I would be really interested to know what archivists, especially Hub contributors, think about linking to authority files.  I am very excited about the potential, and I believe there is so much great information tied up in biographical and administrative history fields that would help to make really useful authority records, but the challenges are pretty substantial. Some questions to consider:

(1) Would you be happy to link to a ‘definitive’ authority record? What would ‘authority’ and ‘definitive’ mean for you?

(2) Would you like to be able to edit the ‘definitive’ record – maybe you have further information to add to it? Would this type of collective authorship work?

(3) Would you rather have one authority record for each repository?

(4) Would you rather have room within an archive description to create a biographical entry that reflected that particular archive?

(5) Would you like to see co-ordination of the creation and use of name authorities? Maybe it’s something The National Archives could lead, following their work in maintaining names through the National Register of Archvies?

I’m sure you can think of other questions to ask, and maybe you have some questions to ask of us about our review? Suffice to say that we have no plans to change our software – this is currently just an assessment, but we are seriously thinking of how we can start to incorporate name authorities into the Archives Hub.

Finally, its worth mentioning that if you are interested in ways to bring biographical histories together, you might like to follow our ‘Linking Lives’ linked data project through the Linking Lives blog, as for this project we are looking at providing an interface that gives then end user something a little like a name authority record, only it will include more external links to other content.