Archives Hub Data and Workflow

Introduction

As those of you who contribute to or use the Hub will know, we went live with our new system in Dec 2016.  At the heart of our new system is our new workflow.  One of the key requirements that we set out with when we migrated to a new system was a more robust and sustainable workflow; the system was chosen on the basis that it could accommodate what we needed.

This post is about EAD (Encoded Archival Description) descriptions, and how they progress through our processing workflow. It is the data that is at the heart of the Archives Hub world. We also work with EAG (Encoded Archival Guide) for repository descriptions, and EAC-CPF (Encoded Archival Context, Corporate bodies, Persons and Families) for name entities. Our system actually works with JSON internally, but EAD remains our means of taking in data and providing data out via our API.

On the Archives Hub now we have two main means of data ingest: via our own EAD Editor, which can be thought of as ‘internal’, and via exports from archive systems, which can be thought of as ‘external’.

Data Ingest via the EAD Editor

1. The nature of the EAD

The Editor creates EAD according to the Archives Hub requirements. These have been carefully worked out over time, and we have a page detailing them at http://archiveshub.jisc.ac.uk/eadforthehub

screenshot of eadforthehub page
Part of a Hub webpage about EAD requirements

When we started work on the new system, we were aware that having a clear and well-documented set of requirements was key. I would recommend having this before starting to implement a new system! But, as is often the case with software development, we didn’t have the luxury of doing that – we had to work it out as we went along, which was sometimes problematic, because you really need to know exactly what your data requirements are in order to set your system up. For example, simply knowing which fields are mandatory and which are not sounds simple, but in reality it took us a good deal of thought, analysis and discussion.

Screenshot of the EAD Editor
EAD Editor

2. The scope of the EAD

EAD has plenty of tags and attributes! And they can be used in many ways. We can’t accommodate all of this in our Editor. Not only would it take time and effort, but it would result in a complicated interface, that would not be easy to use.

screenshot of EAD Tag Library
EAD Tag Library

So, when we created the new Editor, we included the tags and attributes for data that contributors have commonly provided to the Hub, with a few more additions that we discussed and felt were worthwhile for various reasons. We are currently looking again at what we could potentially add to the Editor, and prioritising developments. For example, the <materialspec> EAD tag is not accommodated at the moment. But if we find that our contributors use it, then there is a good argument for including it, as details specific to types of materials, such as map scales, can be useful to the end user.

We don’t believe that the Archives Hub necessarily needs to reflect the entire local catalogue of a contributor. It is perfectly reasonable to have a level of detail locally that is not brought across into an aggregator. Having said that, we do have contributors who use the Archives Hub as their sole online catalogue, so we do want to meet their needs for descriptive data. Field headings are an example of content we don’t utilise. These are contained within <head> tags in EAD. The Editor doesn’t provide for adding these. (A contributor who creates data elsewhere may include <head> tags, but they just won’t be used on the Hub; see Uploading to the Editor.)

We will continue to review the scope in terms of what the Editor displays and allows contributors to enter and revise; it will always be a work in progress.

3. Uploading to the Editor

In terms of data, the ability to upload to the Editor creates challenges for us. We wanted to preserve this functionality, as we had it on the old Editor, but as EAD is so permissive, the descriptions can vary enormously, and we simply can’t cope with every possible permutation. We undertake the main data analysis and processing within our main system, and trying to effectively replicate this in the Editor in order to upload descriptions would duplicate effort and create significant overheads. One of our approaches to this issue is that we will preserve the data that is uploaded, but it may not display in the Editor. If you think of the model as ‘data in’ > ‘data editing’ > ‘data out’, then the idea is that the ‘data in’ and ‘data out’ provide all the EAD, but the ‘data editing’ may not necessarily allow for editing of all the data. A good example of this situation occurs with the <head> tag, which is used for section headings. We don’t use these on the Hub, but we can ensure they remain in the EAD and they are there in the output from the Editor, so they are retained, but not displayed in the Editor. They can then be accessed by other means, such as through an XML Editor, and displayed in other interfaces.
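
As a rough illustration of the ‘data in’ > ‘data editing’ > ‘data out’ idea, the Python sketch below loads a small EAD fragment, exposes only some elements for editing, and writes everything back out, so that a <head> element survives even though it is never shown. The EDITABLE set, the function names and the sample fragment are all assumptions made for illustration; this is not how the Hub Editor is actually implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags that the editing form knows how to display.
EDITABLE = {"unittitle", "unitdate", "p"}

# Made-up EAD fragment containing a <head> that the form will not show.
SAMPLE = (
    "<archdesc><did><unittitle>Papers of A. Writer</unittitle></did>"
    "<scopecontent><head>Scope and Content</head>"
    "<p>Letters and diaries.</p></scopecontent></archdesc>"
)

def editable_fields(root):
    """Return (tag, text) pairs for the elements the form would display."""
    return [(el.tag, el.text) for el in root.iter() if el.tag in EDITABLE]

def apply_edit(root, tag, new_text):
    """Change the text of the first element with this tag; leave the rest alone."""
    el = root.find(f".//{tag}")
    if el is not None and tag in EDITABLE:
        el.text = new_text

root = ET.fromstring(SAMPLE)                     # data in
shown = editable_fields(root)                    # data editing: <head> is not listed
apply_edit(root, "unittitle", "Papers of Anne Writer")
output = ET.tostring(root, encoding="unicode")   # data out: <head> is still present
```

The point of the design is that the editing step works on a view of the document rather than on a cut-down copy, so anything the form does not understand simply passes through untouched.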

We have disabled upload of exports from the Calm system to the Editor at present, as we found that the data variations, which often caused the EAD to be invalid, were too much for our Editor to cope with. It has to analyse the data that comes in and decide which fields to populate with which data. Some are straightforward – ‘title’ goes into <unittitle> for example, but some are not…for example, Calm has references and alternative references, and we don’t have this in our system, so they cause problems for the Editor.

4. Output from the Editor

When a description is submitted to the Archives Hub from the Editor, it is uploaded to our system (CIIM, pronounced ‘sim’), which is provided by Knowledge Integration, and modified for our own data processing requirements.

Screenshot of the CIIM
CIIM Browse screen

The CIIM framework allows us to implement data checking and customised transformations, which can be specific to individual repositories. For the data from the Editor, we know that we only need a fairly basic default processing, because we are in control of the EAD that is created. However, we will have to consider working with EAD that is uploaded to the Editor, but has not been created in the Editor – this may lead to a requirement for additional data checking and transformations. But the vast majority of the time descriptions are created in the Editor, so we know they are good, valid, Hub EAD, and they should go through our processing with no problems.

Data Ingest from External Data Providers

1. The nature of the EAD

EAD from systems such as Calm, Archivist’s Toolkit and AtoM is going to vary far more than EAD produced from the Editor. Some of the archival management systems have EAD exports. To have an export is one thing; it is not the same as producing EAD that the Hub can ingest. There are a number of factors here. The way people catalogue varies enormously, so, aside from the system itself, the content can be unpredictable – we have to deal with how people enter references; how they enter dates; whether they provide normalised dates for searching; whether entries in fields such as language are properly divided up, or whether one entry box is used for ‘English, French, Latin’, or ‘English and a small amount of Latin’; whether references are always unique; whether levels are used to group information, rather than to represent a group of materials; what people choose to put into ‘origination’ and if they use both ‘origination’ and ‘creator’; whether fields are customised, etc. etc.
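
To give a flavour of what dealing with this variation involves, here is a minimal Python sketch that picks out language names from a single free-text entry such as ‘English, French, Latin’. The LANGUAGE_CODES lookup and the split_languages function are hypothetical and deliberately tiny; they are not the Hub’s actual processing rules, and a real list of languages would be far longer.

```python
import re

# Hypothetical lookup of language names to ISO 639-2 (bibliographic) codes.
LANGUAGE_CODES = {"english": "eng", "french": "fre", "latin": "lat", "welsh": "wel"}

def split_languages(entry):
    """Return codes for the known languages named in a free-text entry, in order."""
    found = []
    for word in re.findall(r"[A-Za-z]+", entry):
        code = LANGUAGE_CODES.get(word.lower())
        if code and code not in found:
            found.append(code)
    return found

tidy = split_languages("English, French, Latin")
messy = split_languages("English and a small amount of Latin")
```

Note that the second entry loses its nuance (‘a small amount of’) once it is reduced to codes, which is exactly the kind of trade-off that makes this sort of normalisation a matter of judgement rather than a purely mechanical step.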

The system itself will influence the EAD output. A system will have a template, or transformation process, that maps the internal content to EAD. We have only worked in any detail with the Calm template so far. Axiell, the provider of Calm, made some changes for us: for example, only six languages were exporting when we first started testing the export, so they expanded this list. We then made additional changes, such as allowing for multiple creators, subjects and dates to export, and ensuring languages in Welsh would export. This does mean that any potential Calm exporter needs to use this new template, but Axiell are going to add it to their next upgrade of Calm.

We are currently working to modify the AdLib template, before we start testing out the EAD export. Our experience with Calm has shown us that we have to test the export with a wide variety of descriptions, and modify it accordingly, and we eventually get to a reasonably stable point, where the majority of descriptions export OK.

We’ve also done some work with AtoM, and we are hoping to be able to harvest descriptions directly from the system.

2. The scope of the EAD

As stated above, finding aids can be wide ranging, and EAD was designed to reflect this, but as a result it is not always easy to work with. We have worked with some individual Calm users to extend the scope of what we take in from them, where they have used fields that were not being exported. For instance, information about condition and reproduction was not exporting in one case, due to the particular fields used in Calm, which were not mapping to EAD in the template. We’ve also had instances of index terms not exporting, and sometimes this has been due to the particular way an institution has set up their system. It is perfectly possible for an institution to modify the template themselves so that it suits their own particular catalogues, but this is something we are cautious about, as having large numbers of customised exports is going to be harder to manage, and may lead to more unpredictable EAD.

3. Uploading to the Editor

In the old Hub world, we expected exports to be uploaded to the Editor. A number of our contributors preferred to do this, particularly for adding index terms. However, this led to problems for us because we ended up with such varied EAD, which militated against our aim of interoperable content. If you catalogue in a system, export from that system, upload to another system, edit in that system, then submit to an aggregator (and you do this sometimes, but other times you don’t), you are likely to run into problems with version control. Over the past few years we have done a considerable amount of work to clarify ‘master’ copies of descriptions. We have had situations where contributors have ended up with different versions to ours, and not necessarily been aware of it. Sometimes the level of detail would be greater in the Hub version, sometimes in the local version. It led to a deal of work sorting this out, and on some occasions data simply had to be lost in the interests of ending up with one master version, which is not a happy situation.

We are therefore cautious about uploading to the Editor, and we are recommending to contributors that they either provide their data directly (through exports) or they use the Editor. We are not ruling out a hybrid approach if there is a good reason for it, but we need to be clear about when we are doing this, what the workflow is, and where the master copy resides.

4. Output from Exported Descriptions

When we pass the exports through our processing, we carry out automated transformations based on analysis of the data. The EAD that we end up with – the processed version – is appropriate for the Hub. It is suitable for our interface, for aggregated searching, and for providing to others through our APIs. The original version is kept, so that we have a complete audit trail, and we can provide it back to the contributor. The processed EAD is provided to the Archives Portal Europe. If we did not carry out the processing, APE could not ingest many of the descriptions, or else they would ingest, but not display to the optimum standard.

Future Developments

Our automated workflow is working well. We have taken complete, or near complete,  exports from Calm users such as the Universities of Nottingham, Hull and (shortly) Warwick, and a number of Welsh local authority archives. This is a very effective way to ensure that we have up-to-date and comprehensive data.

We have well over one hundred active users of the EAD Editor and we also have a number of potential contributors who have signed up to it, keen to be part of the Archives Hub.

We intend to keep working on exports, and also hope to return to some work we started a few years ago on taking in Excel data. This is likely to require contributors to use our own Excel template, as it is impractical to work with locally produced templates. The problem is that working with one repository’s spreadsheet, translating it into EAD, could take weeks of work, and it would not replicate to other repositories, who will have different spreadsheets. Whilst Excel is reasonably simple, and most offices have it, it is also worth bearing in mind that creating data in Excel has considerable shortcomings. It is not designed for hierarchical archival data, which has requirements in terms of both structure and narrative, and is constantly being revised. TNA’s Discovery are also working with Excel, so we may be able to collaborate with them in progressing this area of work.

Our new architecture is working well, and it is gratifying to see that what we envisaged when we started working with Knowledge Integration and started setting out our vision for our workflow is now a reality.  Nothing stands still in archives, in standards, in technology or in user requirements, so we cannot stand still either, but we have a set-up that enables us to be flexible, and modify our processing to meet any new challenges.

The Website for the New Archives Hub

screenshot of archives hub homepage
Archives Hub homepage

The back end of a new system usually involves a huge amount of work and this was very much the case for the Archives Hub, where we changed our whole workflow and approach to data processing (see The Building Blocks of the new Archives Hub), but it is the front end that people see and react to; the website is a reflection of the back end, as well as involving its own user experience challenges, and it reflects the reality of change to most of our users.

We worked closely with Knowledge Integration in the development of the system, and with Gooii in the design and implementation of the front end, and Sero ran some focus groups for us, testing out a series of wireframe designs on users. Our intention was to take full advantage of  the new data model and processing workflow in what we provided for our users. This post explains some of the priorities and design decisions that we made. Additional posts will cover some of the areas that we haven’t included here, such as the types of description (collections, themed collections, repositories) and our plan to introduce a proximity search and a browse.

Speed is of the Essence

Faster response times were absolutely essential and, to that end, an enterprise search engine (in this case Elasticsearch) was the starting point. However, in addition to the underlying search technology, the design of the data model and indexing structure had a significant impact on system performance and response times, and this was key to the architecture that Knowledge Integration implemented. With the previous system there was only the concept of the ‘archive’ (EAD document) as a whole, which meant that the whole document structure was always delivered to the user whatever part of it they were actually interested in, creating a large overhead for both processing and bandwidth. In the new system, each EAD record is broken down into many separate sections which are each indexed separately, so that the specific section in which there is a search match can be delivered immediately to the user.

To illustrate this with an example:-

A researcher searches for content relating to ‘industrial revolution’ and this scores a hit on a single item 5 levels down in the archive hierarchy. With the previous system the whole archive in which the match occurs would be delivered to the user and then this specific section would be rendered from within the whole document, meaning that the result could not be shown until the whole archive had been loaded. If the results list included a number of very large archives, the response time increased accordingly.

In the new system, the matching single item ‘component’ is delivered to the user immediately, when viewed in either the result list or on the detail page, as the ability to deliver the result is decoupled from archive size. In addition, for the detail page, a summary of the structure of the archive is then built around the item, both to provide the context and to allow easy navigation.
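
The difference between the two approaches can be sketched in a few lines of Python. The example below is only an illustration: the nested dictionary, the field names (id, title, parent, ancestors) and the flatten and search functions are assumptions, not the CIIM or Elasticsearch data model. It breaks a hierarchical description into separate records, each carrying a pointer to its parent, so that one matching component can be returned without loading the whole archive.

```python
def flatten(node, parent_id=None, ancestors=()):
    """Yield one flat record per unit of description in a nested archive."""
    yield {
        "id": node["id"],
        "title": node["title"],
        "parent": parent_id,
        "ancestors": list(ancestors),  # titles from the top level down, for context
    }
    for child in node.get("children", []):
        yield from flatten(child, node["id"], ancestors + (node["title"],))

# Hypothetical three-level description.
archive = {
    "id": "gb1-abc", "title": "Mill Company Archive", "children": [
        {"id": "gb1-abc/1", "title": "Correspondence", "children": [
            {"id": "gb1-abc/1/1", "title": "Letter on the industrial revolution"},
        ]},
    ],
}

# Each record can be stored and retrieved on its own.
index = {record["id"]: record for record in flatten(archive)}

def search(term):
    """Return only the records whose title matches, not the whole archive."""
    return [r for r in index.values() if term.lower() in r["title"].lower()]

hits = search("industrial revolution")
```

Because each record keeps its parent and its list of ancestors, the surrounding structure can be assembled afterwards, which mirrors the idea of showing the result first and building the context around it.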

Even with the improvements to response times, the tree representation (which does have to present a summary of the whole structure) takes a while to render for some very large multi-level descriptions, but the description itself always loads instantly. This means that the researcher can always see they have a result immediately and view it, and then the archival structure is delivered (after a short pause for very large archives) which gives the result context within the archive as a whole.

The system has been designed to allow for growth in both the number of contributors we can support and the number of end-users, and will also improve our ability both to syndicate the content to Archives Portal Europe and to deliver contributors’ own ‘micro sites’.

Look and Feel

Some of the feedback that we received suggested that the old website design was welcoming, but didn’t feel professional or academic enough – maybe trying to be a bit too cuddly. We still wanted to make the site friendly and engaging, and I think we achieved this, but we also wanted to make it more professional looking, showing the Hub as an academic research tool.  It was also important to show that the Archives Hub is a Jisc service, so the design Gooii created was based upon the Jisc pattern library that we were required to use in order to fit in with other Jisc sites.

We have tried to maintain a friendly and informal tone along with use of cleaner lines and blocks, and a more visually up-to-date feel. We have a set of consistent icons, on/off buttons and use of show/hide, particularly with the filter. This helps to keep an uncluttered appearance whilst giving the user many options for navigation and filtering.

In response to feedback, we want to provide more help with navigating through the service, for those that would like some guidance. The homepage includes some ‘start exploring’ suggestions for topics, to help get inexperienced researchers started, and we are currently looking at the whole ‘researching’ section and how we can improve that to work for all types of users.

Navigating

We wanted the Hub to work well with a fairly broad search that casts the net quite widely. This type of search is often carried out by a user who is less experienced in using archives, or is new to the Hub, and it can produce a rather overwhelming number of results. We have tried to facilitate the onward journey of the user through judicious use of filtering options. In many ways we felt that filtering was more important than advanced search in the website design, as our research has shown that people tend to drill down from a more general starting point rather than carry out a very specific search right from the off.  The filter panel is up-front, although it can be hidden/shown as desired, and it allows for drilling down by repository, subject, creator, date, level and digital content.

Another way that we have tried to help the end user is by using typeahead to suggest search results. When Gooii suggested this, we gave it some thought, as we were concerned that the user might think the suggestions were the ‘best’ matches, but typeahead suggestions are quite a common device on the web, and we felt that they might give some people a way in, from where they could easily navigate through further descriptions.

Hub website example of type ahead results
A search for ‘design’ with suggested results

The suggestions may help users to understand the sort of collections that are described on the Hub. We know that some users are not really aware of what ‘archives’ means in the context of a service like the Archives Hub, so this may help orientate them.

Suggested results also help to explain what the categories of results are – themes and locations are suggested as well as collection descriptions.

We thought about the usability of the hit list. In the feedback we received there was no clear preference for what users want in a hit list, and so we decided to implement a brief view, which just provides title and date, for maximum number of results, and also an expanded view, with location, name of creator, extent and language, so that the user can get a better idea of the materials being described just from scanning through the hit list.

An example of a hit list result in expanded mode
Expanded mode gives the user more information

With the above example, the title and date alone do not give much information, which is particularly common with descriptions of series or items, so the name of creator adds real value to the result.

Seeing the Wood Through the Trees

The hierarchical nature of archives is always a challenge; a challenge for cataloguing,  processing and presentation. In terms of presentation, we were quite excited by the prospect of trying something a bit different with the new Hub design. This is where the ‘mini map’ came about. It was a very early suggestion by K-Int to have something that could help to orientate the user when they suddenly found themselves within a large hierarchical description. Gooii took the idea and created a number of wireframes to illustrate it for our focus groups.

For instance, if a user searches on Google for ‘conrad slater jodrell bank’ then they get a link to the Hub entry:

screenshot of google search result for a Hub description
Result of a search on Google

The user may never have used archives, or the Archives Hub before. But if they click on this link, taking them directly to material that sits within a hierarchical description, we wanted them to get an immediate context.

screen shot of one entry in the Jodrell Bank Archive
Jodrell Bank Observatory Archives: Conrad Slater Files

The page shows the description itself, the breadcrumb to the top level, the place in the tree where these particular files are described and a mini map that gives an instant indication of where this entry is in the whole. It is  intended (1) to give a basic message for those who are not familiar with archive collections – ‘there is lots more stuff in this collection’ and (2) to provide the user with a clearly understandable  expanding tree for navigation through this collection.

One of the decisions we made, illustrated here, was to show where the material is held at every level, for every unit of description. The information is only actually included at the top level in the description itself, but we can easily cascade it down. This is a good illustration of where the approach to displaying archive descriptions needs to be appropriate for the Web – if a user comes straight into a series or item, you need to give context at that level and not just at the top level.
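
Cascading a value down a hierarchy is simple to express in code. The Python sketch below is an illustration only: the nested dictionary, the repository key and the cascade_repository function are assumptions, and the names in the sample are invented. It copies the holding repository from the top level to every unit beneath it, unless a lower level already states its own.

```python
def cascade_repository(node, inherited=None):
    """Copy the holding repository from the top level down to every unit below it."""
    repository = node.get("repository", inherited)
    node["repository"] = repository
    for child in node.get("children", []):
        cascade_repository(child, repository)
    return node

# Hypothetical description: the repository is recorded at the top level only.
description = {
    "title": "Observatory Archive",
    "repository": "Example University Library",
    "children": [
        {"title": "Director's Files", "children": [{"title": "File 1"}]},
    ],
}

cascade_repository(description)
item = description["children"][0]["children"][0]
```

After the call, a user landing directly on ‘File 1’ can be shown where the material is held without the page having to look back up to the top level.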

The design also works well for searches within large hierarchical descriptions.

screenshot showing a 'search within' with highlighted results
Search for ‘bicycles’ within the Co-operative Union Photographic Collection

The user can immediately get a sense of whether the search has thrown up substantial results or not. In the example above you can see that there are some references to ‘bicycles’ but only early on in the description.  In the example below, the search for ‘frost on sunday’ shows that there are many references within the Ronnie Barker Collection.

screenshot showing search within with lots of highlighted results
Search within the Ronnie Barker Collection for ‘frost on sunday’

One of the challenges for any archive interface is to ensure that it works for experienced users and first-time users. We hope that the way we have implemented navigation and searching means that we have fulfilled this aim reasonably well.

Small is Beautiful

screenshot showing the Hub search on a mobile phone
The Archives Hub on an iPhone

The old site did not work well on mobile devices. It was created before mobile became massive, and it is quite hard to retrospectively fit a design to be responsive to different devices. Gooii started out with the intention of creating a responsive design, so that it renders well on different sized screens.  It requires quite a bit of compromise, because rendering complex multi-level hierarchies and very detailed catalogues on a very small screen is not at all easy. It may be best to change or remove some aspects of functionality in order to ensure the site makes sense. For example, the mobile display does not open the filter by default, as this would push the results down the page. But the user can open the filter and use the faceted search if they choose to do so.

We are particularly pleased that this has been achieved, as something like 30% of Hub use is on mobiles and tablets now, and the basic search and navigation needs to be effective.

graph showing use of desk, mobile and tablet devices on the Hub
Devices used to view the Hub site over a three month period

In the above graph, the orange line is desktop, the green is mobile and the purple is tablet. (The dip around the end of December is due to problems setting up the Analytics reporting.)

Cutting Our Cloth

One of the lessons we have learnt over 15 years of working on the Archives Hub is that you can dream up all of the interface ideas that you like, but in the end what you can implement successfully comes down to the data. We had many suggestions from contributors and researchers about what we could implement, but oftentimes these ideas will not work in practice because of the variations in the descriptions.

We thought about implementing a search for larger, medium sized or smaller collections, but you would need consistent ‘extent’ data, and we don’t have that because archivists don’t use any kind of controlled vocabulary for extent, so it is not something we can do.

When we were running focus groups, we talked about searching by level – collection, series, sub-series, file, item, etc. For some contributors a search by a specific level would be useful, but we could only implement three levels – collection (or ‘top level’), item (which includes ‘piece’) and then everything between these, because the ‘in-between’ levels don’t lend themselves to clear categorisation. The way levels work in archival description, and the way they are interpreted by repositories, means we had to take a practical view of what was achievable.
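
The three-way grouping can be pictured as a simple mapping. The Python sketch below is hypothetical: the level_bucket function and the bucket names are assumptions, and treating ‘fonds’ as a top-level value alongside ‘collection’ is a guess made for illustration rather than a statement of how the Hub decides what counts as the top level.

```python
def level_bucket(level):
    """Map a level value onto one of three searchable groups."""
    value = level.strip().lower()
    if value in {"collection", "fonds"}:   # assumption: both count as top level
        return "collection"
    if value in {"item", "piece"}:         # 'piece' is grouped with 'item'
        return "item"
    return "in-between"                    # series, sub-series, file and so on

buckets = [level_bucket(v) for v in ["Collection", "series", "file", "piece"]]
```

Anything that is not clearly at the top or the bottom falls into the middle group, which reflects the practical view that the in-between levels do not lend themselves to clear categorisation.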

We still aren’t completely sold on how we indicate digital content, but there are particular challenges with this. Digital content can be images that are embedded within the description, links to images, or links to any other digital content imaginable. So, you can’t just use an image icon, because that does not represent text or audio. We ended up simply using a tick to indicate that there is digital content of some sort. However, one large collection may have links to only one or two digital items, so in that case the tick may raise false expectations. But you can hardly say ‘includes digital content, but not very much, so don’t get too excited’. There is  room for more thought about our whole approach to digital content on the Hub, as we get more links to digital surrogates and descriptions of born-digital collections.

Statistics

The outward indication of a more successful site is that use goes up. The use of statistics to give an indication of value is fraught with problems. Does the number of clicks represent value? Might more clicks indicate a poorer user interface design? Or might they indicate that users find the site more engaging? Does a user looking at only one description really gain less value than a user looking at ten descriptions? Clearly statistics can only ever be seen as one measure of value, and they need to be used with caution. However, the reality is that an upward graph is always welcomed! Therefore we are pleased to see that overall use of the website is up around 32% compared to this period during the previous year.

graph of blog stats comparing data
Jan 2016 (the orange line) and Jan 2017 (the blue line), which shows typical daily use above 2,000 page views.

Feedback

We are pleased to say that the site has been very well received…

“The new site is wonderful. I am so impressed with its speed and functionality, as well as its clean, modern look.” (University Archivist)

“…there are so many other features that I could pick out, such as the ability to download XML and the direct link generator for components as well as collections, and the ‘start exploring’ feature.”  (University Archivist)

“Brand new Archives Hub looks great. Love how the ‘explorer themes’ connect physically separated collections” (Specialist Repository Head of Collections)

“A phenomenal achievement!” (Twitter follower)

With thanks to Rob Tice from Knowledge Integration for his input to this post.

The Building Blocks of the New Archives Hub

This is the first post outlining what the Archives Hub team have been up to over the past 18 months in creating a new system. We have worked with Knowledge Integration (K-Int) to create a new back end, using their CIIM software and Elasticsearch, and we’ve worked with Gooii and Sero to create a new interface. We are also building a new EAD Editor for cataloguing. Underlying all this we have a new data workflow and we will be implementing this through a new administrative interface. This post summarises some of the building blocks – our overall approach, objectives and processes.

What did we want to achieve?

The Archives Hub started off as a pilot project and has been running continuously as a service aggregating UK archival descriptions since 1999 (officially launched in 2001). That’s a long time to build up experience, to try things out, to have successes and failures, and to learn from mistakes.

The new Hub aimed to learn lessons from the past and to build positively upon our experiences.

Our key goals were:

  • sustainability
  • extensibility
  • reusability

Within these there is an awful lot I could unpack. But to keep it brief…

It was essential to come up with a system that could be maintained with the resources we had. In fact, we aimed to create a system that could be maintained to a basic level (essentially the data processing) with less effort than before. This included enabling contributors to administer their own data through access to a new interface, rather than having to go through the Hub team. Our more automated approach to basic processing would give us more resource to concentrate on added value, and this is essential in order to keep the service going, because a service has to develop  to remain relevant and meet changing needs.

The system had to be ‘future proof’ to the extent that we could make it so. One way to achieve this is to have a system that can be altered and extended over time; to make sure it is reasonably modular so that elements can be changed and replaced.

Key for us was that we wanted to end up with a store of data that could potentially be used in other interfaces and services. This is a substantial leap from thinking in terms of just servicing your own interface. But it is essential in the global digital age, and when thinking about value and impact, to think beyond your own environment and think in terms of  opportunities for increasing the profile and use of archives and of connecting data. There can be a tension between this kind of objective of openness and the need to clearly demonstrate the impact of the service, as you are pushing data beyond the bounds of your own scope and control, but it is essential for archives to be ‘out there’ in the digital environment, and we cannot shy away from the challenges that this raises.

In pursuing these goals, we needed to bring our contributors along with us. Our aims were going to have implications for them, so it was important to explain what we were doing and why.

Data Model for Sustainability

It is essential to create the right foundation. At the heart of what we do is the data (essentially meaning the archive descriptions, although future posts will introduce other types of data, namely repository descriptions and ‘name authorities’). Data comes in, is processed, is stored and accessed, and it flows out to other systems. It is the data that provides the value, and we know from experience that the data itself provides the biggest challenges.

The Archives Hub system that we originally created, working with the University of Liverpool and Cheshire software, allowed us to develop a successful aggregator, and we are proud of the many things we achieved. Aggregation was new, and, indeed, data standards were relatively new, and the aim was essentially to bring in data and provide access to it via our Archives Hub website. The system was not designed with a focus on a consistent workflow and sustainability was something of an unknown quantity, although the use of Encoded Archival Description (EAD) for our archive collection descriptions gave us a good basis in structured data. But in recent years the Hub started to become out of step with the digital environment.

For the new Hub we wanted to think about a more flexible model. We wanted the potential to add new ‘entities’. These may be described as any real world thing, so they might include archive descriptions, people, organisations, places, subjects, languages, repositories and events. If you create a model that allows for representing different entities, you can start to think about different perspectives, different ways to access the data and to connect the data up. It gives the potential for many different contexts and narratives.

We didn’t have the time and resource to bring in all the entities that we might have wanted to include; but a model that is based upon entities and relationships leaves the door open to further development. We needed a system that was compatible with this way of thinking. In fact, we went live without the ‘People and Organisations’ entity that we have been working on, but we can implement it when we are ready because the system allows for this.

Archives Hub Entity Relationship diagram
Entities within the Archives Hub system
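
The idea of a model built on entities and relationships can be sketched in a few lines of Python. This is only an illustration of the principle, not the structure of the Hub system: the class names, the relationship labels (`held_by`, `created_by`) and the sample records are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    entity_type: str   # e.g. "description", "repository", "person"
    identifier: str
    label: str


@dataclass
class Store:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (subject_id, relation, object_id)

    def add(self, entity):
        self.entities[entity.identifier] = entity

    def relate(self, subject_id, relation, object_id):
        self.relationships.append((subject_id, relation, object_id))

    def related(self, identifier, entity_type):
        """Return entities of one type linked to `identifier`, in either direction."""
        found = []
        for subject_id, _, object_id in self.relationships:
            if subject_id == identifier:
                other = object_id
            elif object_id == identifier:
                other = subject_id
            else:
                continue
            if self.entities[other].entity_type == entity_type:
                found.append(self.entities[other])
        return found


store = Store()
store.add(Entity("description", "desc-1", "Papers of an example family"))
store.add(Entity("repository", "repo-1", "Example Library"))
store.relate("desc-1", "held_by", "repo-1")

# A new kind of entity can be introduced later without changing the model.
store.add(Entity("person", "person-1", "Example Person"))
store.relate("desc-1", "created_by", "person-1")
```

Because every entity is stored in the same way, the same data can be approached from the description, the repository or the person, which is the sense in which the model allows for different perspectives.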

The company that we employed to build the system had to be able to meet the needs of this type of model. That made it likely that we would need a supplier who already had this type of system. We found that with Knowledge Integration, who understood our modelling and what we were trying to achieve, and who had undertaken similar work aggregating descriptions of museum content.

Data Standards

The Hub works with Encoded Archival Description, so descriptions have to be valid EAD, and they have to conform to ISAD(G) (which EAD does). Originally the Hub employed a data editor, so that all descriptions were manually checked. This has the advantage of supporting contributors in a very 1-2-1 way, and working on the content of descriptions as well as the standardisation (e.g. thinking about what it means to have a useful title as well as thinking about the markup and format) and it was probably essential when we set out. But this approach had two significant shortcomings – content was changed without liaising with the contributor, which creates version control issues, and manual checking inevitably led to a lack of consistency and non-repeatable processes. It was resource intensive and not rigorous enough.

In order to move away from this and towards machine based processing we embarked upon a long process, over several months, of discussing ‘Hub data requirements’. It sometimes led to brain-frying discussions, and required us to make difficult decisions about what we would make mandatory. We talked in depth about pretty much every element of a description; we talked about levels of importance – mandatory, recommended, desirable; we asked contributors their opinions; we looked at our data from so many different angles. It was one of the more difficult elements of the work.  Two brief examples of this (I could list many more!):

Name of Creator

Name of creator is an ISAD(G) mandatory field. It is important for an understanding of the context of an archive. We started off by thinking it should be mandatory and most contributors agreed. But when we looked at our current data, hundreds of descriptions did not include a name of creator. We thought about whether we could make it mandatory for a ‘fonds’ (as opposed to an artificial collection), but there can be instances where the evidence points to a collection with a shared provenance, but the creator is not known. We looked at all the instances of ‘unknown’, ‘several’, ‘various’, etc. within the name of creator field. They did not fulfill the requirement either – the name of a creator is not ‘unknown’. We couldn’t go back to contributors and ask them to provide a creator name for so many descriptions. We knew that it was a bad idea to make it mandatory, but then not enforce it (we had already got into problems with an inconsistent approach to our data guidelines). We had to have a clear position. For me personally it was hard to let go of creator as mandatory! It didn’t feel right. It meant that we couldn’t enforce it with new data coming in. But it was the practical decision because if you say ‘this is mandatory except for the descriptions that don’t have it’ then the whole idea of a consistent and rigorous approach starts to be problematic.

Access Conditions

This is not an ISAD(G) mandatory field – a good example of where the standard lags behind the reality. For an online service, providing information about access is essential. We know that researchers value this information. If they are considering travelling to a repository, they need to be aware that the materials they want are available. So, we made this mandatory, but that meant we had to deal with something like 500 collections that did not include this information. However, one of the advantages of this type of information is that it is feasible to provide standard ‘boiler plate’ text, and this is what we offered to our contributors. It may mean some slightly unsatisfactory ‘catch all’ conditions of access, but overall we improved and updated the access information in many descriptions, and we will ask for it as mandatory with future data ingest.
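
As a rough illustration of what checking requirements by machine can look like, here is a minimal Python sketch. It assumes EAD without an XML namespace, it covers only the two fields discussed above (`accessrestrict` and `origination` are the EAD elements for conditions of access and name of creator), and the boilerplate wording and function names are invented for the example rather than taken from the Hub system.

```python
import xml.etree.ElementTree as ET

# Illustrative only: these levels follow the two decisions described above,
# not the Hub's full list of data requirements.
REQUIREMENTS = {
    "accessrestrict": "mandatory",   # conditions governing access
    "origination": "recommended",    # name of creator
}

# Hypothetical wording for a standard access statement.
BOILERPLATE_ACCESS = "Please contact the repository for access information."


def check_description(ead_xml):
    """Return a dict mapping each missing or empty element to its requirement level."""
    root = ET.fromstring(ead_xml)
    missing = {}
    for tag, level in REQUIREMENTS.items():
        element = root.find(f".//{tag}")
        text = "".join(element.itertext()).strip() if element is not None else ""
        if not text:
            missing[tag] = level
    return missing


def add_boilerplate_access(ead_xml):
    """Add a standard access statement if the description has none at any level."""
    root = ET.fromstring(ead_xml)
    archdesc = root if root.tag == "archdesc" else root.find(".//archdesc")
    if archdesc is not None and archdesc.find(".//accessrestrict") is None:
        access = ET.SubElement(archdesc, "accessrestrict")
        ET.SubElement(access, "p").text = BOILERPLATE_ACCESS
    return ET.tostring(root, encoding="unicode")


sample = (
    "<ead><archdesc level='fonds'><did>"
    "<unittitle>Papers</unittitle>"
    "</did></archdesc></ead>"
)
fixed = add_boilerplate_access(sample)
```

Here `check_description(sample)` reports both fields as missing, while `check_description(fixed)` reports only the recommended creator field.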

Normalizing the Data

Our rather ambitious goal was to improve the consistency of the data, by which I mean reducing variation, where appropriate, with things like date formats, name of repository, names of rules or source used for index terms, and also ensuring good practice with globally unique references.

To simplify somewhat, our old approach led us to deal with the variations in the data that we received in a somewhat ad hoc way, creating solutions to fix specific problems; solutions that were often implemented at the interface rather than within the back-end system. Over time this led to a somewhat messy level of complexity and a lack of coherence.

When you aggregate data from many sources, one of the most fundamental activities is to enable it to be brought together coherently for search and display, so you are often carrying out some kind of processing to standardise it in some way. This can be characterised as simple processing and complex processing:

1) If X then Y

2) If X then Y or Z depending on whether A is present, and whether B and C match or do not match and whether the contributor is E or F.

The first example is straightforward; the second can get very complicated.
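
To make the contrast concrete, here is a minimal Python sketch of the two kinds of rule. The field names (`language`, `langcode`, `reference`, `alt_reference`, `repo_code`, `ref_prefix`) and the contributor names are hypothetical; they stand in for X, Y, Z, A, B, C, E and F and are not actual Hub rules.

```python
def simple_rule(record):
    """If X then Y: a description in English gets the language code 'eng'."""
    if record.get("language") == "English":
        record["langcode"] = "eng"
    return record


def complex_rule(record, contributor):
    """If X then Y or Z, depending on A, on whether B matches C, and on who sent it."""
    if "reference" not in record:                        # X is not the case
        return record
    has_alt = "alt_reference" in record                  # is A present?
    codes_match = record.get("repo_code") == record.get("ref_prefix")  # do B and C match?
    if has_alt and contributor == "contributor_e":
        record["display_reference"] = record["alt_reference"]           # outcome Y
    elif not codes_match and contributor == "contributor_f":
        prefix = record.get("repo_code", "")
        record["display_reference"] = f"{prefix}/{record['reference']}"  # outcome Z
    else:
        record["display_reference"] = record["reference"]               # default
    return record
```

Even in this small form the second function has three outcomes and four conditions to test, and each extra clause added to it interacts with the ones already there.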

If you make these decisions as you go along, then after so many years you can end up with a level of complexity that becomes rather like a mass of lengths of string that have been tangled up in the middle – you just about manage to ensure that the threads in and out are still showing (the data in at one end; the data presented through the interface the researcher uses at the other) but the middle is impossible to untangle and becomes increasingly difficult to manage.

This is eventually going to create problems for three main reasons. Firstly, it becomes harder to introduce more clauses to fix various data issues without unforeseen impacts, secondly it is almost impossible to carry out repeatable processes, and thirdly (and really as a result of the other two), it becomes very difficult to provide the data as one reasonably coherent, interoperable set of data for the wider world.

We needed to go beyond the idea of the Archives Hub interface being the objective; we needed to open up the data, to ensure that contributors could get the maximum impact from providing the data to the Archives Hub. We needed to think of the Hub not as the end destination but as a means to enable many more (as yet maybe unknown) destinations. By doing this, we would also set things up for if and when we wanted to make significant changes to our own interface.

This is a game changer. It sounds like the right thing to do, but the problem is that it meant tackling the descriptions we already had on the Hub to introduce more consistency. Thousands of descriptions with hundreds of thousands of units created over time, in different systems, with different mindsets, different ‘standards’, different migration paths. This is a massive challenge, and it wasn’t possible for us to be too idealistic; we had to think about a practical approach to transforming and creating descriptions in a way that makes them more re-usable and interoperable. Not perfect, but better.

Migrating the Data

Once we had our Hub requirements in place, we could start to think about the data we currently have, and how to make sure it met our requirements. We knew that we were going to implement ‘pipelines’ for incoming data (see below) within the new system, but that was not exactly the same process as migrating data from old world to new, as migration is a one-off process. We worked slowly and carefully through a spreadsheet, over the best part of a year, with a line for each contributor. We used XSLT transforms (essentially scripts to transform data). For each contributor we assessed the data and had to work out what sort of processing was needed. This was immensely time-consuming and sometimes involved complex logic and careful checking, as it is very easy with global edits to change one thing and find knock-on effects elsewhere that you don’t want.

The migration process was largely done through use of these scripts, but we had a substantial amount of manual editing to do, where automation simply couldn’t deal with the issues. For example:

  • dates such as 1800/190, 1900-20-04, 8173/1878
  • non-unique references, often the result of human error
  • corporate names with surnames included
  • personal names that were really family names
  • missing titles, dates or languages
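
Automation can at least find cases like the dates above, even where it cannot fix them. The following Python sketch flags implausible values for manual attention. It assumes dates are held in an ISO 8601 style (YYYY, YYYY-MM or YYYY-MM-DD, with `/` separating the two ends of a range); the function names and the `latest_year` cut-off are invented for the example.

```python
import re
from datetime import date

DATE_PART = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_part(text):
    """Return (year, month, day) for YYYY, YYYY-MM or YYYY-MM-DD, else None."""
    match = DATE_PART.fullmatch(text)
    if not match:
        return None
    year = int(match.group(1))
    month = int(match.group(2) or 1)
    day = int(match.group(3) or 1)
    try:
        date(year, month, day)   # rejects month 20, day 31 in April, and so on
    except ValueError:
        return None
    return (year, month, day)


def date_problem(value, latest_year=2016):
    """Describe what is wrong with a date or date range, or return None if it looks plausible."""
    parts = [parse_part(part) for part in value.split("/")]
    if len(parts) > 2 or None in parts:
        return "not a valid date or date range"
    if any(part[0] > latest_year for part in parts):
        return "year is in the future"
    if len(parts) == 2 and parts[0] > parts[1]:
        return "range ends before it starts"
    return None
```

With this sketch, `1800/190` and `1900-20-04` are reported as invalid and `8173/1878` as having a year in the future, but deciding what the date should have been still needs a person.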

When working through manual edits, our aim was to liaise with the contributor, but in the end there was so much to do that we made decisions that we thought were sensible and reasonable. Being an archivist and having significant experience of cataloguing made me feel qualified to do this. With some contributors, we also knew that they were planning a re-submission of all their descriptions, so we just needed to get the current descriptions migrated temporarily, and a non-ideal edit might therefore be fine just for a short period of time. Even with this approach we ended up with a very small number of descriptions that we could not migrate for the going live date because we needed more time to figure out how to get them up to the required standard.

Creating Pipelines

Our approach to data normalization for incoming descriptions was to create ‘pipelines’. More about this in another blog post, but essentially, we knew that we had to implement repeatable transformation processes. We had data from many different contributors, with many variations. We needed a set of pipelines so that we could work with data from each individual contributor appropriately. The pipelines include things like:

  • fix problems with web links (where the link has not been included, or the link text has not been included)
  • remove empty tags
  • add ISO language code
  • take archon codes out of names of repositories
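
A pipeline of this kind can be thought of as a list of small, repeatable steps applied in order. The Python sketch below shows two of the steps listed above. It is an illustration only, not the K-Int implementation: it assumes EAD without an XML namespace, the language table is a tiny sample, and the function names are invented.

```python
import xml.etree.ElementTree as ET

# Tiny illustrative subset of language names and their ISO 639-2 codes.
LANGUAGE_CODES = {"English": "eng", "French": "fre", "Latin": "lat"}


def remove_empty_tags(root):
    """Remove elements with no text, attributes or children (and no trailing text)."""
    # Reversed document order visits descendants before their ancestors, so an
    # element that becomes empty once its children are removed is removed too.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            is_empty = (
                len(child) == 0
                and not child.attrib
                and not (child.text or "").strip()
                and not (child.tail or "").strip()
            )
            if is_empty:
                parent.remove(child)
    return root


def add_language_codes(root):
    """Add a langcode attribute to <language> elements that lack one."""
    for language in root.iter("language"):
        name = (language.text or "").strip()
        if "langcode" not in language.attrib and name in LANGUAGE_CODES:
            language.set("langcode", LANGUAGE_CODES[name])
    return root


def run_pipeline(ead_xml, steps):
    """Parse the description, apply each step in order and return the result as text."""
    root = ET.fromstring(ead_xml)
    for step in steps:
        root = step(root)
    return ET.tostring(root, encoding="unicode")


sample = (
    "<archdesc><did><unittitle>Papers</unittitle><note><p></p></note>"
    "<langmaterial><language>English</language></langmaterial></did></archdesc>"
)
result = run_pipeline(sample, [remove_empty_tags, add_language_codes])
```

Because each step is a separate function, the same step can be run again on the next batch of data and will do exactly the same thing, which is the repeatability that ad hoc fixes lacked.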

Of course, for many contributors these processes will be the same – there will be a default approach – but we will sometimes need to vary the pipelines as appropriate for individual contributors. For example:

  • add access information where it is not present
  • use the ‘alternative reference’ (created in Calm) as the main reference
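
One way to picture a default pipeline with per-contributor variations is a shared list of steps plus a table of extras. The Python sketch below works on simple dictionaries rather than EAD to keep it short. The contributor names, field names and boilerplate wording are hypothetical, and this is not a description of the administration interface itself.

```python
# Hypothetical wording for a standard access statement.
BOILERPLATE_ACCESS = "Please contact the repository for access information."


def remove_empty_fields(record):
    """Default step: drop fields that are empty."""
    return {key: value for key, value in record.items() if value not in ("", None)}


def add_access_information(record):
    """Contributor-specific step: add access information where it is not present."""
    if not record.get("access"):
        record["access"] = BOILERPLATE_ACCESS
    return record


def use_alternative_reference(record):
    """Contributor-specific step: use the alternative reference as the main reference."""
    if record.get("alt_reference"):
        record["reference"] = record["alt_reference"]
    return record


DEFAULT_STEPS = [remove_empty_fields]

# Extra steps that run after the defaults for particular contributors.
CONTRIBUTOR_STEPS = {
    "contributor_a": [add_access_information],
    "contributor_b": [use_alternative_reference],
}


def pipeline_for(contributor):
    return DEFAULT_STEPS + CONTRIBUTOR_STEPS.get(contributor, [])


def process(record, contributor):
    """Run a copy of the record through the pipeline for this contributor."""
    record = dict(record)
    for step in pipeline_for(contributor):
        record = step(record)
    return record


incoming = {"reference": "1", "alt_reference": "MS/1", "access": ""}
```

Keeping the variations in one table means that the differences between contributors are visible in a single place, rather than scattered through the processing as special cases.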

We will be implementing these pipelines in our new world, through the administration interface that K-Int have built. We’re just starting on that particular journey!

Conclusion

We were ambitious, and whilst I think we’ve managed to fulfill many of the goals that we had, we did have to modify our data standards to ‘lower the bar’ as we went along. It is far better to set data standards at the outset as changing them part way through usually has ramifications, but it is difficult to do this when you have not yet worked through all the data. In hindsight, maybe we should have interrogated the data we have much more to begin with, to really see the full extent of the variations and missing data…but maybe that would have put us off ever starting the project!

The data is key. If you are aggregating from many different sources, and you are dealing with multi-level descriptions that may be revised every month, every year, or over many years, then the data is the biggest challenge, not the technical set-up. It was essential to think about the data and the workflow first and foremost.

It was important to think about what the contributors can do – what is realistic for them. The Archives Hub contributors clearly see the benefits of contributing and are prepared to put what resources they can into it, but their resources are limited. You can’t set the bar too high, but you can nudge it up in certain ways if you give good reasons for doing so.

It is really useful to have a model that conveys the fundamentals of your data organisation. We didn’t apply the model to the environment; we created the environment from the model. A model that can be extended over time helps to make sure the service remains relevant and meets new requirements.

 

12 days of Christmas – Archives Style! (2016 remix)

Archives Hub feature for December 2016

The Twelve Days of Christmas song poster
“The Twelve Days of Christmas song poster” by Xavier Romero-Frias is licensed under CC BY-SA 3.0

There are several versions of the traditional folk melody The Twelve Days of Christmas. This feature is based on the 1909 publication by English composer Frederic Austin.

On the twelfth day of Christmas, my true love sent to me…

Twelve drummers drumming

‘The Little Drummer Boy’ greetings card, c. 1968-1999. An illustration of the well-known carol, the card is part of a collection of publications, prints and original artwork by the illustrators, twins Janet and Anne Grahame Johnstone. The Johnstone Memorial Collection, 1951-1999, is held by Seven Stories, the Centre for Children’s Books.
http://archiveshub.ac.uk/data/gb1840-jaj/jaj/02/04/10

Logo for Seven Stories
Logo for Seven Stories, the Centre for Children’s Books

Sarwar Sabri Collection, 1985-2005. Sarwar Sabri (Sarvar Sabri) is an internationally renowned tabla player and composer. As a composer he has provided music for TV, radio and various dance theatre companies. The collection is held by Special Collections, Brunel University Library.
http://archiveshub.ac.uk/data/gb1975-ss

Eleven pipers piping

Dagenham Girl Pipers, 1937-2000. Founded in 1930 by Reverend Joseph Waddington Graves, they were the first female pipe band in the world. The Dagenham Girl Pipers toured the world, and in 1937 appeared in Berlin before Adolf Hitler, who told Mr Graves he wished Germany had a similar band. The Dagenham Girl Pipers Veterans’ Association was formed in 1998. The collection includes letters, newspaper cuttings, scrapbooks and photographs and is held by Barking and Dagenham Archive and Local Studies Centre.
http://archiveshub.ac.uk/data/gb350-bd7

Papers of John and Myfanwy Piper, 1882-1990s. John Piper (1903-1992) was a major figure in modern British art. He was a painter in oils and water colour, designed stained glass, ceramics and for the stage, made prints and devised ingenious firework displays. In addition to this he was also a gifted photographer of buildings and landscapes. Piper also wrote poetry, art criticism and several guidebooks on landscape and architecture. The collection is held by the Tate Gallery Archive.
http://archiveshub.ac.uk/data/gb70-tga200410

Ten lords a-leaping

Petitions from Nottinghamshire to Oliver Cromwell (1599-1658), Lord Protector, c.1658. The principal items in the collection are two original petitions to Oliver Cromwell from inhabitants of Nottinghamshire, dating from c. 1658. The first petition requests tougher control on profanity, libertinism and heresies, revision of the laws of the nation, and asks that during Cromwell’s lifetime provision for future government is secured. The second petition requests regulation of the ancient laws regarding the Sacrament of the Last Supper and has 15 signatories. The collection is held by the University of Nottingham.
http://archiveshub.ac.uk/data/gb159-ms215

Captain Stanley Lord, Master of the SS Californian, career papers, Titanic articles and other papers, 1891-1997. The collection contains documents dated between 1891 and 1997 and mainly concerns the campaign to clear Captain Stanley Lord (1877-1962) of the accusations levelled against him with regard to the sinking of the Titanic. It contains Captain Lord’s career papers, and some contemporary items from 1912. Held by National Museums Liverpool: Maritime Archives and Library.
http://archiveshub.ac.uk/data/gb136-d/lo

Lord David Owen, 1962-2006. David Owen was born in 1938 in Plymouth. He studied medicine at Cambridge University and became a Senior Neurology and Psychiatric Registrar but upon becoming Parliamentary Under-Secretary of State for Defence for the Royal Navy in 1968, resigned his hospital work in favour of politics. He later served as Foreign Secretary until the defeat of the Labour Party in the 1979 General Election and in 1982 became Deputy Leader of the new Social Democrat party. The collection comprises personal papers, papers relating to the Labour Party, SDP papers, papers collected from work with independent organisations and Lord Owen’s Office. Held by Liverpool University, Special Collections and Archives.
http://archiveshub.ac.uk/data/gb141-d709

Nine ladies dancing

Photograph of ballet dancer, Anthony Crickmay Dance Photographs, © V&A Department of Theatre and Performance.
Anthony Crickmay Dance Photographs (THM/20), © V&A Department of Theatre and Performance, Victoria and Albert Museum, London.

Papers of Diana Gould, 1926-1996. Diana Rosamund Constance Grace Irene Gould was a British ballerina. Early in her career Sergei Diaghilev spotted her and invited her to join his Ballets Russes but he died before this could be arranged, events said to have been fictionalized in the film ‘The Red Shoes’. Diana married Sir Yehudi Menuhin in 1947. The collection is held by the Rambert Dance Company Archives.
http://archiveshub.ac.uk/data/gb2228-dpdg

Dorothy Madden Collection, 1912-2002. Dr Dorothy Gifford Madden, former Professor Emerita of the University of Maryland, United States of America, was responsible for bringing American modern dance practice to the United Kingdom. Held by Trinity Laban Conservatoire of Music and Dance (Laban Archive).
http://archiveshub.ac.uk/data/gb1701-d23

Collection of material relating to Anna Pavlova, 1875-1965. Anna Pavlova (1881-1931) was the most celebrated ballerina of her generation. The collection includes accessories originally worn by Pavlova in performance, scrapbooks containing many assorted press and illustrated magazine cuttings featuring Pavlova and sepia prints of Pavlova at a young age. Collection held by The Royal Ballet School, White Lodge Museum.
http://archiveshub.ac.uk/data/gb3208-rbs/pav

Eight maids a-milking

Logo: University of Leeds
Logo: University of Leeds (Leeds University Library Special Collections)

M. Russell-Fergusson papers, 1914-1990. M. Russell-Fergusson, Women’s National Land Service Corps, served as a milk maid in Norfolk from Aug. 1917 and later in Leicestershire and at the Royal Dairy Farm, Windsor. Held by Leeds University Library.
http://archiveshub.ac.uk/data/gb206-liddlecollectiondf112

Programme for The Foresters, Robin Hood and Maid Marian, 1892. Forms part of The Ellen Terry Collection, materials relating to the Lyceum Theatre series. Actress Ellen Terry (1847-1928) made her stage debut in 1856 as Mamillius in The Winter’s Tale. In 1878 she was invited to join Henry Irving’s company at the Lyceum Theatre as its leading lady. Ellen Terry and Henry Irving were soon regarded as the leading Shakespearean actors in Great Britain and they achieved huge success in both Shakespeare and non-Shakespeare plays. In 1888 she gained excellent reviews for her portrayal of Lady Macbeth in Macbeth. The Lyceum Company toured extensively in both the UK and America to capacity audiences. Held by the V and A Department of Theatre and Performance.
Programme description: http://archiveshub.ac.uk/data/gb71-thm/384/thm/384/44/3
Collection description: http://archiveshub.ac.uk/data/gb71-thm/384

Express Dairies, 1904-1974. The Express Country Milk Supply Company was established in London in 1864 by George Barham. It became the Express Dairy Company Limited in 1892. Milk was transported into London by rail, and delivered to homes. The Dairy Supply Company was formed as a separate company selling dairy equipment such as the milk churn which was invented by Barham. The company grew, purchasing College Farm, Finchley, London to conduct dairy experiments. The farm was sold in 1983. The firm also ran Express teashops, cafes and bakery and became a limited company in 1937. In 1969 Express became part of Grand Metropolitan and in 1992 part of Northern Foods. In 1998 the name of Express Dairies Plc returned, with the division of Northern Foods into two sections. Collection held by the University of Reading, Museum of English Rural Life.
http://archiveshub.ac.uk/data/gb7-trexp

Seven swans a-swimming

Harold Thomas Swan Papers, 1945-1996. Papers on the history of the clinical use of penicillin, 1945-1996, with particular reference to its early use in Sheffield, and to the reputation of Sir Alexander Fleming. Assembled by Dr Harold T. Swan MD, FRCP, FRCPath, Honorary Lecturer in Medical History, University of Sheffield, and formerly Consultant in Haematology, United Sheffield Hospitals. Held by the University of Sheffield Library.
http://archiveshub.ac.uk/data/gb200-ms185

Archives of Swan Sonnenschein and Co, 1878-1916. William Swan Sonnenschein (1855-1934) was apprenticed to the firm of Williams and Norgate where he gained experience of second hand bookselling before founding his own company, W. Swan Sonnenschein and Allen, with the first of several partners, J. Archibald Allen, in 1878. This partnership was dissolved in 1882 when William married and the firm’s name changed to W Swan Sonnenschein and Co. The firm published general literature and periodicals but specialised in sociology and politics. Sonnenschein was involved with the Ethical Society and published their literature. In 1895 Swan Sonnenschein became a limited liability company and in 1902 William Swan Sonnenschein left to work at George Routledge and Sons and later at Kegan Paul. Swan Sonnenschein was amalgamated with George Allen and Co in 1911. The collection is held by Reading University: Special Collections Services.
http://archiveshub.ac.uk/data/gb6-rulmss3280,3282,4058

Six geese a-laying

Cuttings about Mother Goose pantomime, 1951. These records form part of the Unity Theatre, theatre company collection held by V&A Department of Theatre and Performance. Unity Theatre was founded in 1936 by a general meeting of the Rebel Players and Red Radio, left-wing theatre groups derived from the Workers’ Theatre Movement.
http://archiveshub.ac.uk/data/gb71-thm/9/thm/9/4/5/77

Gwynydd Gosling collection, 1990. Gwynydd Gosling is a private collector of Russian books and objets d’art. The collection comprises photographs of two tankard lids commemorating the Arrow Boat Club four-oared race, St Petersburg, 1870 (R. Butts, E. Gibson, W. E. Hubbard, A. W. Raitt, B. Wilding). Held by Leeds University Library.
http://archiveshub.ac.uk/data/gb206-ms1095

Barclays Group Archives logo
Logo: Barclays Group Archives

Goslings and Sharpe: private bankers, Fleet Street (London): branch records including customer ledgers, 1717-1972. One of the oldest City banks, the partnership originated c1650 with Henry Pinckney, a goldsmith banker trading from the sign of the three squirrels in Fleet Street, London. The firm was led subsequently by the Chambers family. In 1794 Benjamin Sharpe became a partner and from that date the customary name of the business was Goslings and Sharpe, the Sharpes remaining as junior partners with no right to nominate their successors. In 1742 Sir Francis Gosling joined the firm and thereafter the Goslings name predominated in the partnership. The Goslings’ original trade was that of stationers. Although most accounts are for individuals or family trusts, there are also non-personal accounts such as those of charities (including some schools and hospitals), public subscriptions (including relief of soldiers and of victims of natural disasters), colleges, businesses, and a few public corporations and parishes. Collection held by Barclays Group Archives (BGA).
http://archiveshub.ac.uk/data/gb2044-cfleetstreet19(goslings)

Five gold rings

The Golden Ring: a new and original fairy spectacular opera. by G[eorge] R. Sims with music by Frederic Clay. Stated as performed at “Alhambra Theatre, William Holland, Manager, 1883”. Part of The George R. Sims Collection, 1858-1976. George Robert Sims (1847-1922) was an author, playwright, journalist and philanthropist. Collection held by The University of Manchester, The John Rylands University Library.
Volume description: http://archiveshub.ac.uk/data/gb133-grs/grs/2/11
Collection description: http://archiveshub.ac.uk/data/gb133-grs

National Union of Gold, Silver and Allied Trades, 1921-1985. The National Union of Gold, Silver and Allied Trades was formed in 1914 by the amalgamation of the Amalgamated Society of Gold, Silver and Kindred Trades and the Birmingham Silversmiths and Electroplate Operatives’ Society. In 1969 it absorbed the Society of Goldsmiths, Jewellers and Kindred Trades. In 1981 it became part of the Amalgamated Union of Engineering Workers (Technical, Administrative and Supervisory Section). Held by Modern Records Centre, University of Warwick.
http://archiveshub.ac.uk/data/gb152-mss.101/st

The rings may in fact refer to ring-necked pheasants:

Glasgow School of Art logo
Logo: Glasgow School of Art

Pictorial tapestry rug featuring a pheasant, 1888.
Tapestry rug of worsted yarn and jute in acid colours featuring a pheasant in a floral landscape. Part of the Stoddard-Templeton Carpet and Textile Collection (c. 1840s-1960s). James Templeton and Co. was established in 1843, making Chenille, Axminster, Wilton and Brussels carpets. It employed artists of international calibre such as Charles Voysey, Walter Crane and Frank Brangwyn, with their carpets used in Coronations and in liners such as the Titanic. The collection is held by The Glasgow School of Art Archives and Collections Centre.
https://archiveshub.jisc.ac.uk/data/gb1694-dc077/dc077/2

Four calling birds

This could be song birds, such as Canaries, or may be ‘colly’ or black birds:

Descriptions of the Canary Islands and of the Azores, c. 1610.

Image: Transport for London Metropolitan Line
Image: TfL Metropolitan Line, Transport for London Corporate Archives.

The manuscript consists of two works, bound together. The first is a description of the Canary Islands, detailing the history, religion and laws of the natives, called the Guanches, as well as observations on the geography and fauna of the islands. The second work is a compilation from other works describing the Azores. The existence of the Canary Islands, a chain of seven islands off the northwest coast of Africa, was known to the Romans and later the Arabs, and European navigators reached the islands in the 13th century. The Azores, an archipelago in the Mid-Atlantic, were discovered in 1427 by the Portuguese and their colonisation by them began in 1432. The collection is held by The University of Manchester, The John Rylands University Library.
http://archiveshub.ac.uk/data/gb133-engms17

Briefing on Canary Wharf Station, 1989.
Paper concerning delays and changes in the redesign of Canary Wharf Station. Subjects include construction and negotiations, unresolved issues and financial risk. Part of a series of minutes of meetings belonging to the Transport for London Group Archive.
http://archiveshub.ac.uk/data/gb2856-%28new%29lt000099/%28new%29lt000099/035

Production contracts for ‘Study from “Blackbird”’, 2002. Part of the Rambert Dance Company Archive: Productions collection (1920s – 2010s), the folder includes choreographer contracts, production budget and correspondence concerning casting, travel and rehearsals.
http://archiveshub.ac.uk/data/gb2228-rdc/pd/rdc/pd/06/01/0423

Three French hens

Michael French Collection, 1887-2006. Photographs and documents inherited and collected by Michael French relating to the French family of millers and their mills. Collection held by the Mills Archive Trust.
http://archiveshub.ac.uk/data/gb3132-fren

The Mills Archive Trust logo
Logo: The Mills Archive Trust

Richard Hughes, Ty Hen Isaf Manuscripts, 1693 – 1910. Richard Hughes of Ty Hen Isaf, Llannerch-y-medd, Anglesey was born in 1837 and died in 1930. As a young boy, he worked on Dyffryn Gwyn farm for the Rev. John Prytherch, who was one of the largest farmers in Anglesey. He also served as husbandman for two spinsters, who unexpectedly left him all their property. This enabled Richard Hughes to satisfy his two ambitions, to travel and to own a library. Then began a series of visits to Palestine and the Mediterranean. He became a great collector of rare and precious books and a friendship sprang up between him and Thomas Shankland, the Welsh librarian of the University College of North Wales. Held by Bangor University.
http://archiveshub.ac.uk/data/gb222-bmssrh

Two turtle doves

Ms transcript of song, ‘The Turtle Dove’. 2 leaves belonging to a series of ms and ts transcripts of songs and ballads (1925 to 1965) by the poet and author Robert Graves (1895-1985). The papers are held at St John’s College, Oxford.
Item description: http://archiveshub.ac.uk/data/gb473-rg/m/rg/m/ballads/4
Collection description: http://archiveshub.ac.uk/data/gb473-rg

Records for the Dove Brothers Ltd, builders, 1850-1970.
Dove Brothers Ltd was a prominent construction company based in Islington from 1781 to 1993 which worked with most of the major architects of the late 19th to 20th century. The company was founded by William Spencer Dove (1793-1869). His sons formed the Dove Brothers partnership in 1852. The collection is held by Islington Local History Centre.
http://archiveshub.ac.uk/data/gb1032-s/dov

Reader’s Digest presents Christmas Stories for the entire family, Dove Audio, 1995. Featuring Paul Scofield reading ‘A Christmas Carol’ by Charles Dickens. This forms part of the Paul Scofield Collection, 1807 – 2010. Paul Scofield (1922-2008) started his stage career in the 1940s and his name soon became synonymous with Classical theatre. Later in his career Scofield worked closely with the Royal Shakespeare Company for a number of years as well as The National Theatre; his roles were numerous and diverse. Beyond the theatre Scofield won acclaim through a number of films including ‘A Man For All Seasons’ (1966) and ‘Expresso Bongo’ (1958), as well as a large number of audiobooks and plays for BBC radio. Collection held by: V&A Department of Theatre and Performance.
Item description: http://archiveshub.ac.uk/data/gb71-thm/397/thm/397/5/2/27
Collection description: http://archiveshub.ac.uk/data/gb71-thm/397

And a partridge in a pear tree!

David Cassidy Collection, 1972-1976. The American singer David Cassidy was best known for the musical sitcom The Partridge Family. The collection, created by fan Kay Chesterman, consists of cuttings, publications and memorabilia relating to David Cassidy and members of his fan club. Held by the V&A Department of Theatre and Performance.
http://archiveshub.ac.uk/data/gb71-thm/378

Image of title page from "The 12 days of Christmas", 1780
Title page from the first known publication of “The 12 days of Christmas” in 1780.
Image in the public domain.

Bernard Partridge Drawings Collection, 1861-1905. Bernard Partridge (1861-1945) was a painter and illustrator who became the principal cartoonist of Punch magazine. This collection includes drawings of actor-manager Henry Irving (1838-1905) in some of his most famous roles, including Shylock, Hamlet, Mephistopheles, Dubosc and Lear. Collection held by the V&A Department of Theatre and Performance.
http://archiveshub.ac.uk/data/gb71-thm/227

Artworks by James Joshua Guthrie and relating to the Pear Tree Press, 1897-1930s. Designs and illustrations, along with other book illustration work and bookplates for the Pear Tree Press. Forms part of the British Library: Western Manuscripts collection The Gordon Bottomley Papers, 1773, 1831-1958, consisting of correspondence, diaries, literary materials, artwork, photographs, and printed ephemera by, relating to, or collected by poet and playwright Gordon Bottomley (1874-1948).
Folder description: http://archiveshub.ac.uk/data/gb58-addms88957/addms88957/4/4
Collection description: http://archiveshub.ac.uk/data/gb58-addms88957

Trustees of W S Brown – proposed purchase of Deep Mines under Pear Tree House, Tyldesley. 1905. 2 items of correspondence, maintained by the trustees of the Bridgewater estate Ltd. Forms part of the Bridgewater Estates Archive, 1895-1960s, held by the University of Salford.
Item description: http://archiveshub.ac.uk/data/gb427-bea/bea/i/1774
Collection description: http://archiveshub.ac.uk/data/gb427-bea

Related information

Birds of the Twelve Days of Christmas, 10,000 Birds blog post, 2015: http://10000birds.com/birds-of-the-twelve-days-of-christmas.htm

The Twelve Days of Christmas – archives style! Archives Hub feature for December 2014

Exploring British Design: Interface Design Principles

Britain Can Make It, exhibition poster

For our AHRC project, ‘Exploring British Design’, one of the questions we asked is:

How might a website co-designed by researchers, rather than a top-down collection-defined approach to archive content, enhance engagement with and understanding of British design?

The workshops that we have run were one of the key ways that we hoped to understand more about how postgraduates and others research their topics, what they liked and didn’t like about websites, and in a general sense how they think and understand resources, and how we can tune into that thinking.

In the blog posts that we have created so far, we set out one of our central ideas:

Providing different routes into archives, showing different contexts, and enabling researchers to create their own narratives, can potentially be achieved through a focus on the ‘real things’ within an archive description; the people, organisations and places, and also the events surrounding them.

The feedback from the workshops gave us plenty to work with, and here I wanted to draw out some of the key messages that we are using to help us design an interface.

Researchers often think visually

Several of the participants in our workshops were visual thinkers. Maybe we had a slightly biased group, in that they work within or study design, but it seems reasonable to conclude that a visual approach can be attractive and engaging. We want to find a way to represent information more visually, whilst providing a rich and detailed resource. Our belief is that the visual should not dominate or hide the textual, as does often happen with cultural heritage resources, but that they should work better together.

Researchers often think in terms of creating a story or narrative

When we asked our participants to focus on an individual object, several of them thought in terms of its ‘story’. It seemed to me that most of the discussions that we had assumed a narrative type approach. It is hardly surprising, as when we talk about people, places and events we connect them together. It is a natural thing to do.

Different types of contexts provide value

When we asked workshop participants to think about how they would go about researching the object they were given, they tended to think of ways to contextualise it. They were interested in where it came from, in its physicality and its story. For example, we gave out photographs of an exhibition and they wanted to know where the photographs were taken, more about the exhibition and the designers involved in it, and what else was going on at that time. Our idea with Exploring British Design is that we can create records that allow these kinds of contexts to flourish. The participants did not concentrate on traditional archival context, as they did not tend to recognise this in the same way as archivists – it is one perspective amongst many.

We cannot provide a substitute for the value of handling the original object, and it was clear that researchers found this to be immensely valuable, but we can help to provide context that helps to scope reality.

Uncovering the obscure is a good thing

Not surprisingly, our workshop participants were keen that their research efforts should result in finding little-known information that they could utilise. They talked about the excitement of uncovering information and the benefits for their work.

Habits are part of the approach to research

The balance between being innovative and anchoring an interface in what people are familiar with seems to be important.

Trust is very important

The importance of trust was stressed at all of our workshops, and the need to know the context of information. We need to build something that researchers believe is a quality resource, with information they can rely on.

Serendipity is good…although it can lead you astray

It was clear that our participants wanted to explore, and liked the idea of coming across the unexpected. Several of them felt that the library bookshelves provide a good opportunity to browse and discover new sources (they talked about this more than the serendipity of the web). But there was also a note of caution about time wasted pursuing different avenues of information. It seems good to build in serendipity, whilst providing an interface that gives clear landmarks and signposts.

Search and Relevance

Our workshop participants were clear that choice of search terms has a big influence on what you find, and this can be a disadvantage. You may be presented with a search box, and you don’t really know what to search for to get what you want, especially if you don’t know what you want! Also, the relevance ranking can be a puzzle. Library databases often seem to give results that don’t make that much sense.

One thing that stood out to me was the contrast between the willingness to use Google, which is a simple search box, with no indication of how to search, that brings back huge amounts of results, and the criticisms of library databases, where choice of search term is crucial and where ‘too many results’ are seen as a problem. It seemed that the key here was effective relevance ranking, but our workshop participants did agree that relevance ranking can deceive: the first page of results may look good, but you don’t really know what you are missing. Google is good at providing a first page of useful looking results… and maybe that’s enough to stop most people wondering about what they might be missing!

Exploring British Design

As our project has progressed, I think it is fair to say that we have benefitted hugely from the input of the students and academics that we have talked to, not only for this project but also more generally. But it was not possible for us to manage to implement a directly co-designed website. The logistics of the project didn’t allow for this, as we wanted to gather input to inform the project, and then we had the complications of pulling together the data, designing the back end and the API. We would probably have needed at least another 6 months on the project to go back to the workshop participants and ask them about the website design as we went along.

But I think we have achieved a good deal in terms of engagement. Our Exploring British Design project has been about other ways through content, moving away from a search box and a list of search results, and thinking about immersing researchers in a ‘landscape’, where they can orientate themselves but also explore freely. So, we are thinking about engagement in terms of a more visually attractive and immersive experience, giving researchers the opportunity to follow connections in a way that gives them a sense of movement through the design landscape, hints at the unknown, and shows the relevancy of the entities that are featured in the website. We hope to show how this can potentially expand understanding because it allows for a wider context and more varied narratives.

In the next project post we hope to present our interface for this pilot project!

Europeana Tech 2015: focus on the journey

Last week I attended a very full and lively Europeana Tech conference. Here are some of the main initiatives and ideas I have taken away with me:

Think in terms of improvement, not perfection

Do the best you can with what you have; incorrect data may not be as bad as we think and maybe users’ expectations are changing, and they are increasingly willing to work with incomplete or imperfect data. Some of the speakers talked about successful crowd-sourcing – people are often happy to correct your metadata for you and a well thought-out crowd-sourcing project can give great results.

BL Georeferencer, showing an old map overlaying part of Manchester: http://www.bl.uk/maps/georeferencingmap.html

The British Library currently have an initiative to encourage tagging of their images on Flickr Commons and they also have a crowd-sourcing geo-referencer project.

The Cooper Hewitt Museum site takes a different and more informal approach to what we might usually expect from a cultural heritage site. The homepage goes for an honest approach:

“This is a kind of living document, meaning that development is ongoing — object research is being added, bugs are being fixed, and erroneous terms are being revised. In spite of the eccentricities of raw data, you can begin exploring the collection and discovering unexpected connections among objects and designers.”

The ‘here is some stuff’ and ‘show me more stuff’ type of approach was noticeable throughout the conference, with different speakers talking about their own websites. Seb Chan from the Cooper Hewitt Museum talked about the importance of putting information out there, even if you have very little: it is better than nothing (e.g. https://collection.cooperhewitt.org/objects/18446665).

The speaker from Google, Chris Welty, is best known for his work on ontologies in the Semantic Web and IBM’s Watson. He spoke about cognitive computing, and his message was ‘maybe it’s OK to be wrong’. Something may well still be useful, even if it is not perfectly precise. We are increasingly understanding that the Web is in a state of continuous improvement, and so we should focus on improvement, not perfection. What we want is for mistakes to decrease, and for new functionality not to break old functionality. Chris talked about the importance of having a metric – something that is believable – that you can use to measure improvement. He also spoke about what is ‘true’ and the need for a ‘ground truth’ in an environment where problems often don’t have a right or wrong answer. What is the truth about an image? If you show an image to a human and ask them to talk about it they could talk for a long time. What are the right things to say about it? What should a machine see? To know this, or to know it better, Chris said, Google needs data – more and more and more data. He made it clear that the data is key and it will help us on the road to continuous improvement. He used the example of searching for pictures of flowers using Google to find ‘paintings with flowers’. If you did this search 5 years ago you probably wouldn’t get just paintings with flowers. The search has improved, and it will continue to improve. A search for ‘paintings with tulips’ now is likely to show you just tulips. However, he gave the example of ‘paintings with flowers by french artists’ – a search where you start to see errors as the results are not all by French artists. A current problem Google are dealing with is mixed language queries, such as ‘paintings des fleurs’, which opens a whole can of worms. But Chris’ message was that metadata matters: it is the metadata that makes this kind of searching possible.

The Success of Failure

Related to the point about improvement, the message is that being ‘wrong’ or ‘failing’ should be seen in a much more positive light. Chris Welty told us that two thirds of his work doesn’t make it into a live environment, and he has no problem with that. Of course, it’s hard not to think that Google can afford to fail rather more than many of us! But I did have an interesting conversation with colleagues, via Twitter, around the importance of senior management and funders understanding that we can learn a great deal from what is perceived as failure, and we shouldn’t feel compelled to hide it away.

Photo from Europeana Tech
Europeana Tech panel session, with four continents represented

Think in terms of Entities

We had a small group conversation where this came up, and a colleague said to me ‘but surely that’s obvious’. But as archivists we have always been very centered on documents rather than things – on the archive collection, and the archive collection description. The trend that I was seeing reflected at Europeana Tech continued to be towards connections, narratives, pathways, utilising new tools for working with data, for improving data quality and linking data, for adding geo-coordinates and describing new entities, for making images more interoperable and contextualising information. The principle underlying this was that we should start from the real world – the real world entities – and go from there. Various data models were explored, such as the Europeana Data Model and CIDOC CRM, and speakers explained how entities can connect, and enable a richer landscape. Data models are a tricky one because they can help to focus on key entities and relationships, but they can be very complex and rather off-putting. The EDM seems to split the crowd somewhat, and there was some criticism that it is not event-based like CIDOC CRM, but the CRM is often criticised for being very complex and difficult to understand. Anyway, setting that aside, the overall message was that relationships are key, however we decide to model them.

Cataloguing will never capture everyone’s research interests

An obvious point, but I thought it was quite well conveyed in the conference. Do we catalogue with the assumption that people know what they need? What about researchers interested in how ‘sad’ is expressed throughout history, or fashions for facial hair, or a million other topics that simply don’t fit in with the sorts of keywords and subject terms we normally use? We’ll never be able to meet these needs, but putting out as much data as we can, and making it open, allows others to explore, tag and annotate and create infinite groups of resources. It can be amazing and moving, what people create: Every3Minutes.

There’s so much out there to explore….

There are so many great looking tools and initiatives worth looking at, so many places to go and experiment with open data, so many APIs enabling so much potential. I ended up with a very long list of interesting looking sites to check out. But I couldn’t help feeling that so few of us have the time or resource to actually take advantage of this busy world of technology. We heard about Europeana Labs, which has around 100 ‘hardcore’ users and 2,200 registered keys (required for API use). It is described as “a playground for remixing and using your cultural and scientific heritage. A place for inspiration, innovation and sharing.” I wondered if we would ever have the time to go and have a play. But then maybe we should shift focus away from not being able to do these things ourselves, and simply allow others to use the data, and to adopt the tools and techniques that are available – people can create all sorts of things. One example amongst many we heard about at the conference is a cultural collage: zenlan.com/collage. It comes back to what is now quite an old adage, ‘the best innovation may not be done by you’. APIs enable others to innovate, and what interests people can be a real surprise. Bill Thompson from the BBC referred to a huge interest in old listings from Radio Times, which are now available online.

The International Image Interoperability Framework

I list the IIIF because it jumped out at me as a framework that seems to be very popular – several speakers referred to it, and in very positive terms. I hadn’t heard of it before, but it seemed to be seen as a practical means to ensure that images are interoperable, and can be moved around different systems.

Think Little

One of my favourite thoughts from the conference, from the ever-inspirational Tim Sherratt, was that big ideas should enable little ideas. The little ideas are often what really makes the world go round. You don’t have to always think big. In fact, many sites have suffered from the tendency to try to do everything. Just because you can add tons of features to your applications, it doesn’t mean you should.

The Importance of Orientation

How would you present your collections if you didn’t have a search box? This is the question I asked myself after listening to George Oates, from Good, Form & Spectacle. She is a User Interface expert, and has worked on Flickr and for the Internet Archive amongst other things. I thought her argument about the need to help orientate users was interesting, as so often we are told that the ‘Google search box’ is the key thing, and what users expect. She talked about some of her experiments with front end interfaces that allow users to look at things differently, such as the V&A Spelunker. She spoke in terms of landmarks and paths that users could follow. I wonder if this is easier said than done with archives without over-curating what you have or excluding material that is less well catalogued, or does not have a nice image to work with. But I certainly think it is an idea worth exploring.

View of V&A Spelunker
“The V&A Spelunker is a rough thing built by Good, Form & Spectacle to give a different view into the collection of the Victoria & Albert Museum”

Exploring British Design: Research Paths II

We recently ran a second workshop as part of our Exploring British Design project. The workshops aim to understand more about approaches to research, and researchers’ understanding and use of archives.

The second workshop was run largely on the same basis as the first workshop, using the same exercises.

Looking at what our researchers said and documented about their research paths over the two workshops, some points came out quite strongly:

  • Google is by far the most common starting point but its shortcomings are clear and issues of trust come up frequently.
  • There is often a strong visual emphasis to research, including searching for images and the use of Pinterest; there seems to be a split between those who gravitate towards a more text-based approach and those who think visually (many of our participants were graphic designers though!).
  • It is common to utilise the references listed in Wikipedia articles.
  • The library as a source is seen as part of a diverse landscape – it is one place to go to, albeit an important one. It is not the first port of call for the majority.
  • Aggregators are not specifically referred to very often. But they may be seen as a place to go if other searches don’t yield useful results.
  • Talking to people is very important, be it lecturers, experts, colleagues or friends.
  • Online research is more immediate, and usually takes less effort, but there are issues of trust and it may not yield specific enough results, or uncover the more obscure sources.
  • There is a tendency to start from the general and work towards the more specific. With the research paths of most of the researchers, the library/archive was somewhere in the middle of this process.
  • Personal habits and past experience play a very large part, but there is a real interest in finding new routes through research, so habit is not a sticking point, but simply the dominant influence unless it is challenged.

For the second workshop, the first exercise asked participants to document their likely research paths around a topic.

flip chart showing research paths for a topic
Research paths of two researchers for the topic of Simpsons of Piccadilly

We had four pairs of researchers looking at different topics, and we left them to discuss their research paths for about 45 minutes. The discussions following the exercise picked up on a number of areas:

Online vs Offline

We kicked off by asking the researchers about online versus ‘offline’ research paths. One participant commented that she saw online as a route through to traditional research – maybe to locate a library or archive – ‘online is telling me where to look’ but in itself it is too general and not specific enough; whereas the person she was paired with tended to do more research online. He saw online as giving the benefit of immediacy – at any time of day or night he could access content. The issue of trust came up in the discussion around this issue, and one participant summed up nicely: “If you do online research there is less effort but there is less trust; if you research offline there is more effort but there is more trust.”

Following on from the discussion about how people go about using online services, there was a comment that things found online are often the more obvious, the more used and cited resources. Visiting a library or archive may give more opportunity to uncover little known sources that help with original research. This seemed to be endorsed by most participants, one commenting that Pinterest tends to reflect what is trendy and popular. However, there was also a view that something like Pinterest can lead researchers to new sources, as they are benefiting from the efforts, and sometimes the quite obsessive enthusiasms, of a wide range of people.

There was agreement that online research can lead to ‘information dumping’, where you build up a formidable collection of resources, but are unlikely to get round to sorting them all out and using them.

Library Resources

The issue of effort came up later in the discussion when referring to a particular university library (probably typical of many university libraries), and the amount of effort involved in using its databases. There was a comment about how you need to ‘work yourself up to an afternoon in the library’ and there seemed to be a general agreement that the ‘search across all resources’ often produced quite meaningless results. When compared to Google, the issue seems to be that relevance ranking is not effective, so the top results often don’t match your requirements. There was also some discussion around the way that library resource discovery services often involve too many steps, and there is effort in understanding how the catalogue works. One participant, whose research centres on the Web and the online user experience, felt that printed sources were of little use to him, as they were out of date very quickly.

Curating your sources

One researcher talked about using Pinterest to organise findings visually. This was followed up by another researcher talking about how with online research you can organise and collect things yourself. It facilitates ‘curating’ your own collection of resources. It can also be easier to remember resources if they are visual. Comparing Pinterest to the Library – with the former you click to add the image to your board; with the Library you pay a visit, you find the book, you take it to the scanner, you pay to take a scan…although it is increasingly possible to take pictures of books using your own device. But the general feeling was that the Web was far quicker and more immediate.

Attitudes towards research

One participant felt that there might be a split between those more like him who see research as ‘a means to an end’ and those who enjoy the process itself. So maybe some are looking for the shortest route to the end goal, and others see research as a more exploratory activity and expect it to take time and effort. This may partly be a result of the nature and scope of the research. Short time scales preclude in-depth research.

Talking about serendipitous approaches, someone commented that browsing the library shelves can be constructive, as you can find books around your subject that you weren’t aware existed. This is replicated to some extent in something like Amazon, which suggests books you might be interested in. There was also some feeling that exploring too many avenues can take the researcher off topic and take up a great deal of time.

Trust and Citation

The issue of trust is important.  A first-hand experience, whether of a place you are researching, or using physical archive sources, is the most trustworthy, because you are seeing with your own eyes, experiencing first hand or looking at primary sources first hand; a library provides the next level of trust, as a book is an interpretation, and you may feel it requires corroboration; the online world is the least trustworthy. You will have the least trust if you are looking at a website where you don’t know about who or what is behind it. There was agreement that trust can come through crowd sourced information, but also some discussion around how to cite this (for example, using the Harvard system to reference web pages and crowd sourced resources). This led on to a short discussion around the credibility of what is cited within research. Maybe attitudes to Wikipedia are slowly changing, but at present there is generally still a feeling that a researcher cannot cite it as a source. There are traditions within disciplines around how to cite and what are the ‘right’ things to cite.

[Further posts on Exploring British Design will follow, with reflections on our workshops and updates on the project generally]

The Twelve Days of Christmas – archives style!

Archives Hub feature for December 2014

The Twelve Days of Christmas song poster
“The Twelve Days of Christmas song poster” by Xavier Romero-Frias is licensed under CC BY-SA 3.0

There are several versions of the traditional folk melody The Twelve Days of Christmas (http://en.wikipedia.org/wiki/The_Twelve_Days_of_Christmas_%28song%29). This feature is based on the 1909 publication by English composer Frederic Austin.

On the twelfth day of Christmas, my true love sent to me…

Twelve drummers drumming

Max Abrams Collection, 1920s-1992. Max Abrams was a drummer, teacher of drums and author of drum tutors. He kept detailed diaries between 1943 and 1992, which document his performance career and information about his pupils, as well as personal information. He wrote around 50 jazz tutor books.
http://archiveshub.ac.uk/data/gb2942-ma

Logo for Seven Stories
Logo for Seven Stories, the Centre for Children’s Books

‘The Little Drummer Boy’ greetings card, c. 1968-1999. An illustration of the well-known carol, the card is part of a collection of publications, prints and original artwork by the illustrators, twins Janet and Anne Grahame Johnstone. The Johnston Memorial Collection is held by Seven Stories, the Centre for Children’s Books.
http://archiveshub.ac.uk/data/gb1840-jaj/jaj/02/04/10

Beat The Retreat On Thy Drum (Sam, Sam, Beat the Retreat!), 1932.
Printed score of a musical monologue performed by Stanley Holloway, part of the Stanley Holloway Archive held by the V&A Department of Theatre and Performance. Stanley Holloway (1890-1982) made over 50 films, but he loved performing in the theatre and the comic monologues, for which he was so well known.
http://archiveshub.ac.uk/data/gb71-thm/18/thm/18/1/7

Eleven pipers piping

Papers of John and Myfanwy Piper, 1882-1990s. John Piper (1903-1992) was a major figure in modern British art. He was a painter in oils and water colour, designed stained glass, ceramics and for the stage, made prints and devised ingenious firework displays. In addition to this he was also a gifted photographer of buildings and landscapes. Piper also wrote poetry, art criticism and several guidebooks on landscape and architecture.
http://archiveshub.ac.uk/data/gb70-tga200410

W.T. Piper papers, 1914-1919. W.T. Piper was a Private, 5th Battalion, East Surrey Regiment, serving in India.
http://archiveshub.ac.uk/data/gb206-liddlecollectionind31

Ten lords a-leaping

Papers of Horatio Nelson, Viscount and First Admiral, 1758-1805. Held by Glasgow University Library, Special Collections Department, comprising correspondence concerning the promotion of Lieutenant Scott of Monmouth.
http://archiveshub.ac.uk/data/gb247-msgen512/35

Manuscript of speeches made by Lord Crewe, Lord Lansdowne, and Lord Loreburn in the Library of the House of Lords, 1908. The speeches were made on Monday, 27th July, 1908, on the occasion of the presentation to the Lord Chancellor, Lord Loreburn, of his portrait painted by Sir George Reid.
http://archiveshub.ac.uk/data/gb206-brothertoncollectionms19creid

Transcription of Thomas Hope, Major Practicks, c. 1670. Sir Thomas Hope (1573-1646) of Craighall, advocate and politician. He was solicitor to the Church of Scotland, became a very successful advocate, then worked for Charles I and was appointed Lord Advocate in 1626 and admitted to the Scottish privy council 2 years later.
http://archiveshub.ac.uk/data/gb227-mske.l2

Nine ladies dancing

Photograph of ballet dancer, Anthony Crickmay Dance Photographs, © V&A Department of Theatre and Performance.
Anthony Crickmay Dance Photographs (THM/20), © V&A Department of Theatre and Performance, Victoria and Albert Museum, London.

Collection of material relating to Anna Pavlova, 1875-1965. Anna Pavlova (1881-1931) was the most celebrated ballerina of her generation. The collection includes accessories originally worn by Pavlova in performance, scrapbooks containing many assorted press and illustrated magazine cuttings featuring Pavlova and sepia prints of Pavlova at a young age.
http://archiveshub.ac.uk/data/gb3208-rbs/pav

Adeline Genée Archive Collection, c. 1890-1970. Danish by birth, Adeline Genée (1878-1970), was a talented ballerina and the founder president of the Association of Teachers of Operatic Dancing of Great Britain (later the Royal Academy of Dance).
http://archiveshub.ac.uk/data/gb3370-rad/ag

Marie Rambert Collection, 1890s-1980s. Collection of films, costumes, photographs, correspondence, diaries, programmes, press cuttings, personal papers, autobiographical notes, awards and medals owned and collected by Dame Marie Rambert throughout her life as well as papers relating to her death and memorials.
http://archiveshub.ac.uk/data/gb2228-mr

Eight maids a-milking

M. Russell-Fergusson papers, 1914-1990. M. Russell-Fergusson, Women’s National Land Service Corps, served as a milk maid in Norfolk from Aug. 1917 and later in Leicestershire and at the Royal Dairy Farm, Windsor.
http://archiveshub.ac.uk/data/gb206-liddlecollectiondf112

Photograph of Audree Howard as the Milkmaid in ‘Facade’, 1930s. Part of a small collection relating to the artist Paul Nash at the Tate Gallery Archive.
http://archiveshub.ac.uk/data/gb70-tga769/tga769/5/13

Seven swans a-swimming

Logo: University of Leeds
Logo: University of Leeds (Leeds University Library Special Collections)

Books about Russia written by members of the Swan/Swann family, 1968-1989. The Swan/Swann family were members of the British community in pre-revolutionary Russia. Material held by Leeds University Library.
http://archiveshub.ac.uk/data/gb206-ms1036

Papers of and relating to Annie S. Swan, c. 1900-1946. Annie Shepherd Swan, daughter of Edward Swan, farmer and potato merchant, was born in Mountskip, near Edinburgh in 1859. She married James Burnett Smith in 1883, and in the early years of their marriage her writing supported him through medical school.
http://archiveshub.ac.uk/data/gb231-ms3517

Swan Land and Cattle Company, 1883-1947. The collection is composed of reminiscences of the Swan Land and Cattle Company. The home ranch of the Swan Land and Cattle Company was sited at Chugwater, Wyoming. Its corporate headquarters were in Cheyenne. This large corporate cattle company, with between 50,000 and 80,000 livestock, at one time controlled an area of land greater than the size of the State of Connecticut.
http://archiveshub.ac.uk/data/gb237-coll-162

Six geese a-laying

‘Taking a gander’. Article concerning the geese at the University, 1966. Part of the Lady Violet Deramore Collection (1881-2005) held by the Borthwick Institute, University of York.
http://archiveshub.ac.uk/data/gb193-vder/vder/3/1/2/10

As it’s pantomime season (oh no it’s not! Oh yes it is!), we also have:

Cuttings about Mother Goose pantomime, 1951. These records form part of the Unity Theatre, theatre company collection held by V&A Department of Theatre and Performance. Unity Theatre was founded in 1936 by a general meeting of the Rebel Players and Red Radio, left-wing theatre groups derived from the Workers’ Theatre Movement.
http://archiveshub.ac.uk/data/gb71-thm/9/thm/9/4/5/77

Five gold rings

Small printed notice “Unique and hitherto unknown variety of the Gold Ring Money of Ireland in the form of an Ear Ornament”, 1840s. Held by Chetham’s Library, this item forms part of The Correspondence of John Bell, Antiquary and Land Surveyor, Gateshead, Newcastle Collection.
http://archiveshub.ac.uk/data/gb418-bell/bell/1/29

The rings may in fact refer to ring-necked pheasants:

Pictorial tapestry rug featuring a pheasant, 1888.
Tapestry rug of worsted yarn and jute in acid colours featuring a pheasant in a floral landscape. Part of the Stoddard-Templeton Carpet and Textile Collection (c. 1840s-1960s). James Templeton and Co. was established in 1843, making Chenille, Axminster, Wilton and Brussels carpets. It employed artists of international calibre such as Charles Voysey, Walter Crane and Frank Brangwyn, with their carpets used in Coronations and in liners such as the Titanic. The collection is held by The Glasgow School of Art Archives and Collections Centre.
http://archiveshub.ac.uk/data/gb1694-dc077/2/1

Four calling birds

This could be song birds, such as Canaries, or may be ‘colly’ or black birds:

Descriptions of the Canary Islands and of the Azores, c. 1610.
The manuscript consists of two works, bound together. The first is a description of the Canary Islands, detailing the history, religion and laws of the natives, called the Guanches, as well as observations on the geography and fauna of the islands. The second work is a compilation from other works describing the Azores. The existence of the Canary Islands, a chain of seven islands off the northwest coast of Africa, was known to the Romans and later the Arabs, and European navigators reached the islands in the 13th century. The Azores, an archipelago in the Mid-Atlantic, were discovered in 1427 by the Portuguese and their colonisation by them began in 1432.
http://archiveshub.ac.uk/data/gb133-engms17

Image: Transport for London Metropolitan Line
Image: TfL Metropolitan Line, Transport for London Corporate Archives.

Briefing on Canary Wharf Station, 1989.
Paper concerning delays and changes in the redesign of Canary Wharf Station. Subjects include construction and negotiations, unresolved issues and financial risk. Part of a series of minutes of meetings belonging to the Transport for London Group Archive.
http://archiveshub.ac.uk/data/gb2856-%28new%29lt000099/%28new%29lt000099/035

Production contracts for ‘Study from “Blackbird”’, 2002. Part of the Rambert Dance Company Archive: Productions collection (1920s – 2010s), the folder includes choreographer contracts, production budget and correspondence concerning casting, travel and rehearsals.
http://archiveshub.ac.uk/data/gb2228-rdc/pd/rdc/pd/06/01/0423

Three French hens

‘The Little White Hen’, 1989-2003.
Material relating to ‘The Little White Hen’, written by Philippa Pearce and illustrated by Gillian McClure (Scholastic, 1996). The series includes a dummy book; preliminary artwork; four pieces of finished artwork; a small amount of correspondence from Philippa Pearce, with some reviews of the book; and a copy of the first edition of the book.
http://archiveshub.ac.uk/data/gb1840-gmc/gmc/04

Hen Gapel, Llanbryn-mair Chapel Records, 1898-1932.
Hen Gapel (Old Chapel) in Llanbryn-mair, Montgomeryshire is one of the oldest and most famous chapels in Wales. As far back as 1635 the Rev Walter Craddoc had a small congregation in Llanbryn-mair. Initially, the cause had no home and meetings were held in houses or in a nearby forest. In 1739 a chapel was built (then re-built in 1821).
http://archiveshub.ac.uk/data/gb222-bmsshg

Two turtle doves

Ms transcript of song, ‘The Turtle Dove’. 2 leaves belonging to a series of ms and ts transcripts of songs and ballads (1925 to 1965) by the poet and author Robert Graves (1895-1985). The papers are held at St John’s College, Oxford.
http://archiveshub.ac.uk/data/gb473-rg/m/rg/m/ballads/4

Records for the Dove Brothers Ltd, builders, 1850-1970.
Dove Brothers Ltd was a prominent construction company based in Islington from 1781 to 1993 which worked with most of the major architects of the late 19th to 20th century. The company was founded by William Spencer Dove (1793-1869). His sons formed the Dove Brothers partnership in 1852.
http://archiveshub.ac.uk/data/gb1032-s/dov

And a partridge in a pear tree!

David Cassidy Collection, 1972-1976. The American singer David Cassidy was best known for the musical sitcom The Partridge Family.
http://archiveshub.ac.uk/data/gb71-thm/378

Image of title page from "The 12 days of Christmas", 1780
Title page from the first known publication of “The 12 days of Christmas” in 1780.
Image in the public domain.

Bernard Partridge Drawings Collection, 1861-1905. Bernard Partridge (1861-1945) was a painter and illustrator who became the principal cartoonist of Punch magazine.
http://archiveshub.ac.uk/data/gb71-thm/227

Artworks by James Joshua Guthrie and relating to the Pear Tree Press, 1897-1930s. Designs and illustrations, along with other book illustration work and bookplates for the Pear Tree Press.
http://archiveshub.ac.uk/data/gb58-addms88957/addms88957/4/4

Trustees of W S Brown – proposed purchase of Deep Mines under Pear Tree House, Tyldesley. 1905. 2 items of correspondence, maintained by the trustees of the Bridgewater estate Ltd.
http://archiveshub.ac.uk/data/gb427-bea/bea/i/1774

Related information

Birds of the Twelve Days of Christmas, 10,000 Birds blog post, 2013: http://10000birds.com/birds-of-the-twelve-days-of-christmas.htm

James Phillips Kay-Shuttleworth – pioneering educational reformer

Archives Hub feature for November 2014

Funded by a grant from the John Rylands Research Institute, we have recently catalogued the papers of celebrated Victorian educationist Sir James Phillips Kay-Shuttleworth (1804-1877), opening up the rich content of this archive to researchers across the world.

Kay-Shuttleworth was born James Kay in Rochdale, Lancashire, into a textile manufacturing family. After qualifying as a doctor, he went on to have a distinguished career. He was a pioneer of public health, an influential civil servant, and played a key part in nineteenth-century educational reform, laying the groundwork for today’s system of national school education.

Kay-Shuttleworth’s career

After training at Edinburgh University, James Kay returned to practise as a doctor in Manchester in 1827. The following year, he co-founded the Ardwick and Ancoats Dispensary, a charity based in one of the poorest areas of the city. Through this work, he witnessed the appalling living conditions of the urban poor, and became increasingly involved in public health initiatives.

In 1832, the year of the cholera epidemic, he published his seminal pamphlet, The Moral and Physical Condition of the Working Classes Employed in the Cotton Manufacture in Manchester. This predated by some 13 years Friedrich Engels’ better-known The Condition of the Working Class in England.

In 1835, he became an Assistant Poor Law Commissioner for Norfolk and Suffolk, a role which gave rise to his lifelong interest in education and his conviction that it held the key to society’s regeneration.

Image of pamphlet The Training of Pauper Children
The Training of Pauper Children (1839): Kay-Shuttleworth’s ideas about educational reform had their origins in his work with pauper children.

In 1839, he was appointed as Assistant Secretary to the Whig government’s Committee of the Privy Council on Education, which administered grants for public education, a post he held for nine years. He was a highly effective civil servant and much of what we take for granted today had its origins in his inspired reforms. In 1840, he established Battersea College, the first teacher training college in Britain. He created a school inspection system; he argued for state education; and he forced through regulations around how children were taught, the design of school buildings, the structure of the teaching profession and the ways in which schools were governed.

There are over 1,000 letters in Kay-Shuttleworth’s archive, reflecting his whole professional career. Correspondents include those involved in education and philanthropy like Matthew Arnold and Angela Burdett-Coutts, as well as many Liberal or Whig politicians, including Gladstone, W.E. Forster, Lord John Russell and John Bright. Most of his key publications are also represented.

Family ties

The archival material relating to Kay-Shuttleworth’s public life is complemented by extensive personal and family correspondence, providing a fascinating insight into family relationships, social and gender roles.

In 1842, he married Lady Janet Shuttleworth, the heiress of Gawthorpe Hall in Lancashire, and adopted her surname on marriage, becoming Kay-Shuttleworth. The couple had five children.

Photograph of Gawthorpe Hall
Gawthorpe Hall, Padiham, Lancashire. James Kay-Shuttleworth set his own stamp on his wife’s ancestral home, employing fashionable architect Charles Barry to undertake major renovations in the 1850s. Photograph courtesy of Lee Pilkington.

The letters between Kay-Shuttleworth and his son Ughtred James (1844-1939) show the closeness of their relationship. Ughtred inherited Gawthorpe Hall, and estate management is discussed in some detail, as is Ughtred’s early political career; he went on to become a successful Liberal MP.

Other relationships were less straightforward. Correspondence in the archive documents the young James Kay’s unsuccessful courtship of Helen Kennedy, daughter of a wealthy Manchester family. Later, he grew apart from his wife, Janet; in 1851 she moved permanently to the Continent, ultimately settling in Italy with her eldest child Janet, two youngest sons, and the family governess Rosa Poplawska.

Two of the Kay-Shuttleworth sons – Robert (known as Robin) and Stewart – caused ongoing anxiety to their father. Neither lived up to his expectations, either getting into debt or associating with people of whom their parents disapproved. Ultimately Kay-Shuttleworth arranged for Robin to travel to Australia and take up sheep-farming (although he proved a continued source of worry to his parents), and Stewart emigrated to Sri Lanka (then Ceylon) to run a plantation.

Literary circles

Kay-Shuttleworth’s literary aspirations are less well-known than his public career. Always passionate about literature, after his retirement he published two historical novels set in his home county of Lancashire, Scarsdale (1860) and Ribblesdale (1870). Correspondence and reviews relating to these two novels are included in his archive, as is the manuscript of a third novel, Cromwell in the North, which remained unpublished at his death, and his unpublished autobiography.

Image of a page from Gaskell’s manuscript of The Life of Charlotte Brontë
A page from Gaskell’s manuscript of The Life of Charlotte Brontë, from the Library’s Elizabeth Gaskell Collection

His own literary endeavours failed to attract much critical acclaim, and his greatest contribution to literature was probably his role in bringing together Charlotte Brontë and Elizabeth Gaskell. The two writers first met in August 1850, during a visit to the summer home of the Kay-Shuttleworths in the Lake District. Gaskell was already fascinated by what she knew of Brontë and her isolated life in Haworth, which was so different from Gaskell’s own bustling home in Manchester. Despite their many differences, the women immediately struck up a friendship which lasted until Brontë’s premature death in 1855. Gaskell went on to write the celebrated biography of her friend.

Photograph of Elizabeth Gaskell
Elizabeth Gaskell, c. 1864. Photograph by Alexander McGlashon

Having been refused access to the manuscript of Brontë’s unpublished novel, The Professor, by her widower, the Rev. Arthur Nicholls, Gaskell recruited Kay-Shuttleworth’s assistance. They visited the parsonage at Haworth together in July 1856. The forceful personality of Sir James overcame the misgivings of Nicholls. He and Gaskell came away not only with The Professor manuscript, but also the fragment of a novel called Emma which Brontë had been working on before her marriage, and the now-famous miniature ‘Gondal’ and ‘Angria’ manuscripts created by Brontë and her siblings.

Fran Baker (Archivist) and Jane Speller (Project Archivist), The University of Manchester Library

Find out more and explore the collection:

Papers of Sir James Phillips Kay-Shuttleworth
http://archiveshub.ac.uk/data/gb133-jks

Big Data, Small Data and Meaning

Victorian joke. From the Victorian Meme Machine, a BL Labs project (http://www.digitalvictorianist.com/)

BL Labs is an initiative funded by the Mellon Foundation that invites researchers and developers to work with the BL and its digital data to address research questions. The 2014 Symposium showcased some of the innovative and exploratory projects funded through the initiative. This year’s competition winners are the Victorian Meme Machine, which is creating a database of Victorian jokes, and the Text to Image Linking Tool (TILT), which links areas on a page image to a clear transcription of the content.

Tim Hitchcock, Professor of Digital History at the University of Sussex, opened with a great keynote talk. He started out by stressing the role of libraries, archives and museums in preserving memory and their central place in a complex ecology of knowledge discovery, dissemination and reflection. He felt it was essential to remember this when we get too caught up in pursuing shiny new ideas. It is important to continually rethink what it is to be an information professional, whilst also respecting the basic principles that a library (archive, museum) was created to serve.

Tim Hitchcock’s talk was titled Big Data, Small Data and Meaning. He said that conundrums of size mean there is a danger of concentrating on Big Data and correspondingly neglecting Small Data. But can we view and explore a world encompassing both the minuscule and the massive? Hitchcock introduced the concept of the macroscope, a term coined in a science fiction novel by Piers Anthony back in 1969. He used this term in his talk to consider the idea of a macro view of data. How has the principle of the macroscope influenced the digital humanities? Hitchcock referred to Katy Börner’s work with Plug-and-Play Macroscopes: “Macroscopes let us observe what is at once too great or too slow or too complex for the human eye and mind to notice and comprehend.” (See http://vimeo.com/33413091 for an introductory video).

Hitchcock felt that ideally macroscopes should be used to observe patterns across large data and at the same time show the detail within small data. The way that he talked about Big Data within the context of both the big and the small helped me to make more sense of Big Data methods. I think that within the archive community there has been something of a collective head scratching around Big Data: what its significance is, and how it relates to what we do. In a way it helps to think of it alongside the analysis that Small Data allows researchers to undertake.

Graph from Paper Machines
Paper Machines visualisation (http://papermachines.org/)

Hitchcock gave some further examples of Big Data projects. Paper Machines is a plugin for Zotero that enables topic modelling analysis. It allows the user to curate a large collection of works and explore its characteristics with some great results; but the analysis does not really address detail.

The History Manifesto, by Jo Guldi and David Armitage, talks about how Big Data might be used to redefine the role of Digital Humanities. But Hitchcock criticised it for dismissing micro-history as essentially irrelevant.

Scott Weingart is also a fan of the macroscope. He is a convincing advocate for network analysis, which he talks about in his blog post, The modern role of DH in a data-driven world:

“distant reading occludes as much as it reveals, resulting in significant ethical breaches in our digital world. Network analysis and the humanities offers us a way out, a way to bridge personal stories with the big picture, and to bring a much-needed ethical eye to the modern world.”

Hitchcock posited that the large scale is often seen as a route to impact in policy formation, and this is an attractive inducement to think large. In working on a big data scale, Humanities can speak to power more convincingly; it can lead to a more powerful voice and more impact.

We were introduced to Ben Schmidt’s work, Prochronisms. This uses anachronisms in TV programmes and films to learn about changes in language, working at different scales of text analysis. Schmidt has looked at particular TV programmes and films, examining both the overall use of language and the specifics of word use. One example of his work is the analysis of 12 Years a Slave:

visual representation of language in 12 Years a Slave
12 Years a Slave: Word Analysis (http://www.prochronism.com/)

‘the language Ridley introduces himself is full of dramatically modern words like “outcomes,” “cooperative,” and “internationally:” but that where he sticks to Northup’s own words, the film is giving us a good depiction of how things actually sounded. This is visible in the way that the orange ball is centered much higher than the blue one: higher translates to “more common then than now.”’

Schmidt gives very entertaining examples of anachronisms. For example, the phrase ‘parenting a child’ is used in the TV drama series Downton Abbey, yet it only shows up in literature 5 times during the 1920s, and in a rather different context to our modern use. His close reading of context also throws up surprises, such as his analysis of the use of the word ‘stuff’ in Downton Abbey (as in ‘family stuff’ or ‘general stuff’), which does not appear to be anachronistic and yet viewers feel that it is a modern term. (A word of warning: the site is fascinating and it’s hard to stop reading it once you start!)
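
To make the idea a little more concrete, here is a minimal sketch in Python of how a word in a script might be compared against period and modern usage. It is an illustration only, not Schmidt’s actual method: the word counts, the function names (log_ratio, flag_anachronisms) and the threshold are all invented for the example.

```python
import math

# Hypothetical frequencies per million words. These numbers are invented
# for illustration and do not come from any real corpus.
PERIOD_PER_MILLION = {"outcomes": 0.5, "cooperative": 1.0, "plantation": 40.0, "stuff": 30.0}
MODERN_PER_MILLION = {"outcomes": 50.0, "cooperative": 20.0, "plantation": 5.0, "stuff": 60.0}


def log_ratio(word, smoothing=0.1):
    """Log2 of modern frequency over period frequency.

    Positive values mean the word is more common now than in the period;
    negative values mean it was more common then. The smoothing constant
    avoids dividing by zero for words missing from either table.
    """
    then = PERIOD_PER_MILLION.get(word, 0.0) + smoothing
    now = MODERN_PER_MILLION.get(word, 0.0) + smoothing
    return math.log2(now / then)


def flag_anachronisms(script_words, threshold=3.0):
    """Return, in alphabetical order, the words that are at least
    2**threshold times more common now than in the period."""
    return sorted({w for w in script_words if log_ratio(w) >= threshold})


if __name__ == "__main__":
    for word in ["outcomes", "cooperative", "plantation", "stuff"]:
        print(word, round(log_ratio(word), 2))
    print(flag_anachronisms(["outcomes", "stuff", "plantation", "cooperative"]))
```

With these made-up figures, ‘stuff’ is only about twice as common now as then, so it falls below the threshold. This is the sort of case where a count alone cannot settle the matter and a close reading of context is still needed.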

Professor Hitchcock gave this work as an example of using a macroscope effectively to combine the large and the small. Schmidt reveals narrative arcs; maybe showing us something that hasn’t been revealed before…and at the same time creates anxiety amongst script writers with his stark analysis!

Viewing data on a series of scales simultaneously seems a positive development, even with the pitfalls. But are humanists privileging social science types of analysis over more traditional humanist ones? Working with Big Data can be hugely productive and fun, and it can encourage collaboration, but are humanist scholars losing touch with what they traditionally do best? Language and art, cultural construction and human experience are complex things. Scholars therefore need to encompass close reading and Small Data in their work in order to get a nuanced reading.  Our urge towards the all-inclusive is largely irresistible, but in this fascination we may lose the detail. The global image needs to be balanced with a view from the other end of the macroscope.

It is important to represent and mobilise the powerless rather than always thinking about the relationship to the powerful; to analyse the construct of power rather than being held in the grip of power and technology. Histories of small things are often what gives voice to those who are marginalised. Humanists should encompass the peculiar and eccentric; they should not ignore the power of the particular.

Graph showing evidence for the Higgs boson particle
Graph showing evidence for the Higgs particle (http://www.atlas.ch/news/2012/latest-results-from-higgs-search.html)

Of course, Big Data can have huge and fundamental results. The discovery of the Higgs particle was the result of massive data crunching and finding a small ‘bump’ in the data that gave evidence to support its existence. The other smaller data variations needed to be ignored in this scenario. It was a case of millions of rolls of the dice to discover the elusive particle. But if this approach is applied across the board, the assumption is that the signal, or the evidence, will come through, despite the extraneous blips and bumps. It doesn’t matter if you are using dirty data because small hiccups are just ignored.  But humanists need to read data with an eye to peculiarities and they should consider the value of digital tools that allow them to think small.

Hitchcock believes that to perform humanities effectively we need to contextualise. The importance of context is never lost on an archivist, as it is a cornerstone of our work. Big Data analysis can lose this context; Small Data is all about understanding context to derive meaning.

Using the example of voice onset time, which refers to the tiny breathy gap before speaking, Hitchcock showed that a couple of milliseconds of empty space can demand close reading, because it actually changes depending on who you are talking to, and it reveals some really interesting findings. A Big Data approach would simply miss this fascinating detail.

Big data has its advantages, but it can mean that you don’t look really closely at the data set itself. There is a danger you present your results in a compelling graph or visualisation, but it is hard to see whether it is a flawed reality. You may understand the whole thing, and you can draw valuable conclusions, but you don’t take note of what the single line can tell you.