Hub contributors’ reflections on the current and future state of the Hub



The Archives Hub is what the contributors make it, and with over 170 institutions now contributing, we want to continue to ensure that we listen to them and develop in accordance with their needs. This week we brought together a number of Archives Hub contributors for a workshop session. The idea was to think about where the Hub is now and where it could go in the future.
We started off by giving a short overview of the new Hub strategy, and updating contributors on the latest service developments. We then spent the rest of the morning asking them to look at three questions: What are the benefits of being part of the Hub? What are the challenges and barriers to contributing? What sort of future developments would you like to see?
Probably the strongest benefit was exposure – as a national service with an international user-base the Hub helps to expose archival content, and we also engage in a great deal of promotional work across the country and abroad. Other benefits that were emphasised included the ability to search for archives without knowing which repository they are held at, and the pan-disciplinary approach that a service like the Hub facilitates. Many contributors also felt that the Hub provides them with credibility, a useful source of expertise and support, and sometimes ‘a sympathetic ear’, which can be invaluable for lone archivists struggling to make their archives available to researchers. The network effect was also raised – the value of having a focus for collaboration and exchange of ideas.
A major barrier to contributing is the backlog of data, which archivists are all familiar with, and the time required to deal with this, especially with the lack of funding opportunities for cataloguing and retro-conversion. The challenges of data exchange were cited, and the need to make this a great deal easier. For some, getting the effective backing of senior managers is an issue. For those institutions who host their own descriptions (Spokes), the problems surrounding the software, particularly in the earlier days of the distributed system, were highlighted, and also the requirement for technical support. One of the main barriers here may be the relationship with the institution’s own IT department. It was also felt that the use of Encoded Archival Description (EAD) may be off-putting to those who feel a little intimidated by the tags and attributes.
People would like to see easy export routines to contribute to the Hub from other systems, particularly from CALM, a more user-friendly interface for the search results, and maybe more flexibility with display, as well as the ability to display images and seamless integration of other types of files. ‘More like Google’ was one suggestion, and certainly exposure to Google was considered to be vital. It would be useful for researchers to be able to search a Spoke (institution) and then run the same search on the central Hub automatically, which would create closer links between Spokes and the Hub. Routes through to other services would add to our profile, and more interoperability with digital repositories would be well received. Similarly, the ability to search across archival networks, and maybe other systems, would benefit users and enable more people to find archival material of relevance. Influencing the right people and lobbying on behalf of contributors were also listed as things the Hub could do.
After a very good lunch at Christie’s Bistro we returned to look at three particular developments that we all want to see; each group took one issue and thought about the drivers that move it forward and the restraining forces that stop it from happening. We thought about usability, which is strongly driven by the need to be inclusive and to de-mystify archival descriptions for those not familiar with archives, and in particular archival hierarchies. It is also driven by the need to (at least in some sense) compete with Google, the need to be up-to-date, and the need to think about exposing the data to mobile devices. However, people often have unrealistic expectations and, fundamentally, being clear about who our users are and understanding their needs is hugely important. The quality and consistency of the data and markup also come into play here, as does the recognition that this sort of thing requires a great deal of expert software development.
The need for data export, the second issue that we looked at, is driven by the huge backlogs of data and the big impact that this should have on the Hub in terms of quantity of descriptions. It should be a selling point for vendors of systems, with the pressure of expectation from stakeholders for good export routines. It should save time, prove to be good value for money and be easily accommodated into the work flow of an archive office. However, complications arise with the variety of systems out there and the number of standards, and variance in application of standards. There may be issues about the quality of the data and people may be resistant to changing their work habits.
Our final issue, the increased access to digital content, is driven by increased expectations for accessing content, making the interface more visually attractive (with embedded images), the drive towards digitisation and possibly the funding opportunities that exist around this area. But there is the expense and time to consider, issues surrounding copyright, the issue of where the digital content is stored and issues around preservation and future-proofing.
The day ended with a useful discussion on measuring impact. We got some ideas from contributors that we will be looking at and sharing with you through our blog. But the challenges of understanding the whole research life-cycle and the way that primary sources fit into this are certainly a major barrier to measuring the impact that the Hub may have in the context of research outputs.

Web 2.0 for teaching: wishy-washy or nitty-gritty?

A useful report, summarising Web 2.0 and some of the perspectives in literature about Web 2.0 and teaching, was recently produced by Susan A. Brown of the School of Education at the University of Manchester: The Potential of Web 2.0 in Teaching: a study of academics’ perceptions and use. The findings were based on a questionnaire (74 respondents across 4 Faculties) and interviews (8 participants) with teaching staff from the University of Manchester. It is available on request, so let us know if you would like a copy.
Some of the points that came out of the report:
  • It is the tutors’ own beliefs about teaching that are the main influence on their perceptions of Web 2.0
  • There is little discussion about Web 2.0 amongst colleagues and the use of it is generally a personal decision
  • Top-down goals and initiatives do not play a major part in use of Web 2.0
  • It may be that a bottom-up experimental approach is the most appropriate, especially given the relative ease with which Web 2.0 tools can be deployed, although there were interviewees who argued for a more considered and maybe more strategic approach, which suggests something that is more top-down
  • There is little evidence that students’ awareness of Web 2.0 is a factor, or that students are actively arguing in favour of its use:
“This absence of a ‘student voice’ in tutors’ comments on Web 2.0 is interesting given the perceptions of ‘digital natives’ – the epithet often ascribed to 21st Century students – as drivers for the greater inclusion of digital technologies. It may shore up the view that epithets such as ‘digital natives’ and ‘Millennials’ to describe younger students over-simplify a complex picture where digital/Web technology users do not necessarily see the relevance of Web 2.0 in education.”
  • The use of and familiarity with Web 2.0 tools (personal use or use for research) was not a particularly influential factor in whether the respondents judged them to have potential for teaching.
  • In terms of the general use of Web 2.0 tools, mobile social networking (e.g. Twitter) and bookmarking were the tools used least amongst respondents. Wikis, blogs and podcasting had higher use.
  • In terms of using these tools for teaching, the data was quite complex, and rather more qualitative than quantitative, so it is worth looking at the report for the full analysis. There were interviewees who felt that Web 2.0 is not appropriate for teaching, where the role of a teacher is to lay down the initial building blocks of knowledge, implying that discussion can only follow understanding, not be used to achieve understanding. There was also a notion that Web 2.0 facilitates more surface, social interactions, rather than real cognitive engagement.
“A number of…respondents expressed the view that Web 2.0 is largely socially orientated, facilitating surface ‘wishy-washy’ discussion that cannot play a role in tackling the ‘nitty-gritty’ of ‘hard’ subject matter”.
Three interviewees saw a clear case for the use of Web 2.0 and they referred to honing research skills, taking a more inquiry-based approach and taking a more informal approach and tapping into a broader range of expertise.
In conclusion: “The study indicates that there are no current top-down and bottom-up influences operating that are likely to spread Web 2.0 use beyond individuals/pockets of users at the UoM [University of Manchester]”. The study recommends working with a small group of academics to get a clearer understanding of the issues they face in teaching and how Web 2.0 might offer opportunities, as well as providing an opportunity for more detailed discussion about teaching practices and thinking about how to tailor Web 2.0 for this context.

Archival Context: entities and multiple identities


I recently took part in a Webinar (Web seminar) on the new EAC-CPF standard. This is a standard for the encoding of information about record creators: corporate bodies, persons and families. This information can add a great deal to the context of archives, supporting a more complete understanding of the records and their provenance.

We were given a brief overview of the standard by Kathy Wisser, one of the Working Group members, and then the session was open to questions and discussion.

The standard is very new, and archivists are still working out how it fits into the landscape and how it relates to various other standards. It was interesting to note how many questions essentially involved the implementation of EAC-CPF: Who creates the records? Where are they kept? How are they searched? Who decides what?
These questions are clearly very important, but the standard is just a standard for the encoding of ISAAR(CPF) information. It will not help us to figure out how to work together to create and use EAC-CPF records effectively.
In general, archivists use EAD to include a biographical history of the record creator, and may not necessarily create or link to a whole authority record for them. The idea is that providing separate descriptions for different entities is more logical and efficient. The principle of separation of entities is well put: “Because relations occur between the descriptive nodes [i.e. between archive collections, creators, functions, activities], they are most efficiently created and maintained outside of each node.” So if you have a collection description and a creator description, the relationship between the two is maintained separately from the actual descriptions. If only EAD itself were a little more data-centric (database-friendly, you might say), this would facilitate a relational approach.
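To make that relational idea a little more concrete, here is a minimal sketch in Python (the structures, field names and identifiers are entirely my own invention, not anything defined by EAD or EAC-CPF): the collection description and the creator description are each stored on their own, and the link between them lives in a separate list of relations.

```python
from dataclasses import dataclass

# Hypothetical, highly simplified records: each entity is described once, on its own.
@dataclass
class CollectionDescription:
    record_id: str
    title: str

@dataclass
class CreatorDescription:
    record_id: str
    name: str

@dataclass
class Relation:
    subject_id: str   # e.g. a collection description
    relation: str     # the nature of the link
    object_id: str    # e.g. a creator description

collection = CollectionDescription("collection-0001", "Example manuscript collection")
creator = CreatorDescription("creator-0001", "Example Person, 1850-1920")

# The relationship is created and maintained outside both descriptions,
# so either description can be updated or reused without touching the other.
relations = [Relation(collection.record_id, "createdBy", creator.record_id)]

for r in relations:
    print(f"{r.subject_id} --{r.relation}--> {r.object_id}")
```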
I am interested in how we will effectively link descriptions of the same person, because I cannot see us managing to create one single authoritative record for each creator. This is enabled via the ‘identities’: a record creator can have two or more identities with each represented by a distinct EAC-CPF instance. I think the variety of identity relationships that the standard provides for is important, although it inevitably adds a level of complexity. It is something we have implemented in our use of the tag to link to related descriptions. Whilst this kind of semantic markup is a good thing, there is a danger that the complexity will put people off.
I’m quite hung-up on the whole issue of identifiers at the moment. This may be because I’ve been looking at Linked Data and the importance of persistent URLs to identify entities (e.g. I have a URL, you have a URL, places have URLs, things have URLs, and that way we can define all these things and then provide links between them). The Archives Hub is going to be providing persistent URLs for all our descriptions, using a unique identifier made up of the country code, repository code and local reference for the collection (e.g. http://www.archiveshub.ac.uk/search/record.html?id=gb100mss, where 100 is the repository code and MSS is the local reference).
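As a rough illustration (and only an illustration – the real Hub routine may well differ in detail), the identifier in that example can be thought of as assembled from the three parts like this:

```python
def hub_record_url(country_code: str, repository_code: str, local_reference: str) -> str:
    """Sketch of assembling a persistent, Hub-style URL from the country code,
    repository code and local collection reference. Illustrative only."""
    identifier = f"{country_code}{repository_code}{local_reference}".lower()
    return f"http://www.archiveshub.ac.uk/search/record.html?id={identifier}"

# Reproduces the example above: country code GB, repository 100, local reference MSS.
print(hub_record_url("gb", "100", "mss"))
# http://www.archiveshub.ac.uk/search/record.html?id=gb100mss
```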
I feel that it will be important for ISAAR(CPF) records to have persistent URLs, and these will come from the recordID and the agencyCode. Part of me thinks the agency responsible for the EAC-CPF instance should not be part of the identifier, because the record should exist apart from the institution that created it, but realistically we’re not going to get consensus on some kind of independent, stand-alone ISAAR(CPF) record. One of the questions I’m currently asking myself is: if two different bodies have EAC-CPF records for the same person, does it matter what the identifiers/URLs are for those records? Is the important thing to relate them as representing the same thing? I’m sure it’s very important to have a persistent URL for all EAC-CPF instances, because that is how they will be discoverable; that is their online identity. But the question of providing one unique identifier for one person, or one corporate body, is not something I have quite made my mind up about.
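One possible way of saying ‘these two records describe the same person’, without forcing everyone onto a single shared identifier, is to publish an explicit equivalence statement alongside the records. Here is a small sketch using the Python rdflib library and owl:sameAs; the URIs are invented for illustration, and other, weaker predicates could equally be used.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Two hypothetical EAC-CPF instances for the same person, held by different
# agencies, each with its own persistent URL built from recordID and agencyCode.
record_at_agency_a = URIRef("http://archives.example-a.org/eac-cpf/person-1234")
record_at_agency_b = URIRef("http://archives.example-b.org/eac-cpf/GB-0001-5678")

g = Graph()
# State that the two descriptions represent the same entity,
# while each agency keeps its own identifier.
g.add((record_at_agency_a, OWL.sameAs, record_at_agency_b))

print(g.serialize(format="turtle"))
```

owl:sameAs is a strong claim (it asserts the two really are the same thing), so whether that, or some looser ‘related to’ relationship, is the right tool is exactly the sort of question that still needs working through.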
It will be interesting to see how the standard is assessed by archivists and more examples of implementation. The Archives Hub would be very interested to hear from anyone using it.

Designs on Delivery: GPO Posters from 1930 to 1960: Online extras

 Mail Coach A.D. 1784

University of the Arts London Archives and Special Collections Centre, in collaboration with The British Postal Museum & Archive, presents Designs on Delivery: GPO Posters from 1930 to 1960. The exhibition at the Well Gallery – and online here on the Archives Hub – focuses on a period when the Post Office was at the cutting edge of poster design and mass communication. It explores how the GPO translated often complex messages to the public in order to educate them about the services offered, using text, image and colour.

The Archives Hub website now has online extras: exclusively online, an additional eight posters representing the range of themes adopted by the General Post Office in their advertising.

Illustration: John Armstrong (1893-1973) ‘Mail Coach A.D. 1784’ (1935) reference The Royal Mail Archive POST 110/3175; copyright © Royal Mail Group Ltd and courtesy of The British Postal Museum & Archive.

Sustainable content: visits to contributors

I recently visited two of the contributors to the Archives Hub sustainable content development project. The archivists at Queen Mary, University of London (QMUL) and the BT Archives were nice enough to let me drink their tea, and see how they used CALM.

Axiell, developers of the CALM software, have kindly let us have access to a trial version of CALM to help with this project, but it

Designs on Delivery: GPO Posters from 1930 to 1960

NIGHT MAIL

University of the Arts London Archives and Special Collections Centre, in collaboration with The British Postal Museum & Archive, presents Designs on Delivery: GPO Posters from 1930 to 1960. The exhibition at the Well Gallery – and online here on the Archives Hub – focuses on a period when the Post Office was at the cutting edge of poster design and mass communication. It explores how the GPO translated often complex messages to the public in order to educate them about the services offered, using text, image and colour.

As part of the exhibition, the Well Gallery will be showing on loop Night Mail (1936) which the British Film Institute calls "one of the most popular and instantly recognised films in British film history … one of the most critically acclaimed films … [of the] documentary film movement".

Illustration: poster designed by Pat Keely (died 1970) for the film Night Mail, reference The Royal Mail Archive POST 109/377; copyright © Royal Mail Group Ltd and courtesy of The British Postal Museum & Archive.

A few thoughts on context and content

I have been reading with interest the post and comments on Mark Matienzo’s blog: http://thesecretmirror.com. He asks ‘Must contextual description be bound to records description?’

I tend to agree with his point of view that this is not a good thing. The Archives Hub uses EAD, and our contributors add excellent biographical and administrative history information into their descriptions, via the tag, information that I am sure is very valuable for researchers. But should our descriptions leave out this sort of information and be just descriptions of the collection and no more? Wouldn’t it be more sensible to link instead to contextual information that is stored separately?
Possibly, on the other side of the argument, if archivists created separate biographical/administrative history records, would they still want to contextualise them for specific collection descriptions anyway? It makes perfect sense to have the information separate to the collection description if it is going to be shared, but will archivists want to modify it to make it relevant to particular collections? Is it sensible to link to a comprehensive biographical record for someone when you are describing a very small collection that only refers to a year in their life?
Of course, we don’t have this issue with EAD at the moment, in so far as we can’t include an EAC-CPF record in an EAD record anyway, because EAD doesn’t allow content from other XML schemas to be included (no components from other namespaces can be used in EAD). But I can’t help thinking that an attractive model for something like the Archives Hub would be collection descriptions (including sub-fonds, series, items) that can link to whatever contextual information is appropriate, whether that information is stored by us or elsewhere. This brings me back to my current interest – Linked Data. If the Web is truly moving towards the Linked Data model, then maybe EAD should be revised in line with this? By breaking information down into logical components, it can be recombined in more imaginative ways – open and flexible data!
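As a very rough sketch of the kind of model I have in mind, using the Python rdflib library with invented URIs (and dcterms:creator chosen simply as one plausible predicate), a collection description could just point at a contextual description held by us or by anyone else:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

# A hypothetical collection description on the Hub...
collection = URIRef("http://www.archiveshub.ac.uk/search/record.html?id=gb100mss")
# ...linked to a contextual (biographical) description that could live anywhere.
creator_context = URIRef("http://another-service.example.org/people/example-person")

g = Graph()
g.add((collection, DCTERMS.creator, creator_context))

print(g.serialize(format="turtle"))
```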

Linked Data: towards the Semantic Web

The Semantic Web has always interested me, although some years have elapsed since I first came across it. It feels like it took a back seat for a while, but now it is back and starting to go places, particularly with the advent of Linked Data, which is a central concept behind the Semantic Web.
The first Linked Data Meetup was recently held in London, with presentations, case studies, panels and a free bar in the evening, courtesy of Talis and the winners of the Best in Use Track Paper award at the European Semantic Web Conference, who generously put their prize money behind the bar. The venue may have been hidden away in Hammersmith, but the room was packed and the general atmosphere was one of expectation and enthusiasm.
I am still in the process of trying to grasp the issues surrounding the Semantic Web, and whilst some of the presentations at this event were a little over my head, there was certainly a great deal to inform and interest, with a good mix of people, including programmers, information professionals and others, although I was probably the only archivist!
One of the most important messages that came across was the importance of http URIs, without which Linked Data cannot work. URIs may commonly be URLs, but essentially they are unique identifiers, and this is what is important about them. Tom Scott told us about what the BBC are up to. They are making great strides with Linked Data, creating identifiers for every programme in order to make each programme into an entity. But there are identifiers for a great deal more than just programmes – natural history is a subject area they have been focussing on, and they now have identifiers for animals, for groups of animals, for species, for where they live, and so on. By ensuring that all of these entities have URIs it is possible to think about linking them in imaginative ways. Furthermore, relationships between entities have URIs – this is where the idea of triples comes in, referring to the concept of a subject linked to an object through a relationship.
The three parts of each triple are called its subject, predicate, and object. A triple mirrors the basic structure of a simple sentence, such as: the Archives Hub is based at Mimas. The Hub is the subject, ‘is based at’ is the predicate, and Mimas is the object.
Whilst humans may read sentences such as this and understand the entities and the relationships, the Semantic Web vision is that machines can do the same – finding, sharing, analysing and combining information.
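To show what that sentence might look like in machine-readable form, here is a small sketch using the Python rdflib library; the example.org namespace and the property name are made up, since the real identifiers for these entities are not given here.

```python
from rdflib import Graph, Namespace

# A made-up namespace, purely for illustration.
EX = Namespace("http://example.org/")

g = Graph()
# Subject: the Archives Hub; predicate: 'is based at'; object: Mimas.
g.add((EX.ArchivesHub, EX.isBasedAt, EX.Mimas))

# A machine can now store, share and combine this statement with others.
print(g.serialize(format="turtle"))
```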
Issues such as sustainability were raised, along with the great need to make Linked Data easier to create and use. We heard about DataIncubator.org, a project that is creating and publishing Linked Data. The Talis Connected Commons scheme offers free access to the Talis platform for public domain data, which means you have access to an online triple store. Talis will host the data, although the end goal is for the original curators of the data to take it back and publish it themselves. But this does seem to be a great way to help get the momentum going on Linked Data. Talis are one of the leading suppliers of library software, but clearly they have decided to put their weight behind the Semantic Web, and they are keen to engage the community in this by providing help and support with dataset conversion, that is to say, conversion of data into RDF.
There was some talk of the need to encourage community norms, for example around linking and attribution, something that is particularly important when taking someone else’s data. People should be able to trace the path back to the original dataset. Another issue that came up was the need to work together, particularly avoiding different people working on converting the same dataset. It is important to make all of the code available and to benefit from shared expertise. It was very obvious that the people taking part in this event and showing us their projects were keen to collaborate and take a very open approach.
Leigh Dodds from Talis explained that dataincubator.org has already converted some major datasets, such as the NASA space flight dataset, which includes every space flight launch since 1950, and OpenLibrary, which already publishes RDF, although the modelling of the data was not great and so Talis have helped with this. The data that Leigh talked about is already in the public domain, so the essential task is to model it for output as RDF. Leigh gave us two of his wish-list datasets for possible conversion: the Prelinger Archives, a collection of over 2,000 historic films (the content is in the Internet Archive), and Lego, which adds a fun element and would mean a meeting of similar minds, as people into Lego are generally as anal as those who are into the Semantic Web!
Whilst many of the participants at the Linked Data Meetup were enthusiastic programmers rather than business people or managers, there was still a sense of the importance of the business case and taking a more intelligent approach to promotion and marketing.
Archivists are always very interested in issues of privacy, rights, and the ownership of data, and these issues were also recognised and discussed, though not in any detail. There did seem to be a rather curious suggestion of changing copyright law to ‘protect facts’, and thus bring it more in line with what is happening in the online environment.
As well as examples of what is happening at the BBC, we heard about various other projects, such as timetric, a project to enable people to find, store, share, track, publish and understand statistics. This is essentially about linking statistics and URIs and creating meaningful relationships between numbers. One of the interesting observations made here was that it is better to collect the data first and then decide how to sort and present it, rather than deciding beforehand, because otherwise you may design something that does not fit in with what people want.
For me, the Government Data Panel was one of the highlights of the day. It gave me a good sense of what is happening at the moment with Linked Data and what the issues are. Tim Berners-Lee (inventor of the Web) and Nigel Shadbolt talked about the decision to prioritise UK government data within the Linked Data project – clearly it is of great value for a whole host of reasons, and a critical mass of data can be achieved if the government are on board. We should also not forget that it is ‘our data’, so it should be opened up to us – public sector data touches all of us: businesses, institutions, individuals, groups, processes, etc.
The Linked Data project is not about changing the way government data is managed but about access, enabling the data to be used by all kinds of people for all kinds of things. It is not just about transparency, but about actually running things better – it may increase efficiencies if the data is opened up in this way. Tim Berners-Lee told us how government ministers tended to refer to ‘the database’ of information, as in the creation of one massive database, a misconception of what this Linked Data project is all about. Ministers have also raised worries about personal data, about whether this project will require more time and effort from them, and whether they will have to change their practices. But within government there are a few early adopters who ‘get it’, and it will be important to try to clone that understanding! There was brief mention, in passing, of the Ordnance Survey being charged to make money to run its operations, and therefore there is a problem with getting this data. Similarly, when parts of the public sector were privatised, the franchises took the data with them (e.g. train timetables).
Location data was recognised as being of great importance. A huge percentage of data has location in it, and location can act as a hub to join disparate datasets. We need an RDF datastore of counties, authorities, constituencies, etc., and we should think about the importance of using the same identifier for a particular location so that we can use the location data in this way.
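A minimal sketch of the idea, with invented URIs, predicates and figures throughout: if two otherwise unrelated datasets both point at the same identifier for a county, the location becomes the join point.

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")             # hypothetical predicates
LOC = Namespace("http://locations.example.org/")  # hypothetical shared location URIs

g = Graph()
# Dataset one: an archive repository, located in a county.
g.add((EX.ExampleRepository, EX.locatedIn, LOC.Lancashire))
# Dataset two: a statistic about the same county, using the same location URI.
g.add((LOC.Lancashire, EX.population, Literal(1400000)))  # illustrative figure

# Because both statements share the LOC.Lancashire identifier,
# a simple query joins the two datasets through the location.
query = """
SELECT ?repository ?population WHERE {
    ?repository <http://example.org/locatedIn> ?place .
    ?place <http://example.org/population> ?population .
}
"""
for row in g.query(query):
    print(row.repository, row.population)
```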
There was recognition that we have tended to conflate Linked Data and open data, but they are different. It is important to stress that open data may not be data that is linked up, and Linked Data may not be open – it may have restricted access. But if we can start to join up datasets, we can bring whole new value to them, for example by combining medical and educational data in different ways, maybe in ways we have not yet thought about. We want to shift the presumption that data should be held close unless a reason is given to give it up (an FoI request!). If the data can be made available through FoI, then why not provide it as Linked Data?
One of the big challenges that was highlighted was with local government, where attitudes are not quite so promising as with central government. Unfortunately, as one panel member put it, we are not in a benevolent dictatorship so we cannot order people to open up the data! It is certainly a difficult issue, and although it was pointed out that there are some examples of local authorities who are really keen to open up their data, many are not, and Crown copyright does not apply to local authorities.
Tim encouraged us all to make RDF files, create tools, enable mash-ups, and so on, so that people can take data and do things with it. So, do go and visit http://data.gov.uk once it is up and running and show that you support the initiative.
Whilst other initiatives in e-government and standards do appear to have come and gone, it may be that we wouldn’t have got to where we are now without them; these things are often all part of the evolutionary process. The approach to the Linked Data project is bottom-up, which is important for its sustainability. Whilst the support of the Prime Minister is important, in a way it is the support of the lower levels in government that matters more.
The Semantic Web could bring enormous benefits if it is realised. The closing presentation by Tom Heath, from Talis, gave a sense of this, as well as a realistic assessment of what lies ahead. The work that is going on demonstrated what might be achievable, but it also demonstrated that we are in the very early stages of this journey. There are huge challenges around the quality of the data and disambiguation. I find it exciting because it takes us along the road of computers as intelligent agents, opening up data and enabling it to be used in new and imaginative ways.
If any archivists out there are thinking of doing anything with Linked Data we would be very interested to hear from you!