A Web of Possibilities

“Will you browse around my website?” said the spider to the fly,
“’Tis the most attractive website that you ever did spy.”

All of us want to provide attractive websites for our users. Of course, we’d like to think it’s not really the spider/fly kind of relationship! But we want to entice people and draw them in, and we often see our own website as our key web presence: a place for people to find out who we are, what we have and what we do, and to look at our wares, so to speak.

The recently released ‘Discovery’ vision is to provide UK researchers with “easy, flexible and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable.”  Does this have any implications for the institutional or small-scale website, usually designed to provide access to the archives (or descriptions of archives) held at one particular location?

Over the years that I’ve been working in archives, announcements of new websites for searching the archives of a specific institution, or the outputs of a specific project, have been commonplace. A website is one of the obvious outputs from time-bound projects, where the aim is often to catalogue, digitise or exhibit certain groups of archives held in particular repositories. These websites are often great sources of in-depth information about archives. Institutional websites are particularly useful when a researcher really wants to gain a detailed understanding of what a particular repository holds.

However, such sites can present a view that is centred more on the provider of the information than on the receiver. It could be argued that, convenience aside, a researcher is less likely to want to use archives because they are held at a particular location, and more likely to want archives relating to their subject area; and the archives relevant to them are likely to be held across a whole range of archives, museums and libraries (and elsewhere). By only looking at the archives held at a particular location, even if that location is a specialist repository that represents the researcher’s key subject area, the researcher may not think about what they might be missing.

Project-based websites may group together archives in ways that benefit researchers more obviously, because they often aggregate around a specific subject area, for example by making available descriptions of, and links to, digital archives on a research topic. Value may be added through rich metadata, community engagement and functionality aimed at a particular audience. The downside here is sometimes sustainability: projects necessarily have a limited life-span, and archives do not. Archives are ever-changing and growing, and descriptions need to be updated all the time.

So, what is the answer? Is this too much of a silo-type approach, creating a large number of websites, each dedicated to a small selection of archives?

Broader aggregation seems like one obvious answer. It allows for descriptions of archives (or other resources) to be brought together so that researchers have the benefit of searching across collections, bringing together archives by subject, place, person or event, regardless of where they are held (although there is going to be some kind of limit here, even if it is at the national level).

You might say that the Archives Hub is likely to be in favour of aggregation! But it’s definitely not all pros and no cons. Aggregations may offer powerful search functionality for intellectually bringing together archives based on a researcher’s interests, but in some ways there is a greater risk around what is omitted. When searching a website that represents one repository, a researcher is more likely to understand that other relevant archives may exist elsewhere. Aggregations tend to promote themselves as comprehensive – if not explicitly then implicitly – which creates expectations that can never be fully met. They can also raise issues around measuring impact and licensing. There is also the risk of a proliferation of aggregation services, further confusing the resource discovery landscape.

Is the ideal of broad interdisciplinary cross-searching going to be impeded if we compete to create different aggregations? Yes, maybe it will to some extent, but I think that this is inevitable, and it is valid for different gateways to serve different audiences’ needs. It is important to acknowledge that researchers in different disciplines and at different levels have their own specific requirements, and we cannot fulfil all of these needs by presenting data in only one way.

One thing I think is critical here is for all archive repositories to think about the benefits of employing recognised and widely-used standards, so that they can effectively interoperate and so that the data remains relevant and sustainable over time. This is the key to ensuring that data is agile, and can meet different needs by being used in different systems and contexts.

I do wonder if there is a point at which aggregations become unwieldy, politically complicated and technically challenging. That point seems to be when they start to search across countries. I am still unsure whether Europeana can overcome this kind of problem, although I can see why many people are so keen on making it work. At present, though, it is extremely patchy: for example, getting no results for texts held in Britain relating to Shakespeare is not really a good result. But then, maybe the point is that Europeana is there for those who want to use it, and it is doing ground-breaking work in its focus on European culture; the Archives Hub exists for those interested in UK archives and a more cross-disciplinary approach; Genesis exists for those interested in women’s studies; for those interested in the Co-operative movement, there is the National Co-operative Archive site; and for those researching film, the British Film Institute website and archive is of enormous value.

So, is the important principle here that diversity is good because people are diverse and have diverse needs? Probably so. But at the same time, we need to remember that to achieve this landscape, we need to encourage data sharing and avoid duplication of effort. Once you have created descriptions of your archive collections, you should be able to put them onto your own website, contribute them to a project website, and provide them to an aggregator.

Ideally, we would be looking at one single store of descriptions, because as soon as you contribute to different systems, if they also store the data, you have version control issues. The ability to search different data sources remotely would seem to be the right solution here. However, there are substantial challenges. The Archives Hub has been designed to work in a distributed way, so that institutions can host their own data. Distributed searching does present challenges, but it certainly works pretty well. The problem is that running a server, operating system and software can be a real challenge for institutions that do not have the requisite IT skills dedicated to the archives department. Institutions that hold their own data also have it in a great variety of formats. So, what we really need is the ability for the Archives Hub to seamlessly search CALM, AdLib, MODES, ICA AtoM, Access, Excel, Word, etc. and bring back meaningful results. Hmmm….
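This post doesn’t describe how the Hub’s distributed search is actually implemented, but the general idea of searching very different systems from one place can be sketched. What follows is a minimal, hypothetical Python sketch of federated search: each source sits behind an adapter that maps its own records onto a shared minimal schema, the query is sent to every adapter concurrently, and the results are merged. The `Record` schema, `make_rows_adapter`, the repository names and the field names are all assumptions for illustration; real CALM, AdLib, MODES or AtoM data would each need its own adapter.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """A hypothetical minimal shared schema for search results."""
    repository: str
    title: str
    reference: str


# An adapter takes a query string and returns records in the shared schema.
Adapter = Callable[[str], List[Record]]


def make_rows_adapter(repository: str, rows: List[Dict[str, str]],
                      title_field: str, ref_field: str) -> Adapter:
    """Wrap tabular data (e.g. rows exported from a spreadsheet or database).

    Each source names its fields differently, so the adapter is told which
    field holds the title and which holds the reference code.
    """
    def search(query: str) -> List[Record]:
        q = query.lower()
        return [Record(repository, row[title_field], row[ref_field])
                for row in rows if q in row[title_field].lower()]
    return search


def federated_search(query: str, adapters: List[Adapter]) -> List[Record]:
    """Send the query to every source concurrently, then merge the results."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda search: search(query), adapters))
    merged = [record for results in per_source for record in results]
    # Sort by title so results from different sources are interleaved,
    # rather than grouped by the source they came from.
    return sorted(merged, key=lambda r: (r.title.lower(), r.repository))


if __name__ == "__main__":
    # Two hypothetical sources that use different field names.
    source_a = make_rows_adapter("Repository A", [
        {"Title": "Papers of a railway engineer", "RefNo": "A/1"},
        {"Title": "Circus and fairground posters", "RefNo": "A/2"},
    ], title_field="Title", ref_field="RefNo")
    source_b = make_rows_adapter("Repository B", [
        {"name": "Railway company photographs", "id": "B7"},
    ], title_field="name", ref_field="id")
    for record in federated_search("railway", [source_a, source_b]):
        print(record.repository, record.reference, record.title)
```

The fan-out and merge are the easy part; the real work is in the adapters, because each one has to interpret a different data model. That is exactly the difficulty with the variety of local formats.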

The business case for opening up data seems clear. Projects like Open Bibliographic Data have helped progress the thinking in this arena and raised issues and solutions around barriers such as licensing. But we clearly need to understand more about the benefits of aggregation and the different approaches to it, and we need to get more buy-in for this kind of approach. Does aggregation allow users to do things that they could not do otherwise? Does it save them time? Does it promote innovation? Does it skew the landscape? Does it create problems for institutions around branding and measuring impact? Furthermore, how can we actually measure these kinds of potential benefits and issues?

Websites that offer access to archives (or descriptions of archives) based on where they are located and on the body that administers them have an important role to play. But it seems to me that it is vital that these archives are also represented on a more national, and even international, stage. We need to bring our collections to where the users are. We need to ensure that Google and other search engines find our descriptions. We need to put archives at the heart of research, alongside other resources.

I remember once talking about the Archives Hub to an archivist who ran a specialist repository. She said that she didn’t think it was worth contributing to the Hub because they already had their own catalogue. That is, researchers could find what they wanted via the institute’s own catalogue on their own system, available in their reading room. She didn’t seem to be aware that this could only happen if they knew that the archive was there, and that this view rested on the idea that researchers would be happy to repeat that kind of search on a number of other systems. Archives are often about a whole wealth of different subjects – we all know how often there are unexpected and exciting finds. A specialist repository for any one discipline will have archives that reach way beyond that discipline into all sorts of fascinating areas.

It seems undeniable that data is going to become more open and that we should promote flexible access through a number of discovery routes, but this throws up challenges around version control, measuring impact, brand and identity. We always have to be cognisant of funding, and widely disseminated data does not always help us with a funding case because we lose control of the statistics around use and any kind of correlation between visits to our website and bums on seats. Maybe one of the challenges is therefore around persuading top-level managers and funders to look at this whole area with a new perspective?

Optimistic outcome for optimising the Hub


Paddy, Steve and I (Jane) have spent the last 4 months working on an interesting JISC project to optimise Archives Hub pages for search engines, as part of the Strategic Content Alliance initiative.

Search Engine Optimisation (SEO) is a process that aims to increase the visibility of a Website in important search engines like Google. SEO works by modifying the content, the layout, and the architecture of web pages, in addition to using community building techniques to enhance the popularity of a website.

As part of this project, an SEO expert is tracking and recording our current web traffic. We are implementing recommended changes and looking for changes to the website traffic after the changes are made.

Recommendations we have implemented
1. A Search Engine Sitemap

This is something that was developed by Google and is now used by other search engines. An XML sitemap is a recommended way of listing the URLs of a website so that search engine bots can find its content and data faster and more efficiently. It is a means for us to tell the search engine which pages are important, and we can also record the date each page was last modified and indicate how often it is likely to change. The sitemap should help the pages get indexed faster.

The sitemap was relatively easy to create, although it probably needs a bit more work from us in terms of grading pages for priority.
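We haven’t described here exactly how our sitemap file was produced, so the following is only a minimal Python sketch of generating one in the sitemaps.org format (protocol version 0.9), using the standard library’s `xml.etree.ElementTree`. Each `<url>` entry carries the page address (`loc`), the date it was last modified (`lastmod`), a hint at how often it changes (`changefreq`) and its importance relative to other pages on the same site (`priority`, 0.0 to 1.0). The `Page` type and `build_sitemap` function are hypothetical names, and the URLs and values in the example are illustrative only.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List, NamedTuple

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


class Page(NamedTuple):
    loc: str                      # absolute URL of the page
    lastmod: date                 # date the page was last modified
    changefreq: str = "monthly"   # hint at how often the page changes
    priority: float = 0.5         # importance relative to other pages on this site


def build_sitemap(pages: List[Page]) -> str:
    """Return the sitemap document as a string, ready to save as sitemap.xml."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page.changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid changefreq {page.changefreq!r} for {page.loc}")
        if not 0.0 <= page.priority <= 1.0:
            raise ValueError(f"priority must be between 0.0 and 1.0 for {page.loc}")
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in URLs automatically.
        ET.SubElement(url, "loc").text = page.loc
        ET.SubElement(url, "lastmod").text = page.lastmod.isoformat()
        ET.SubElement(url, "changefreq").text = page.changefreq
        ET.SubElement(url, "priority").text = f"{page.priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Illustrative entries only; the URLs, dates and values are hypothetical.
    print(build_sitemap([
        Page("http://www.archiveshub.ac.uk/", date(2009, 6, 17), "weekly", 1.0),
        Page("http://www.archiveshub.ac.uk/railways.shtml", date(2009, 6, 15), "monthly", 0.8),
    ]))
```

Grading pages for priority then comes down to choosing the `priority` values: since they are only relative within one site, giving every page 1.0 tells the search engine nothing.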

2. Metadata

We have been working on the page metadata. In particular we have minimised duplicate title and description tags, ensured all pages have title tags and thought a bit more about the content of the title and description tags – does the title properly represent the page? Is the description an effective summary of the content with important keywords? It is important to think about this from the perspective of the robots – what are the words that will be most useful for them, in terms of search engine searches?

For example, where we had a metadata title ‘Archives Hub: For Archivists’, we had a heading for the same page ‘Contributing to the Archives Hub’. Ideally these should be the same and we should decide which terms are most important – should ‘archivists’ be in the main heading? Should ‘contributing’ be in the title tag? We have also started to reverse our page titles so that the subject of the page is entered first of all, so not ‘Archives Hub: Contributors’ but ‘Contributors to the Archives Hub’.
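We don’t say above how we found the duplicate and missing tags. One way to do it is a small script, sketched below in Python with the standard library’s `html.parser`. It flags pages with no `<title>`, pages with no meta description, titles or descriptions shared by more than one page, and pages whose `<title>` differs from their first `<h1>` heading. This is a hypothetical sketch rather than the tool we used: `HeadParser`, `audit` and the input format (a dictionary mapping each URL to its HTML source) are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Optional


class HeadParser(HTMLParser):
    """Collects the <title>, the meta description and the first <h1> of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self.h1: Optional[str] = None
        self._current: Optional[str] = None  # tag whose text is being captured
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()
        elif tag in ("title", "h1") and getattr(self, tag) is None:
            self._current, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag == self._current:
            # Normalise whitespace so line breaks do not hide duplicates.
            setattr(self, tag, " ".join("".join(self._buffer).split()))
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)


def audit(pages: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for a mapping of URL -> HTML source."""
    warnings: List[str] = []
    titles: Dict[str, List[str]] = defaultdict(list)
    descriptions: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        parser.close()
        if parser.title:
            titles[parser.title].append(url)
        else:
            warnings.append(f"{url}: missing <title>")
        if parser.description:
            descriptions[parser.description].append(url)
        else:
            warnings.append(f"{url}: missing meta description")
        if parser.title and parser.h1 and parser.title != parser.h1:
            warnings.append(f"{url}: <title> {parser.title!r} differs from <h1> {parser.h1!r}")
    for kind, seen in (("title", titles), ("description", descriptions)):
        for text, urls in seen.items():
            if len(urls) > 1:
                warnings.append(f"duplicate {kind} {text!r} on {', '.join(sorted(urls))}")
    return warnings
```

Run over the ‘Archives Hub: For Archivists’ page described above, this would report that its title and main heading (‘Contributing to the Archives Hub’) disagree, leaving the decision about which wording to keep to a person.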

3. Headings

As stated above, we are making the metadata title and page heading correspond, and we are also thinking about the importance of page headings for search engines. In the past we have had monthly features with titles like ‘Wabsters and Shewsters’. Whilst this might work as an intriguing title for a user, it will not help a user searching for Scottish textile history.

4. URLs

It is worth ensuring that at least one of the important keywords is in the URL for a page. So, a page on railway history should have a URL like http://www.archiveshub.ac.uk/railways.shtml and the title ‘Railway history: 200 years of the steam locomotive’.
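A quick way to check existing pages against this advice is to test whether each important keyword actually appears in the URL path. Below is a minimal Python sketch; `keywords_in_url` is a hypothetical helper, and its matching rule (every word of a keyword must appear inside some path segment, so that ‘railway’ is found in ‘railways.shtml’) is a rough heuristic of our own. It is not how any search engine actually scores URLs.

```python
import re
from typing import List
from urllib.parse import urlparse


def keywords_in_url(url: str, keywords: List[str]) -> List[str]:
    """Return the keywords whose every word appears in the URL path.

    Words are matched as substrings of path segments, so 'railway' is found
    in 'railways.shtml'. Very short words will match too easily; this is a
    rough heuristic, not a search engine's scoring rule.
    """
    path = urlparse(url).path.lower()
    # Split the path on anything that is not a letter or digit.
    segments = [s for s in re.split(r"[^a-z0-9]+", path) if s]
    found = []
    for keyword in keywords:
        words = keyword.lower().split()
        if words and all(any(word in segment for segment in segments) for word in words):
            found.append(keyword)
    return found


if __name__ == "__main__":
    # The first URL is the example from the text; the others are hypothetical.
    for url in ("http://www.archiveshub.ac.uk/railways.shtml",
                "http://www.archiveshub.ac.uk/features/railway-history.shtml",
                "http://www.archiveshub.ac.uk/inst/feat0409.shtml"):
        print(url, keywords_in_url(url, ["railway", "railway history"]))
```

A URL built from an opaque code, like the last one, contains none of the keywords. Renaming such pages is the same kind of change as the directory-name recommendation below.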

5. Work on those keywords

We have worked on including keywords throughout the text, and especially in the first few lines. The inclusion of suggested websites and suggested reading provides a legitimate excuse to repeat keywords, both in their titles and in the annotations.
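Checking whether keywords appear near the top of a page can also be automated. The Python sketch below counts how often each keyword phrase occurs in a page’s text and whether its first occurrence falls within the opening words. The function name `keyword_report` and the 50-word cut-off for “the first few lines” are assumptions for illustration, and matching is on exact words only, so ‘railway’ will not match ‘railways’.

```python
import re
from typing import Dict, List, Tuple


def keyword_report(text: str, keywords: List[str],
                   opening_words: int = 50) -> Dict[str, Tuple[int, bool]]:
    """Map each keyword phrase to (number of occurrences, appears early?).

    'Appears early' means the first occurrence starts within the first
    `opening_words` words (an assumed cut-off). Matching is exact and
    case-insensitive on whole words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    report: Dict[str, Tuple[int, bool]] = {}
    for keyword in keywords:
        phrase = keyword.lower().split()
        n = len(phrase)
        # Word positions where the whole phrase starts.
        positions = [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]
        early = bool(positions) and positions[0] < opening_words
        report[keyword] = (len(positions), early)
    return report


if __name__ == "__main__":
    # Hypothetical page text.
    page = ("Railway history in the UK: archives relating to the railways, "
            "from early steam locomotives to nationalisation.")
    print(keyword_report(page, ["railway history", "steam", "textiles"]))
```

A report like this only shows where the keywords are. Deciding whether the repetition still reads naturally, as with the suggested websites and reading lists, remains an editorial judgement.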

Other recommendations

There were other recommendations that we intend to implement over time but did not have the resources to implement immediately, and some of them will fit more rationally into a redesign of our website (which is happening over the next 6 months).

1. Minimise use of tables

2. Change directory names to something more meaningful, e.g. ‘institution’ instead of ‘inst’, or ‘archivist’ instead of ‘arch’

3. Encourage external sites to link to the Hub site. This is an ongoing activity, but it should be easier with our new Website, and with our new approach to monthly features. We will also be able to link to Hub descriptions from sites like the National Register of Archives because we will have persistent URLs for all descriptions.

Web ranking reports

We have been working with Alan K’necht, an SEO expert, and Thierry Arsenault from the Canadian Heritage Information Network (CHIN). Alan has provided us with weekly Web ranking reports based on a set of agreed search terms. We created three pages for three subject areas where the Archives Hub has strong collection representation: fairs and circus history, history of textiles, and British railway history. For all of these subjects we already had a monthly feature, so we could use the existing pages and just work on making them more optimised for search engines.

Conclusions so far

So, has it worked? Take ‘fairground history’ as an example. On April 13th, our page was at 30 in the Google rankings and at 14 in the Google UK rankings. By May 11th it was at 11 in Google and 7 in Google UK. By June 6th it had moved to 6 in the rankings, and a quick search on Google UK now (17th June) puts it at number 3.

Railway history is maybe a more challenging topic, as we are competing with a huge amount of information. ‘Railway history UK’ was not ranked at the start of the project, but by 15th June it was at 15 in the rankings for Google, and at 11 for Google UK. A search on Google of just pages from the UK currently brings the page up to number 6 in the list.

Of course, the challenge with Google is to get the URL in the first page of results, and it is always a moveable feast, so if the page ranks highly one week, it may not do so the next. However, the work that we have done has clearly made an improvement to our rankings, and if we apply the lessons learnt to our other feature pages, we should be able to attract more people to the Archives Hub Website.

The principle of the JISC study was that ‘implementing a few simple and inexpensive Search Engine Optimisation (SEO) techniques can increase an organisation

International Archives Day 9th June 2009

Did you know that today is International Archives Day?

This is only the second International Archives Day ever held. 9th June was chosen because the International Council on Archives (ICA) was founded on 9th June 1948. Last year was the first International Archives Day, coinciding with the 60th anniversary of ICA.

For more information about this and the history of ICA, go to the Unesco Archives website.


Over the last year the Archives Hub has had over 120,000 visits from over 184 countries. The map above gives an indication of international use.

One of our contributors, Glasgow University Archive Services, is celebrating International Archives Day by launching an online resource highlighting the international scope and reputation of Glasgow University and its archive collections.

The exhibition, searchable by region, will demonstrate the involvement of Scottish businesses in the development of the world economy, and the influence that the University of Glasgow and its staff and students have had on the development of education around the world and on the history of many countries.

To go to the resource please see the following link: http://www.gla.ac.uk/services/archives/collections/internationalarchiveday/

If you are interested in international archives you could try the following websites and blogs:

Websites:
ArchiveGrid: A subscription site where you can find historical documents, personal papers, and family histories held in archives around the world.

European Archive: A freely available digital library of archives, with an emphasis on audio-visual materials.

MICHAEL UK: MICHAEL aims to provide simple and quick access to the digital collections of museums, libraries and archives from different European countries.

Unesco Archives Portal: a gateway to international archive collection websites

OCLC WorldCat (Manuscript materials): nearly 1.5 million catalogue records describing archival and manuscript collections and individual manuscripts in public, college and university, and special libraries located throughout North America and around the world.

Blogs:
Archiefforum.be: An online community which aims to support students and young archivists in their studies and profession through peer help and advice. (Flemish language)

ArchivesBlogs: a US blog which is a syndicated collection of blogs by and for archivists.

@rchivista: Spanish language blog written by Paco Fern

Enquire Within Upon Everything

I recently watched an episode of the Imagine… TV programme, presented by Alan Yentob, on the rise of the World Wide Web. One snippet that particularly delighted me was the reference to a Victorian book called ‘Enquire Within Upon Everything’. It is a compendium of advice on every conceivable subject, from alleviating aches and pains to social etiquette. It was the initial inspiration for Tim Berners-Lee when he was developing the software program that was the precursor to the Web – he named it ‘Enquire’ in homage to the book.

I like the idea that the World Wide Web was inspired by an obscure Victorian book. It gives a sense of the continuation and spread of the world’s knowledge, from a little-known book to the worldwide scale of the Internet.

On a completely different note, I was chatting to Brian Kelly of UKOLN about the wonders of RSS and blogging. However, we agreed that setting up RSS feeds may not be for everyone. You now have an alternative – you can sign up to an RSS email service so that you receive regular emails instead of RSS feeds – read more about this on Brian’s blog – I believe that he’s testing a few out…