The Shape of Knowledge

In the 1870s a young man from a small town in New York decided to organise the world’s knowledge. Well, at least the world’s knowledge in book form. The now ubiquitous Dewey Decimal system divides knowledge decimally, as Dewey loved the decimal system. So, there are ten top-level classes with ten first-level sub-divisions (and so on). It’s a curious arrangement. Eight of the nine major divisions for religion are given over to Christianity. Dewey relegates Buddhism right down the ranks of its hierarchy, as a ‘religion of Indian origin’. It gives an entire category over to ‘Paranormal Phenomena’, and 999 is, rather satisfyingly, ‘extraterrestrial worlds’ (under 990, ‘General history of other areas’). When computing came along, there was no room left for it in the 600s – Technology and Applied Sciences – so it went under the 000s, which was originally for ‘generalities’.

“And there’s the weakness and the greatness of Dewey’s system. The…system lets patrons stroll through the collected works of What We Know – our collective memory palace – but the price for ordering knowledge in the physical world is having to make either-or decisions…The library’s geography of knowledge can have one shape but no other.” (Everything is Miscellaneous, David Weinberger)

The world of Dewey classification doesn’t reflect the way we see the world now because the shape of knowledge is fluid and ever-changing, and even then there were many who disputed his arrangement. But it seems that for now we’re stuck with the basics of the Dewey system because the implications of changing it would be massive – libraries the world over have been physically ordered based on Dewey, and long decimal numbers have been painstakingly written on the spines of millions of books.

The Dewey system came to be as a result of the need to store one book in one place – knowledge has to be ordered when it is on shelves. Archives avoid this particular trap because they are not set out on shelves for people to browse, so they do not need a set physical order. The danger of archives being stereotyped as dusty boxes on shelves in dark rooms at least provided the advantage that they did not need to be ordered for browsing; the intellectual arrangement of archives has always been via the finding aids, so the physical collections did not need to undergo the either-or of arrangement in the way that libraries did.

Dewey relies upon giving a book a subject (although there can be cross-referencing to it of course). A book is not always easy to categorise under a subject; but an archive collection may be nigh on impossible to shoe-horn into one subject heading. If it’s hard enough to decide where to put a book about something like globalisation, trade and technology, for example, then it is almost an impossible task with archives because one collection is typically about a whole range of subjects, often ostensibly unrelated. And, of course, often archives are not consciously ‘about’ a subject, in so far as the subject is not central to the reason they were created. For example, a series of correspondence held in a Manchester archive might not be created to consciously describe or explain aspects of social housing developments in Manchester, but it might provide valuable evidence nonetheless; a letter might be written by someone moving into a new housing development, giving a great insight into how people felt about the large post-war housing estates, and what sort of changes it made to their lives. But the collection wouldn’t be ‘put under housing’ because it doesn’t need to be. It would really be impossible to physically put it together with other materials about the same subject because the correspondence might cover all manner of subjects – in a sense random subjects – if the writer is essentially communicating news and stuff that affects their life.

So, what are the implications for archive cataloguing? How does ‘the geography of knowledge’ impact on archives? We haven’t got something like Dewey, and we don’t have the problem of arranging physical things on shelves for people to browse. But do we still have a sense of ‘the right way’ to organise knowledge?

Well, we may not physically arrange archive collections on shelves, but we do approach dealing with each collection by the principles that we deem to be important – provenance and original order. Maybe we’re lucky that we have the principle of original order because it gives us a sensible, rational means to order a collection of sometimes very disparate materials (or you might say the idea is that the collection is already ordered for us). If we dispensed with original order, then we could come up with all sorts of other ways to order things but it is hard to see them making much sense. Weinberger’s book ‘Everything is Miscellaneous’ holds to the principle that in the digital age information wants to be free from all physical constraints, but I contend that original order provides a physical order that gives researchers an option – a way into the content should they choose to take it. I think ‘everything can be miscellaneous’ is more to the point. There are good reasons for imposing a physical order on an archive; but that shouldn’t mean that researchers are constrained as a result.

I think that what we need to be thinking about is enabling researchers to organise knowledge themselves – in a way that is relevant and useful for their own purposes. This potential for organisation is directly related to how we catalogue. Many people will search by subject, but when I look at the descriptions on the Archives Hub, I find many don’t have subject headings added to them. Subject headings offer significant advantages; they allow for the idea of different ways into a collection of information. They are like different pathways for researchers to take in order to get to the collection and connect it up with other collections.

When I search for ‘cooperative movements’ as a phrase on the Hub I get 40 hits. When I search for it as a subject I get 15 hits. If the system were working perfectly, I would deduce from this that there are 15 instances where ‘cooperative movements’ is a significant subject, and 25 more where it is relevant in some way – maybe it is referred to in passing, but the archive is not substantially concerned with this topic. However, it doesn’t really work like this because it is impossible to achieve that level of consistency in cataloguing. Different people catalogue differently. Some cataloguers put in more subjects, and some fewer; some maybe take more time to think about appropriate subjects, others just add a few very quickly; some don’t put any in at all, maybe believing that a free text search is enough. The end result of this is that searching becomes even more of a chance thing than it maybe needs to be. The irony for me, managing an aggregator, is that life would probably be a great deal easier if everyone catalogued in a superficial way…as long as it was consistent. As it is, you enter a subject term and you may still miss an archive of major importance. Enter a keyword (searching all the text) and you may not enter the same word(s) the cataloguer has used. There is, without doubt, an inevitable mismatch between what the cataloguer does and what the researcher needs in many cases.
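The gap between the two kinds of search can be shown with a small sketch. Everything in it is an assumption made for illustration: the `Description` record, the sample catalogue entries and the two search functions are hypothetical, and this is not how the Hub itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """A hypothetical, much simplified archival description."""
    title: str
    text: str
    subjects: list[str] = field(default_factory=list)


def phrase_search(records, phrase):
    """Match the phrase anywhere: title, free text or subject headings."""
    p = phrase.lower()
    return [
        r for r in records
        if p in r.title.lower()
        or p in r.text.lower()
        or any(p in s.lower() for s in r.subjects)
    ]


def subject_search(records, phrase):
    """Match the phrase only against the subject headings a cataloguer added."""
    p = phrase.lower()
    return [r for r in records if any(p in s.lower() for s in r.subjects)]


# Invented sample data: four descriptions catalogued by different people.
RECORDS = [
    Description("Papers of a co-operative society",
                "Minutes and accounts of a retail society.",
                ["Cooperative movements"]),
    Description("Family correspondence",
                "Letters mentioning cooperative movements in passing."),
    Description("Records of a wholesale society",
                "Central to the history of cooperative movements, "
                "but catalogued without subject headings."),
    Description("Trade union pamphlets",
                "Pamphlets on mutual aid and co-ops.",
                ["Trade unions"]),
]

phrase_hits = phrase_search(RECORDS, "cooperative movements")
subject_hits = subject_search(RECORDS, "cooperative movements")
print(len(phrase_hits), len(subject_hits))
```

In this invented catalogue the subject search misses the wholesale society records, which are highly relevant but have no subject headings, while the phrase search misses the pamphlets because the cataloguer wrote ‘co-ops’. Neither route is reliable on its own unless cataloguing is consistent.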

It is a similar situation with the title of the material, which has become a vital way into collections now that so many people use general search engines. The title is what they see in a list of Google results. It needs to do its very best to reflect the content of the archive. “Miscellany of eighteenth century poems by various authors” is pretty good: when you have something that is quite varied, it pulls it together by what it is and when it was created. “Verse miscellany” is not so good, as it gives the researcher less to go on. “Poems” is pretty vague. A researcher on the Hub can look for ‘poems’ and then narrow the search down by other means, but when on Google these titles are not so useful. We try to keep the dates of creation with the title, as the two together provide a good deal more information. But a title can so often give a sense of the miscellaneous in archives; and it can be quite difficult to get round this with some of the more varied collections, which can sometimes be somewhat esoteric. Other titles just offer a personal or organisation name, which is fine when the researcher is in the reading room – they assume the name means that this is an archive about this person/organisation. Out of context a name is just a name and could mean absolutely anything.

Of course, we have to take a pragmatic approach, and there has been plenty written about this. Cataloguing will never ever be perfect: researchers will always have to seek in order to find. But we can probably do more to make things better, and we can try to understand more about the ways that people both look for something they want to find and search for what is out there (not knowing what they want to find).

I believe that it is worth putting a small amount of extra thought into the words that are chosen when cataloguing, thinking about how each end-user will want to organise their own geography of knowledge. A bit of thought about the key significant subjects is a good approach. This will help people coming from different perspectives, and using different search strategies, to discover archive collections.

We are still a long way from connecting things up in a way that researchers would like to see. The vision of Linked Data is to do just this. It offers a way to make connections across data sets. It opens up the idea of organising knowledge so that it’s never just one thing but a completely fluid landscape. It’s not Melvil Dewey, looking at the world and giving us his version of how it should be organised; rather it is offering the chance to organise the world in an infinite number of ways. If others out there have resources on ‘The Fabian Society’ or ‘Beatrice Webb’ or ‘the co-operative movement’ they can state that their concepts are the same as mine, and therefore my archive can be linked to these other resources. This opens up data, enabling people to traverse data sets and bring resources together for their own ends. For creating Linked Data, structured concepts, like subject headings, are a great help, because they facilitate making these connections. Of course, there’s a bit more involved in Linked Data (including creating persistent URIs and actually matching up the same concepts), but the potential to link knowledge together in this large-scale way is immense.
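A rough sketch may help to show what ‘stating that their concepts are the same as mine’ makes possible. It assumes plain Python tuples standing in for RDF triples, uses the real `owl:sameAs` and `dcterms:subject` property names only as labels, and every URI and resource in it is invented. A real implementation would use an RDF library and persistent URIs.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"       # "these two URIs identify the same thing"
SUBJECT = "dcterms:subject"  # "this resource is about this concept"

# Invented triples from three hypothetical data sets.
TRIPLES = [
    ("http://example.org/hub/archive/1", SUBJECT,
     "http://example.org/hub/concept/fabian-society"),
    ("http://example.com/library/book/9", SUBJECT,
     "http://example.com/library/topic/fabians"),
    ("http://example.net/museum/object/4", SUBJECT,
     "http://example.net/museum/term/beatrice-webb"),
    # The library states that its concept is the same as the Hub's.
    ("http://example.com/library/topic/fabians", SAME_AS,
     "http://example.org/hub/concept/fabian-society"),
]


def equivalent_concepts(triples, concept):
    """Collect every URI reachable from `concept` through sameAs links."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


def resources_about(triples, concept):
    """Find resources whose subject is the concept or any equivalent of it."""
    concepts = equivalent_concepts(triples, concept)
    return sorted(s for s, p, o in triples if p == SUBJECT and o in concepts)


linked = resources_about(
    TRIPLES, "http://example.org/hub/concept/fabian-society")
print(linked)
```

Starting from the Hub’s own concept, the query also reaches the library’s book, purely because someone asserted that the two concepts are the same. The museum object stays unconnected until a similar statement is made about its concept.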

Another means to encourage this fluidity is to allow end-users to add tags to content, so that we generate a mass of ways into the data. We really have to seriously consider this option for archival data, because it offers such significant advantages in terms of making things more discoverable. It is moving away from the idea that there is one way of doing things. It allows for things to be organised in an infinite variety of ways. Plenty of projects are now doing this, such as the Zooniverse science projects, the Your Paintings project and the British Library georeferencing project for maps, but I’m not sure that we are really embracing it on a day-to-day level within archive catalogues.

An archive can act like a Lego set. As archivists we present the set as it was originally built, and we aim to keep this because it is evidence of its use. But we want, somehow, to label the whole, and to label parts of the whole, in such a way that researchers can take bits of it and use them to build other constructs; the difference now from 50 years ago is that we are more aware that we should not try to second-guess the constructs that people want to make, but we should catalogue to allow for infinite patterns.