Interoperability, data sharing and standards

I recently spoke at the CILIP MmIT group conference, where I inflicted EAD on a group of unsuspecting librarians. Not just EAD, but MARC and MODS XML and even some Linked Data. They may have said it was a bit like going back to library school, but no-one ran away.

I was talking to them about data sharing and interoperability, and I asked them to look at resources described using different schemas and to think about appropriateness: how well does the data format allow you to describe the resource? How machine-readable is it? How human-readable is it? How human- or machine-readable does it need to be? Is the format robust? Transformable? Sustainable? Interoperable?

These are all things you need to consider when you’re deciding which format to put your data in – except, of course, we often don’t think about these things much at all. These decisions might have been effectively made for you by the community. If all of your peer institutions use a certain data format, then you’re more likely to use it too. And if you want to share your data with the community, using the same format as they do is important.

But this means that you’re relying on other people to make these decisions about the best format for your data. Those people might know the sector and the issues involved in general, but they might not know your specific circumstances or users. Their decision might have been made a long time ago, before advances in theory and technology (MARC was first developed in the 1960s, and EAD in the 1990s). The choice of format might have been based on available tools, rather than underlying principles.

The same goes for cataloguing standards. Is sticking strictly to ISAD(G) really the best way to describe your collections to meet the needs of a global audience? (This topic is up for discussion at the Descriptive Standards Roundtable at the 2013 ARA Conference.)

Of course, standards only work as standards if there’s sufficient community take-up, and a consensus on how to apply them.

XKCD on standards: http://xkcd.com/927/

But progress isn’t made by blindly following rules, and ‘there’s already a standard for that’ is no reason not to think about whether there could be a better standard for it.

Standards should be developed from needs. What do people need to know? What do they need to be able to do with the data? What do we need to be able to tell them? And, if we’re looking to the future, what might they want to be able to do in the future? What do we need to do to the data now, to allow for future wants?

We can only work with what’s available, and it is important to have shared standards and points of reference. But if you don’t take time to consider these points when you’re choosing a standard, you’re not really choosing at all. You’re just perpetuating the status quo.

So take the time to think about what you’re doing with your data. Know why you’re using a particular standard, even if it’s because it’s the best of a bad bunch, or closest to what you want to do. Think about what it can and can’t do. Talk to others who are using it. Look for chances to comment on proposed revisions. The future of standards is the future of your data, and your data is valuable. Don’t let it decay.

2 Comments

  1. Thanks for this post. If we all just follow blindly along without understanding the affordances of our data formats, we aren’t really doing our job–or at least we aren’t doing it well.

    I liked how you focused on how our data formats should reflect what people need to know. Often in archives the people who need to know things are separated into two groups, archivists and researchers, and their goals don’t always align. For example, a researcher interested in finding all collections that include correspondence from a particular individual doesn’t care much about whether the finding aid is marked up in EAD and stored in an XML database. But an archivist might care a lot about how portable the data is across archival management systems.

    I think archivists often tend to focus on data standards absent the actual needs of researchers. Also, a particularly tricky occupational hazard is the somewhat irrational desire to predict future needs, which you highlight:

    And, if we’re looking to the future, what might they want to be able to do in the future? What do we need to do to the data now, to allow for future wants?

    While I think this is well intentioned, it’s very tricky to predict the future. Keeping the future in mind can definitely be useful, but focusing on the needs of the present is much more important. Determining what the needs of the present are isn’t a walk in the park either! I guess I feel this personally because I’ve had to work with a data standard that included a lot of speculative features which, some years on, aren’t used at all and make the standard hard to work with and to build systems around.

    1. Thanks, Ed! I agree that it’s extremely difficult to predict what future needs might be. It might be better to say something like ‘try not to do things with your data today that will prevent other things, or make them harder, in future’, although, again, it’s hard to tell exactly what consequences your data format choices might have. So I guess I’d advocate not for building in speculative features themselves, but for building in the capacity to speculate and experiment.

      This gap between researchers and archivists is something we hope that the Archives Hub can help bridge, by being at that intersection of researchers, archivists and technology. It’s certainly something we’re passionate about!
