Digital Curation: think use, not preservation

For the keynote presentation at the DCC/RIN Research Data Management Forum on ‘The Economics of Applying and Sustaining Digital Curation’, Chris Rusbridge gave us some reflections from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF): http://brtf.sdsc.edu/about.html. This was a two-year project, finishing earlier this year, and the final report is available from: http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf

Chris kicked off by asking us to think about how we currently support access to digital information. Avenues include government grants, advertisements (e.g. through Google), subscriptions (to journals), pay per service (e.g. Amazon Web Services), and donations.

One of the key themes that he raised and returned to was the alignment, or lack of alignment, between those who pay, those who provide and those who benefit from digital data: they are not necessarily the same, and the more different they are, the harder it may be to create a sustainable model. Who owns, who benefits, who selects, who preserves, who pays? This has interesting parallels with archive repositories, where an institution may pay for the acquisition, appraisal, storage, cataloguing and access for these resources, but the beneficiaries are far broader than just members of the institution. Some institutions may require payment for access, but others will provide access free of charge. They may see this as a means to enhance their reputation and status as a learned society.

Around 15 years ago we started to think about digital preservation as a technical problem and then the OAIS reference model was produced. The technical capabilities that we now have are well up to the task, although Chris warned that the most elegant technical solution is no good if it is not sustainable; digital preservation has to be a sustainable economic activity. Today the focus is on the economic and organisational problems. It is not just about money; it requires building upon a value proposition, providing incentives to act and defining roles and responsibilities.

Digital preservation represents a derived demand. No one ‘wants’ preservation per se; what they want is access to a resource. It is not easy to sell a derived demand – often it needs to be sold on some other basis. This idea of selling the importance of providing use (over time) rather than trying to sell the idea of preservation was emphasised throughout the Forum.

Digital preservation is also ‘path dependent’, meaning that the actions and decisions you take change over time; they are different at different points of the life-cycle. Today’s actions can remove other options for all time.

Culture and mindset may be an issue here, and I was interested in the potential problem Chris raised of the ‘free-rider’ culture when it comes to making research datasets available. It may be that some (many?) researchers don’t want to pay for things, undervalue services and maybe underestimate costs. Researchers may also resent conformity and what they see as bureaucracy. All in all, it may be difficult to make a case that researchers should in some way pay. This may be compounded by a sense that money invested in preservation is money taken out of research. Chris suggested that the incentives for preservation are less apparent to the individual researcher, but are more clearly defined when the data is aggregated.

Typically, long-term preservation activities have been funded by short-term resource allocation, although maybe this is gradually changing; a more thorny issue is that of recognising and valuing the benefits of digital preservation, to provide incentives that attract funding. More work needs to be done on articulating the benefits in order to cultivate a sense of the value. However, other speakers at the Forum wondered whether we should actually take the value as a given – maybe we shouldn’t keep asking the question about benefits, but simply acknowledge that it is the right thing to make research and other digital outputs available long-term? We may be creating problems for ourselves if we emphasise the need to demonstrate value too much, and then struggle to quantify the value. This was just one argument, though, and overall I think that there was a belief that we do need to understand and articulate the benefits of providing long-term access.

There is often a lack of clear responsibility around digital preservation – maybe this is one of those areas where it’s always thought to be someone else’s responsibility? So, appropriate organisation and governance are essential for efficient ongoing preservation, especially when considering the tendency for data to be transferred – these ‘handoffs’ need to be secure.

The three imperatives that the BRTF report comes up with are: to articulate a compelling value proposition; to provide clear incentives to preserve in the public interest; to define roles and responsibilities.

Commenting briefly on the post-BRTF developments, Chris mentioned the EU digital agenda and the LIBER pan-European survey on sustainability preparedness.

There are some mandates emerging, from NERC and ESRC, for example. Some publishers do require authors to make available data that substantiates an article, but at present this is not rigorous enough. We need to focus more on the data behind the research and how important it is.

Chris contrasted domain data repositories and institutional data repositories. Domain data repositories: leverage scale and expertise; are valuable for ‘high curation’ data; can carry out a ‘community proxy’ role such as tool development; aggregate demand; are potentially vulnerable to policy change (e.g. AHDS). A mixed funding model is desirable for domain data repositories (e.g. ICPSR). Institutional data repositories: have a reputational business case (risk management, records management aspects, showcasing); should be aligned with institutional goals; can link to institutional research services (e.g. universal backup); can work well for ‘low curation’ cases (relatively small, static datasets); aggregate demand across a set of disciplines.

One issue that came up in the discussion was that we must remember that in fact digital preservation is relatively cheap, especially when compared to the preservation of hard-copy archives, held in acid-free boxes on rows and rows of shelving in secure, controlled search rooms. So, if the cost is actually not prohibitive, and the technical know-how is there, then it seems imperative to address the organisational issues and to really hammer home the true value of preserving our digital data.