Digital Curation: think use, not preservation

For the keynote presentation at the DCC/RIN Research Data Management Forum on ‘The Economics of Applying and Sustaining Digital Curation’, Chris Rusbridge gave us some reflections from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF): http://brtf.sdsc.edu/about.html. This was a two-year project, finishing earlier this year, and the final report is available from: http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf

Chris kicked off by asking us to think about how we currently support access to digital information. Avenues include Government grants, advertisements (e.g. through Google), subscriptions (to journals), pay per service (e.g. Amazon Web Services), and donations.

One of the key themes that he raised and returned to was the alignment, or lack of alignment, between those who pay, those who provide and those who benefit from digital data: they are not necessarily the same, and the more different they are the harder it may be to create a sustainable model. Who owns, who benefits, who selects, who preserves, who pays? This has interesting parallels with archive repositories, where an institution may pay for the acquisition, appraisal, storage, cataloguing and access for these resources, but the beneficiaries are far broader than just members of the institution. Some institutions may require payment for access, but others will provide access free of charge. They may see this as a means to enhance their reputation and status as a learned society.

Around 15 years ago we started to think about digital preservation as a technical problem and then the OAIS reference model was produced. The technical capabilities that we now have are well up to the task, although Chris warned that the most elegant technical solution is no good if it is not sustainable; digital preservation has to be a sustainable economic activity. Today the focus is on the economic and organisational problems. It is not just about money; it requires building upon a value proposition, providing incentives to act and defining roles and responsibilities.

Digital preservation represents a derived demand.  No one ‘wants’ preservation per se; what they want is access to a resource.  It is not easy to sell a derived demand – often it needs to be sold on some other  basis. This idea of selling the importance of providing use (over time) rather than trying to sell the idea of preservation was emphasised throughout the Forum.

Digital preservation is also ‘path dependent’, meaning that the actions and decisions you take change over time; they are different at different points of the life-cycle. Today’s actions can remove other options for all time.

Cultural issues and mindset may be a factor here, and I was interested in the potential problem Chris proposed of the ‘free-rider’ culture when it comes to making research datasets available. It may be that some (many?) researchers don’t want to pay for things, undervalue services and maybe underestimate costs. Researchers may also resent conformity and what they see as bureaucracy. All in all, it may be difficult to make a case that researchers should in some way pay. This may be compounded by a sense that money invested in preservation is money taken out of research. Chris suggested that the incentives for preservation are less apparent to the individual researcher, but are more clearly defined when the data is aggregated.

Typically, long-term preservation activities have been funded by short-term resource allocation, although maybe this is gradually changing; a more thorny issue is that of recognising and valuing the benefits of digital preservation, to provide incentives that attract funding. More work needs to be done on articulating the benefits in order to cultivate a sense of the value. However, other speakers at the Forum wondered whether we should actually take the value as a given – maybe we shouldn’t keep asking the question about benefits, but simply acknowledge that it is the right thing to make research and other digital outputs available long-term? We may be creating problems for ourselves if we emphasise the need to demonstrate value too much, and then struggle to quantify the value. However, this was just one argument, and overall I think that there was a belief that we do need to understand and articulate the benefits of providing long-term access.

There is often a lack of clear responsibility around digital preservation – maybe this is one of those areas where it’s always thought to be someone else’s responsibility? So, appropriate organisation and governance is essential for efficient ongoing preservation, especially when considering the tendency for data to be transferred – these ‘handoffs’ need to be secure.

The three imperatives that the BRTF report comes up with are: to articulate a compelling value proposition; to provide clear incentives to preserve in the public interest; to define roles and responsibilities.

Commenting briefly on the post-BRTF developments, Chris mentioned the EU digital agenda and the LIBER pan-European survey on sustainability preparedness.

There are some mandates emerging, from the NERC and ESRC, for example. Some publishers do require authors to make available data that substantiates an article, but at present this is not rigorous enough. We need to focus more on the data behind the research and how important it is.

Chris contrasted domain data repositories and institutional data repositories. Domain data repositories: leverage scale and expertise; are valuable for ‘high curation’ data; can carry out a ‘community proxy’ role such as tool development; aggregate demand; are potentially vulnerable to policy change (e.g. AHDS). A mixed funding model is desirable for domain data repositories (e.g. ICPSR). Institutional data repositories: have a reputational business case (risk management, records management aspects, showcasing); should be aligned with institutional goals; can link to institutional research services (e.g. universal backup); can work well for ‘low curation’ cases (relatively small, static datasets); aggregate demand across a set of disciplines.

One issue that came up in the discussion was that we must remember that in fact digital preservation is relatively cheap, especially when compared to the preservation of hard-copy archives, held in acid-free boxes on rows and rows of shelving in secure, controlled search rooms.  So, if the cost is actually not prohibitive, and the technical know-how is there, then it seems imperative to address the organisational issues and to really hammer home the true value of preserving our digital data.

Democracy 2.0 in the US

Democracy 2.0: A Case Study in Open Government from across the pond.

I have just listened to a presentation by David Ferriero – 10th Archivist of the US at the National Archives and Records Administration (www.archives.gov). He was talking about democracy, about being open and participatory. He contrasted the very early days of American independence, where there was a high level of secrecy in Government, to the current climate, where those who make decisions are not isolated from the citizens, and citizens’ voices can be heard. He referred to this as ‘Democracy 2.0.’ Barack Obama set out his open government directive right from the off, promoting the principles of more transparency, participation and collaboration. Ferriero talked about seeking to inform, educate and maybe even entertain citizens.

The backbone of open government must be good record keeping. Records document individual rights and entitlements, record actions of government and who is responsible and accountable. They give us the history of the national experience. Only 2-3 percent of records created in conducting the public’s business are considered to be of permanent value and therefore kept in the US archives (still, obviously, a mind-bogglingly huge amount of stuff).

Ferriero emphasised the need to ensure that Federal records of historical value are in good order. But too many records are still at risk of damage or loss. A recent review of record keeping in Federal Agencies showed that 4 out of 5 agencies are at high or moderate risk of improper destruction of records. Cost effective IT solutions are required to address this, and NARA is looking to lead in this area. An electronic records archive (ERA) is being built in partnership with the private sector to hold all the Federal Government’s electronic records, and Ferriero sees this as the priority and the most important challenge for the National Archives. He felt that new kinds of records, that is, records created as a result of social media, create new challenges, and an ERA needs to be able to take care of these types of records.

Change in processes and change in culture is required to meet the new online landscape. The whole commerce of information has changed permanently and we need to be good stewards of the new dynamic. There needs to be better engagement with employees and with the public. NARA are looking to improve their online capabilities to improve the delivery of records. They are developing their catalogue into a social catalogue that allows users to contribute, and are using Web 2.0 tools to allow greater communication between staff. They are also going beyond their own website to reach users where they are, using YouTube, Twitter, blogs, etc. They intend to develop a comprehensive social media strategy (which will be well worth reading if it does emerge).

The US Government are publishing high value datasets on data.gov and Ferriero said that they are eager to see the response to this, in terms of the innovative use of data. They are searching for ways to step up digitisation – looking at what to prioritise and how to accomplish the most with least cost. They want to provide open government leadership to Federal Agencies, for example, mediating in disputes relating to FoI. There are around 2,000 different security classification guides in the government, which makes record processing very complex. There is a big backlog of documents waiting to be declassified, some pertaining to World War Two, the Korean War and the Vietnam War, so they will be of great interest to researchers.

Ferriero also talked about the challenge of making the distinction between business records and personal records. He felt that the personal has to be there, within the archive, to help future researchers recreate the full picture of events.

There is still a problem with Government Agencies all doing their own thing. The Chief Information Officers of all agencies have a Council (the CIO Council). The records managers have the Records Management Council. But it is a case of never the twain shall meet at the moment. Even within Agencies the two often have nothing to do with each other… There are now plans to address this!

This was a presentation that ticked many of the boxes of concern – the importance of addressing electronic records, new media, bringing people together to create efficiencies and engaging the citizens. But then, of course,  it’s easy to do that in words….