Open Up!

Speaking at the recent World Wide Web (WWW2012) conference in Lyon, Neelie Kroes, Vice President of the European Commission, emphasised the benefits of an open Web:  “With a truly open, universal platform, we can deliver choice and competition; innovation and opportunity; freedom and democratic accountability”.

Image: '774:Neuron Connection Pattern'

But what does ‘open’ really mean? Do we really have a clear idea about what it is, how we achieve it, and what the benefits might be? And do we have a sense of what is happening in this landscape and the implications for our own domain?

There is a tendency to think that the social web equates to some degree with the open web, but this is not the case. Social sites like Facebook often create walled gardens, and you could argue that whilst they are connecting people, they are working against the principle of free and linked information. Just because the aim is to enable people to be social and connected, that doesn’t mean that it is done in a way that ensures the data is open.

Another confusion may arise around open and its relationship to privacy. Again, things can be open, but privacy can be respected. It’s about clear and transparent choices – whether you want your data to be open or whether you want to include some restrictions. We are not always aware of what we are giving away:

“the Net enables individuals in many cases to compromise privacy more thoroughly than the government and commercial institutions traditionally targeted for scrutiny and regulation. The standard approaches that have been developed to analyze and limit institutional actors do not not work well for this new breed of problem, which goes far beyond the compromise of sensitive information.” (Jonathan Zittrain, The Future of the Internet).

There is an irony that we often readily give away personal data – so we can be (unknowingly?) very open with our own information – but often we give it away to companies that are not, in their turn, open with the data that is provided. As Tim Berners-Lee has stated, “”One of the issues of social networking silos is that they have the data and I don’t”.  It is possible to paint quite a worrying trend towards control being in the hands of the few:

“A small elite of gifted (largely male) super-wealthy masters of the universe are creating systems through their own brilliance which very few others in government, regulation or the general public can understand.” (

Google has often been seen as a good guy in this context, but recently a number of actions have indicated a move away from open web principles. We have started to be aware of the way Google collects and uses our data, and there have been some dubious practices around ways that data has been gathered (eg. personal data collected from UK Wifi networks). Google appear to be very eager to get us all using Google+, and we are starting to see ‘social’ search results that appear to be pushing people towards using Google+. I think we should be concerned by the announcement from Google that they are now starting to combine your Google search history with other data they collect through use of their products in order, they say, to deliver improved search results. Do we want Google to do this? Are we happy with our search history being used to push products. Is a ‘personalised’ service worth having if it means companies are collecting and holding all of the data we provide when we browse the web for their own commercial purposes?

It does seem that many of these companies will defend ‘open’ if it is in their interests, but find ways to bypass it when it is not. But maybe if we continue to apply pressure in support of an open approach, by implementing it ourselves and favouring tools and services that implement it,  we can tip the balance increasingly towards a genuine open approach that is balanced by safeguards for guaranteeing privacy.

It is worth thinking about some of these issues when we think about our own archival data and whether we are going to take an open approach. It is not just our own individual circumstances, within one repository, that we should think about, but the whole context of what we want the Web to be, because our choices help determine the direction that the Internet goes in.

We do not know how our archival data will be used and we never have. Even when the data was on paper, we couldn’t control use. Fundamentally, we need to see this as a good thing, not a threat, but we need to be aware of the risks. The paradigm has shifted because the data is much much easier to mess with now that it is digital, and, of course, we have digital archives, and we are concerned to protect the integrity of these sources. Providing an API interface seems like a much more explicit invitation to play with the data than simply putting it on the Web. Furthermore, the mindset of the digital generation is entirely different. Cutting and pasting, mashing and inter-linking are all taken for granted and sometimes the idea of the provenance of data sources seems to go out of the window.

This issue of ‘unpredictable use’ is interesting. When the Apple II was launched, Steve Jobs did not know how it would be used – as Zittrain states, it ‘invited people to tinker with it’, and whilst Apple might have had ideas about use, there ideas were not a constraint to reality. Contrast this with the iPhone: it comes out of the box programmed. Quoting Zittrain again: ‘Whereas the world would innovate for the Apple II, only Apple would innovate for the iPhone.’  Zittrain sees this trend towards fixed appliances as pointing to a rather bleak future of ‘sterile appliances tethered to a network of control.’

Maybe the challenge is partly around our understanding of the data we are ‘giving away’, why we are giving it away, and if it’s worth giving away because of what we get in return. Think about the data gathered as we browse the Web – cookies have always had a major role in building up a profile of our Web behaviour – where we go and what we do. Cookies are generally interested in behaviour rather than identity. But still, from 26 May this year, every website operating in the UK will be required to inform its users that they are being tracked with cookies, and to ask users for their consent. This means justifying why cookies are required – it may be for very laudable reasons – logging users’ preferences to help improve their experience, and generally getting a better understanding of how people use the site in order to improve it. It may be in order to target advertising – and you may feel that its hardly a problem if this is adversely affected, but maybe it will be smaller, potentially very useful sites that will suffer if their advertising revenue is affected. Is it a good think that sites will have to justify use of cookies? Is it in our interests? Is this really where the main fight is around protecting your identity?  At the same time, the Government is proposing to bring in powers to monitor the websites people are looking at, something which has very profound implications for our privacy.

On the side of the good guys, there are plenty of people out there espousing the importance of an altruistic approach. In The Digital Public Domain: Foundations for an Open Culture the authors argue that the Public Domain — that is, the informational works owned by all of us, be that literature, music, the output of scientific research, educational material or public sector information — is fundamental to a healthy society. Many other publications echo this belief and you can see plenty of evidence of the movement towards open publication. Technology can increasingly help us realise the advantages of an open approach. For example, text mining is something that can potentially bring substantial economic and cultural benefits. It is often associated with biomedical sciences, but it could have great potential across the humanities. A recent JISC report on the Value and Benefits of Text Mining concluded that the current licensing arrangements are a key barrier to unlocking the potential value of text mining to society and enabling researchers to gain maximum benefit from the data. Text mining  analyses text using computational methods, and provides the end user with what is most relevant to them. But copyright law works against this principle:

“Current copyright law, however, imposes serious restrictions on text mining, because it involves a range of computerised analytical processes that are not all readily permitted within UK intellectual property law. In order to be ‘mined’, text must be accessed, copied, analysed, annotated and related to existing information and understanding. Even if the user has access rights to the material, making annotated copies can currently be illegal without the permission of the copyright holder.” (JISC report: Value and Benefits of Text Mining)

It is important to think beyond types of data and formats, and engage with the debate about the sort of advances for researchers that we can gain by opening up all types of data; whilst the data may differ, many of the arguments about the benefits of open data apply across the board, in terms of encouraging innovation and the advancement of knowledge.

The battle between open and transparent and the walled garden scenario is played out in many ways. One area is the growth of native apps for mobile devices. It is easy to simply see mobile apps as providing new functionality, and opportunities for innovation, with so many people creating apps for every activity you can think of. But what are the implications of a scenario where you have to enter a ‘silo’ environment in order to do anything, from finding your way to a location to converting measurements from one unit to another, to reading the local news reports? Tim Berners-Lee sees this as one of the main threats to the principle of the Web as a platform with the potential to connect data.

Should we be promoting the open Web? Surely as information professionals we should. This does not mean that there is an imperative to make everything open – we are all aware of the Data Protection Act and there are a whole range of very valid reasons to withhold information. But it is increasingly apparent that we need to be explicit in applying permissions – whether it be explicitly fully open licences, attribution licences or closure periods. Many of the archives described on the Archives Hub have ‘open for consultation’ under Access Conditions. But what does this really mean? What can people really do with the content? Is the content actually ‘open for reuse in any way the end-user deems useful to them’? And specifically, what can they do with digital content, available on the World Wide Web?

We need to be aware of the open data agenda and the threats and opportunities presented to us. We have the same kind of tensions within our own world in terms of what we need to protect, both for preservation reasons and for reasons of IPR and commercial advantage, but we want to make archives as open and available as possible and we need to understand what this means in the context of the digital, online world. The question is, whether we want to encourage innovation, what Zittrain calls a ‘generative’ approach. This means enabling programmers to take hold of what you can provide and working with it, as well as encouraging reuse by end users. But it also means taking a risk with security, because making things open inevitably makes them more open to abuse.

People’s experiences with the Internet are shaped by the devices they use to access it. And, similarly, their experiences of archives are shaped by the ways that we choose to enable access. In the same way that a revolution started when computers were made available that could potentially run software that was not available at the time they were built, so we should think about enabling innovation with our data that we may not envisage at present, not trying to determine user behaviour but relishing the idea of the unknown. This raises the question of whether we facilitate this by simply putting the data out as it is, even if it is somewhat unstructured, making it as open as possible, or whether we should try to clean up data, and make it more rigorous, more consistent and more structured, with the purpose of facilitating flexible use of the data, enabling people to use and present it in other ways. But maybe that is another story…

In a way, the early development of the Internet has been an unexpected journey. Innovation has been encouraged because a group of computer scientists did not want to control or to ensure maximum commercial success. It was an altruistic approach; and it may not have turned out that way. Maybe the PC was an unlikely success story in business, where a stable, supported, predictable solution might have been favoured, where companies did not need skills in-house.  Both Microsoft and Apple made their computers’ operating systems open for third-party software development, so even if there was a sense of monopoly, there was still a basic ethos of anyone being able to write for the PC.

Are we still moving down that road where amateur innovation and ‘messing about’ is seen as a positive thing? Or are the problems of security, spam and viruses, along with the power of the big guys, taking us towards something that is far more fixed and controlled?