Assessing Machine Learning Outputs

June 29, 2022 / Jane Stevenson

One of the challenges that we face with our Labs project is presentation of the Machine Learning results. We thought there would be many out of the box tools to help with this, but we have not found this to be the case.

If we use the AWS console Rekognition service interface for example, we get presented with results, but they are not provided in a way that will readily allow us and our project participants to assess them. Here is a screenshot of an image from Cardiff University – an example of out of the box use of AWS Rekognition:

*Excavation at Stonehenge, Cardiff University Photographic Archive*

This is just one result – but we want to present the results from a large collection of images. Ideally we would run the image recognition on all of the Cardiff images, and/or on the images from one collection, assess the results within the project team and also present them back to our colleagues at Cardiff.

The ML results are actually presented in JSON:

Here you can see some of the terms identified and the confidence scores.

These particular images, from the University archive, are catalogued to item level. That means they may not benefit so much from adding tags or identifying objects. But they are unlikely to have all the terms (or ‘labels’ in ML parlance) that the Rekognition service comes up with. Sometimes the things identified are not what a cataloguer would necessarily think to add to a description. The above image is identified as ‘outdoors’, ‘ground’ and ‘soil’. These terms could be useful for a researcher. Just identifying photographs with people in them could potentially be useful.
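As a rough illustration of how label output like this could be filtered before review, the sketch below parses a response shaped like the `Labels` list returned by Rekognition label detection and keeps only labels above a confidence threshold. The sample confidence values, the `labels_above` helper and the 90% threshold are assumptions made for illustration, not settings we have adopted.

```python
import json

# Sample shaped like a Rekognition label-detection response.
# The confidence values are invented for illustration.
SAMPLE_RESPONSE = json.loads("""
{
  "Labels": [
    {"Name": "Outdoors", "Confidence": 98.2},
    {"Name": "Ground", "Confidence": 95.7},
    {"Name": "Soil", "Confidence": 91.4},
    {"Name": "Person", "Confidence": 62.0}
  ]
}
""")


def labels_above(response, min_confidence=90.0):
    """Return (name, confidence) pairs at or above the threshold, highest first."""
    kept = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, confidence in labels_above(SAMPLE_RESPONSE):
        print(f"{name}: {confidence:.1f}%")
```

Where to set the threshold is a judgement call: a lower value surfaces more candidate terms for a cataloguer to consider, at the cost of more noise.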

Another example below is of a printed item – a poem.

*Up in the Wind, Papers of Edward Thomas, Cardiff University*

Strange formatting of the transcript aside, the JSON below shows the detected text (squirrels), confidence and area of the image where the word is located.

If this was provided to the end user, then anyone interested in squirrels in literature (surely there must be someone…) can find this digital content.
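To give a sense of what could be done with this kind of output, here is a minimal sketch that pulls individual words out of a response shaped like Rekognition's `TextDetections` list and converts each bounding box from ratios of the image size into pixels. The sample values, the image dimensions and the `word_boxes` helper are assumptions for illustration only.

```python
# Sample shaped like a Rekognition text-detection response.
# Values are invented; BoundingBox figures are ratios of the image size.
SAMPLE_TEXT_RESPONSE = {
    "TextDetections": [
        {
            "DetectedText": "squirrels",
            "Type": "WORD",
            "Confidence": 99.3,
            "Geometry": {
                "BoundingBox": {"Left": 0.25, "Top": 0.5, "Width": 0.125, "Height": 0.05}
            },
        },
        {
            "DetectedText": "the squirrels",
            "Type": "LINE",
            "Confidence": 98.9,
            "Geometry": {
                "BoundingBox": {"Left": 0.1, "Top": 0.5, "Width": 0.3, "Height": 0.05}
            },
        },
    ]
}


def word_boxes(response, image_width, image_height):
    """Return one dict per detected WORD, with the box converted to pixels."""
    words = []
    for detection in response.get("TextDetections", []):
        if detection.get("Type") != "WORD":
            continue  # skip whole-line detections
        box = detection["Geometry"]["BoundingBox"]
        words.append(
            {
                "text": detection["DetectedText"],
                "confidence": detection["Confidence"],
                "left": round(box["Left"] * image_width),
                "top": round(box["Top"] * image_height),
                "width": round(box["Width"] * image_width),
                "height": round(box["Height"] * image_height),
            }
        )
    return words


if __name__ == "__main__":
    for word in word_boxes(SAMPLE_TEXT_RESPONSE, image_width=800, image_height=1000):
        print(word)
```

Pixel coordinates like these could, for example, be used to highlight the matched word on the image when a user arrives from a search.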

But we have to figure out how to present results and what functionality is required. It reminds me of using Open Refine to assess person name matches. The interface provides for a human eye to assess and confirm or reject the results.
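One low-tech option, in the spirit of the Open Refine approach, would be to flatten the ML output into a spreadsheet with a blank column for a reviewer to accept or reject each term. The sketch below is only an assumption about how that might look: the column names, the `review_rows` and `to_csv` helpers and the image identifier are all hypothetical.

```python
import csv
import io

# Hypothetical column layout for a review spreadsheet.
FIELDNAMES = ["image_id", "label", "confidence", "decision"]


def review_rows(image_id, labels):
    """One row per (label, confidence) pair, with 'decision' left blank for the reviewer."""
    return [
        {
            "image_id": image_id,
            "label": name,
            "confidence": f"{confidence:.1f}",
            "decision": "",
        }
        for name, confidence in labels
    ]


def to_csv(rows):
    """Serialise review rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = review_rows("img-001", [("Outdoors", 98.2), ("Soil", 91.4)])
    print(to_csv(rows))
```

A spreadsheet is no substitute for a proper interface that shows the image alongside the terms, but it may be enough to start structured discussions.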

*Screenshot of names matching using Open Refine*

We want to be able to lead discussions with our contributors on the usefulness, accuracy, bias – lack of bias – and peculiarities of machine learning, and for that a usable interface is essential.

How we might knit this in with the Hub description is something to consider down the line. The first question is whether to use the results of ML at all. However, it is hard to imagine that it won’t play a part as it gets better at recognition and classification. Archivists often talk about how they don’t have time to catalogue. So it is arguable that machine learning, even if the results are not perfect, will be an improvement on the backlogs that we currently have.

AWS Rekognition tools

We have thought about which tools we would like to use and we are currently creating a spreadsheet of the images we have from our participants and which tools to use with each group of images.

Some tools may seem less likely, for example, image moderation. But with the focus on ethics and sensitive data, this could be useful for identifying potentially offensive or controversial images.

The Image Moderation tool recognises nudity in the above image.

confidence scores for nudity — *The confidence scores are high that this image represents nudity*

This could be carried through to the end user interface, and a user could click on ‘view content’ if they chose to do so.
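A minimal sketch of that idea follows, assuming a response shaped like Rekognition's `ModerationLabels` list. The label names in the sample, the 80% threshold and the `display_mode` helper are illustrative assumptions; an end user interface would need its own policy agreed with contributors.

```python
# Sample shaped like a Rekognition moderation response.
# Label names and confidence values are illustrative only.
SAMPLE_MODERATION = {
    "ModerationLabels": [
        {"Name": "Explicit Nudity", "ParentName": "", "Confidence": 97.1},
        {"Name": "Nudity", "ParentName": "Explicit Nudity", "Confidence": 97.1},
    ]
}


def sensitive_reasons(response, min_confidence=80.0):
    """Return the sorted names of moderation labels at or above the threshold."""
    return sorted(
        {
            label["Name"]
            for label in response.get("ModerationLabels", [])
            if label["Confidence"] >= min_confidence
        }
    )


def display_mode(response, min_confidence=80.0):
    """Decide whether to show the image directly or behind a 'view content' click."""
    if sensitive_reasons(response, min_confidence):
        return "click_to_view"
    return "show"


if __name__ == "__main__":
    print(display_mode(SAMPLE_MODERATION), sensitive_reasons(SAMPLE_MODERATION))
```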

image of nude — *Art Design and Architecture Collection, Glasgow School of Art (NMC/1137)*

The image moderation tool may classify art images as sensitive when they are very unlikely to cause offence. The tools may not be able to distinguish offensive nudity from classical art nudity. With training it is likely to improve, but when you think about it, it is not always an easy line for a human to draw.

Face comparison could potentially be useful where you want to identify individuals and instances of them within a large collection of photographs for example, so we might try that out.

However, we have decided that we won’t be using ‘celebrity recognition’, or ‘PPE detection’ for this particular project!

Text and Images

We are particularly interested in text and in text within images. It might be a way to connect images, and we might be able to pull the text out to be used for searching.

Suffice to say that text will be very variable. We ran Transkribus Lite on some materials.

Transkribus on a handwritten letter — *Letter from the Papers of Edward Thomas at Cardiff University*

We compared this to use of AWS Text Rekognition.

These examples illustrate the problem with handwritten documents. Potentially the model could be trained to work better for handwriting, but this may require a very large amount of input data given the variability of writing styles.

Transkribus on a typescript letter — *Poem from the Papers of Edward Thomas, Cardiff University*

Transkribus has transcribed this short typescript text from the same archive well. One word ‘house’ has been transcribed as ‘housd’ and ‘idea’ caused a formatting issue, but overall a good result.
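If we wanted to put a number on 'a good result', one common approach is to compare the transcript against a corrected reference and count the edits needed. The sketch below computes an edit distance and a word error rate. The sentence used in the example is made up, apart from the 'house' and 'housd' pair, and we have not measured our transcripts this way.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def word_error_rate(reference, transcript):
    """Word-level edit distance divided by the number of reference words."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference text is empty")
    return edit_distance(reference_words, transcript.split()) / len(reference_words)


if __name__ == "__main__":
    # Invented sentence; only the house/housd error comes from our example.
    reference = "the old house stood empty"
    transcript = "the old housd stood empty"
    print(edit_distance("house", "housd"))
    print(word_error_rate(reference, transcript))
```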

Transkribus on a poster — *Poster from the Design Archive, University of Brighton Design Archives*

The above example is Transkribus Lite on a poster from the University of Brighton Design Archives. In archives, many digital items are images with text – particularly collections of posters or flyers. Transkribus has not done well with this (though this is just using the Lite version out of the box).

We also tried this with the AWS Rekognition Text tool, and it worked well.

Another example of images with text is maps and plans.

Lambeth Palace map of London — *19th century map of Clerkenwell, Lambeth Palace Archive*

Above are two examples of places identified from the plan output in JSON. If we can take these outputs and add them to our search interface, an end user could search for ‘clerkenwell’ or ‘northampton square’ and find this plan.
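As a sketch of how detected text might feed a search, the code below builds a simple inverted index from detected strings to item identifiers and looks up a query. This is not how the Archives Hub search works: the item identifiers, the `build_index` and `search` helpers and the all-words-must-match rule are assumptions for illustration.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(detected_text_by_item):
    """Map each token to the set of item ids whose detected text contains it."""
    index = defaultdict(set)
    for item_id, texts in detected_text_by_item.items():
        for text in texts:
            for token in tokens(text):
                index[token].add(item_id)
    return index


def search(index, query):
    """Return the ids of items containing every token in the query."""
    query_tokens = tokens(query)
    if not query_tokens:
        return set()
    matches = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*matches)


if __name__ == "__main__":
    # Hypothetical item ids; detected strings taken from the examples in this post.
    detected = {
        "lambeth-plan-1": ["Clerkenwell", "Northampton Square"],
        "thomas-poem-1": ["squirrels"],
    }
    index = build_index(detected)
    print(search(index, "clerkenwell"))
    print(search(index, "northampton square"))
```

Note that this matches words anywhere in an item's detected text, not as an exact phrase, which may or may not be what an end user expects.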

Questions we currently have:

How do we present the results back to the project team?
How do we present the results to the participants?
Do we ask participants specific questions in order to get structured feedback?
Will we get text that is useful enough to go to the next step?
Which images provide good text and which don’t?
How might the results be used on the Archives Hub to help with discovery?

As we progress the work, we will start to think about organising a workshop for participants to get their feedback on the ML outputs.

Thanks to Adrian Stevenson, one of the Hub Labs team, who took me through the technical processes outlined in this post.

Using AI to Write Blog Posts

May 27, 2022 / Adrian Stevenson

Given that Hub Labs is looking at AI and we’re also writing blog posts, it seemed like a good opportunity to try out one of the AI blog post writing applications that keep appearing in my various streams. The one that seems to get the best recommendations and is also free to trial is Writesonic. I’ve been meaning to write a post about our initial steps to build the AWS cloud infrastructure for Labs, so I thought I would use this as the basis for the experiment. You essentially provide a title, in this case I chose ‘Getting Started with Machine Learning in the Cloud’, and it will then provide some intros and outlines that you can accept or reject before it goes on to produce some copy. The best copy I could get it to produce given the options provided is pasted in below. You can see it’s a bit marketing speak and it was no great surprise that it’s not proved to be of much use for my experiment at least. For some reason it wanted to assume we are using Google Cloud and didn’t offer any AWS copy. It was interesting to try it though, and I could see it might be useful for more general posts.

The AI generated copy (italicised by me) including the image is pasted below:

Getting Started with Machine Learning in the Cloud

Image Source: Freepik

The practice of machine learning involves training computer algorithms to recognize patterns in data and make predictions based on those patterns. It takes a lot of resources to implement machine learning in your business. There are many different datasets, tools, and software packages that you need to understand and choose from before you can put them to work analyzing your data and building your algorithm. This blog post discusses how you can use cloud services for machine learning with an example using Google Cloud Machine Learning (Cloud ML) and its APIs for Natural Language Processing (NLP).

What is Machine Learning?

There are many definitions of machine learning, including the IEEE’s “A computer program is said to learn from experience E with respect to some class of tasks T and some performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” Or, as silly as it sounds, you can think about machine learning as a computer program that takes a bunch of data and spits out a prediction. You’re the one that has to tell the program how to do it.

Cloud ML with its APIs for NLP

Google Cloud Machine Learning workflows let you build and manage machine learning models in Google Cloud. Cloud ML supports a variety of machine learning frameworks, including TensorFlow, Caffe 2, and Apache MXNet. Cloud ML also provides a set of Natural Language APIs to support natural language processing (NLP) tasks, including sentiment analysis and entity extraction. Cloud ML’s NLP APIs can be accessed programmatically through its REST APIs.

Create a dataset using BigQuery

When you’re collecting data for machine learning, it’s important that your data be as accurate as possible. Google’s BigQuery lets you create a dataset in the cloud, and then you can use Google’s Natural Language API to analyze that dataset and get a breakdown of the different parts of speech. BigQuery is Google’s fully managed, enterprise data warehouse for business analytics. With BigQuery, you can query large data sets using standard SQL and get results in seconds. It’s also fully integrated with Google’s other analytical and collaboration tools, such as Google Sheets, Google Data Studio, and Google Cloud Platform.

Create a model using TensorFlow

With your dataset collected and a model for the Natural Language API built, you can create the model with TensorFlow. In this example, you’ll be building a model for sentiment analysis, which means you’ll be using the API’s sentiment analysis functionality. If you’re new to TensorFlow, don’t worry. There are some libraries out there that make it super easy to get started. One of them is tf-python-client, which is a library that allows you to write python code that uses TensorFlow.

Run the model in a Cloud Dataflow job

At this point, you’ve built a model and have everything ready to go. All that’s left to do is plug everything into Cloud Dataflow. Cloud Dataflow is Google’s fully managed, distributed framework for executing streaming data pipelines. Cloud Dataflow lets you build scalable, easy-to-manage data processing pipelines that transform and enrich your data. With Dataflow, you can build data processing and machine learning pipelines in a single, unified programming model. This means that you can build a model that performs data analysis, trains the model, and then deploys it in one workflow instead of three different ones.

Run the model in an API server

Your model is ready to go. Now you’re just missing the code to plug into an API server that can handle your data and send it to BigQuery. The Google Cloud Natural Language API lets you understand the meaning of text. You can use the API to detect entities, sentiments, and other linguistic properties of text. The API supports 82 languages.

Conclusion

Now that you know what machine learning is and have gotten started with Cloud ML and its NLP APIs, you can put this knowledge to use. You can use the NLP APIs to perform sentiment analysis or entity extraction on a dataset. This can be helpful for creating articles, social media posts, or marketing materials that are more effective.

With Cloud ML and its NLP APIs, you can create machine learning models that are more accurate and make more informed predictions. It takes a lot more effort to create models without cloud services, and it’s often difficult to keep track of all of the resources needed. With cloud services, you can focus on the model and not worry about the infrastructure and resources needed to run it.

Launch of Towards a National Collection discovery projects

September 21, 2021 / Jane Ronson

£14.5m awarded to transform online exploration of UK’s culture and heritage collections through harnessing innovative AI

The Arts and Humanities Research Council (AHRC) has awarded £14.5m to the research and development of emerging technologies, including machine learning and citizen-led archiving, in order to connect the UK’s cultural artefacts and historical archives in new and transformative ways.

Image by Colin McDowall, courtesy of Towards a National Collection. (Young woman winding bobbins on wheel in the loom shop, 1898 Blanket factory, Witney, Oxfordshire © Historic England Archive CC73_00946 | Indian laundry couple with the man ironing clothes. Attributed to a painter from Tanjore (Thanjavur), ca. 1840. Gouache drawing. 32247i © Wellcome Collection | Sir Hans Sloane (1660–1753) Stephen Slaughter (1697–1765) (attributed to) © The Trustees of the Natural History Museum, London | A starboard bow view of the three-masted barque Glenbervie (1866) with crowds of people, on the rocks at Lowland Point. G14146. © National Maritime Museum, Greenwich, London, Gibson’s of Scilly Shipwreck Collection | Artwork by Peter Morphew illustrating the repositories of the University of Glasgow Archives and Special Collections.)

The Archives Hub is pleased to announce that we will be a project partner in one of five major projects being launched today. The projects form the largest investment of Towards a National Collection, a five-year research programme. Today’s launch reveals the first insights into how thousands of disparate collections could be explored by public audiences and academic researchers in the future.

The five ‘Discovery Projects’ will harness the potential of new technology to dissolve barriers between collections – opening up public access and facilitating research across a range of sources and stories held in different physical locations. One of the central aims is to empower and diversify audiences by involving them in the research and creating new ways for them to access and interact with collections. In addition to innovative online access, the projects will generate artist commissions, community fellowships, computer simulations, and travelling exhibitions. The projects are:

● The Congruence Engine: Digital Tools for New Collections-Based Industrial Histories

● Our Heritage, Our Stories: Linking and searching community-generated digital content to develop the people’s national collection

● Transforming Collections: Reimagining Art, Nation and Heritage

● The Sloane Lab: Looking back to build future shared collections

● Unpath’d Waters: Marine and Maritime Collections in the UK

The investigation is the largest of its kind to be undertaken to date, anywhere in the world. It extends across the UK, involving 15 universities and 63 heritage collections and institutions of different scales, with over 120 individual researchers and collaborators.

Together, the Discovery Projects represent a vital step in the UK’s ambition to maintain leadership in cross-disciplinary research, both between different humanities disciplines and between the humanities and other fields. Towards a National Collection will set a global standard for other countries building their own collections, enhancing collaboration between the UK’s renowned heritage and national collections worldwide.

Archives Hub and the Transforming Collections: Reimagining Art, Nation and Heritage project

Donald Locke 1972-4, Trophies of Empire © Estate of Donald Locke Courtesy of Tate | Claudette Johnson, Figure in Blue, 2018. © Claudette Johnson. Image Credit: Arts Council Collection, Southbank Centre | Iniva_Rivington Place: Photograph by Carlos Jimenez, 2018 | Rachel Jones, lick your teeth, they so clutch, 2021. Arts Council Collection, Southbank Centre, London © the artist. Image courtesy of the artist and Thaddaeus Ropac, London.

The Archives Hub at Jisc will be working with fellow project partners:

***susan pui san lok, 2021: Courtesy the artist***

Tate
Arts Council Collection
Art Fund
Art UK
Birmingham Museums Trust
British Council Collection
Contemporary Art Society
Glasgow Museums
Iniva (Institute of International Visual Art)
Manchester Art Gallery
Middlesbrough Institute of Modern Art
National Museums Liverpool
Van Abbemuseum (NL)
Wellcome Collection

The Principal Investigator for the Transforming Collections: Reimagining Art, Nation and Heritage project is Professor susan pui san lok, University of the Arts London.

More than twenty years after Stuart Hall posed the question, ‘Whose heritage?’, Hall’s call for the critical transformation and reimagining of heritage and nation remains as urgent as ever. This project is driven by the provocation that a national collection cannot be imagined without addressing structural inequalities in the arts, engaging debates around contested heritage, and revealing contentious histories imbued in objects.

***An arrangement of different castes including snake charmer, brick-layer, basket-maker, potter and wives. Gouache drawing. 28438i*** ***© Wellcome Collection.***

Transforming Collections aims to enable cross-search of collections, surface patterns of bias, uncover hidden connections, and open up new interpretative frames and ‘potential histories’ (Azoulay, 2019) of art, nation and heritage. It will combine critical art historical and museological research with participatory machine learning design, and embed creative activations of interactive machine learning in the form of artist commissions.

***Untitled 1986 1987.21, Manchester Art Gallery © Keith Piper.***

Among the aims of this project are to surface suppressed histories, amplify marginalized voices, and re-evaluate artists and artworks ignored or side-lined by dominant narratives; and to begin to imagine a distributed yet connected evolving ‘national collection’ that builds on and enriches existing knowledge, with multiple and multivocal narratives.

The role of the Archives Hub will centre around:

Disseminating project aims, developments and outcomes to our contributors, through our communication channels and our cataloguing workshops, to encourage a wide range of archives to engage with these issues.

***Glasgow Women’s Library, Museum of the Year finalist, 2018. Art Map 2019. © Marc Atkins / Art Fund 2018***

Working with the Creative Computing Institute, at the University of the Arts London, to integrate the Machine Learning (ML) processing into the Archives Hub data processing workflows, so that it can benefit over 350 institutions, including public art institutions.

***Mick Grierson, Exploring the Daphne Oram Collection using 3D visualisation and machine learning (screenshot). 2012. Mick Grierson, Parag Mital, London © the artist.***

Providing expertise from over 20 years of running an archival aggregator and working with a whole range of UK archive repositories, particularly around sustainability and the challenges of working with archival metadata.

Employing Machine Learning and Artificial Intelligence in Cultural Institutions

July 9, 2021 / Adrian Stevenson

As mentioned in my last post, we’re looking at the possibilities Artificial Intelligence and Machine Learning can offer the Archives Hub and the archives community in general. I also now have a wider role in Jisc as a ‘Technical Innovations Manager’, so my brief is to consider the wider technical and strategic possibilities of AI/ML for the Digital Resources directorate and Jisc as a whole. We continue to work behind the scenes, but we also keep a watch on cultural heritage and wider sector activities. As part of this I participated in the Aeolian Project’s ‘Online Workshop 1: Employing Machine Learning and Artificial Intelligence in Cultural Institutions’ yesterday.

‘Visual AI and Printed Chapbook Illustrations at the National Library of Scotland’ – Dr Giles Bergel (University of Oxford / National Library of Scotland)

Giles’ team have been using machine learning (ML) on data from data.nls.uk. He outlined their three-part approach. First they find illustrations in manuscripts using Google’s EfficientDet object detection convolutional neural network, seeded by manually pre-annotated images. They found the object detector worked extremely well after relatively few learning passes. There were a few false positives, such as ink showing through, marginalia and dog ears, that would confuse the model.

False positive ML recognition – ink showthrough

Next they matched and grouped the illustrations using their “state of art” image search engine. Giles believes this shows that AI simplifies the task of finding things that are related in images. The final step was to apply classification algorithms with the VGG Image Classification Engine which uses Google as a source of labelled images. The lessons learned were:

AI requires well-curated data
Tools for annotating data are no less important than classifiers
Generic image models generalize well to printed books
‘Classical’ computer vision still works
AI software development benefits from end-to-end use-cases including data preparation, refinement, consulting with domain experts, public engagement etc.

‘Machine Learning and Cultural Heritage: What Is It Good Enough For?’ – John Stack (UK Science Museum)

John described how AI is being used as part of the Science Museum’s linked data work to collect data into a central knowledge graph. He noted that the Science Museum are doing a great deal of digitisation but currently they only have what John describes as ‘thin’ object data.

They are looking at using AI for name disambiguation as a first step before adding links to Wikidata and using entity recognition to enhance their own catalogue. It struck me that they, and we at the Hub, have been ‘doing AI’ for a while now with such technologies as entity recognition and OCR before the term AI was used. They are aiming to link through to Wikidata such that they can pull in the data and add it to their knowledge graph. This allows them to enhance their local data and apply ML to perform such things as clustering to draw out new insights.

John identified the main benefits of ML currently as suggesting possibilities and identifying trends and gaps. It’s also useful for visualisation and identifying related content as well as enhancing catalogues with new terminology. However there were ‘but’s. ML content needs framing and context. He noted that false positives are not always apparent and usually require specialist knowledge. It’s important to approach things critically and understand what can’t be done. John mentioned that they don’t have any ML driven features in production as yet.

Diagram showing the components of the Heritage Connector software

This was followed by a Q&A where several issues came up. We need to consider how AI may drive new ways/modalities of browsing that we haven’t imagined yet. A major issue is the work needed to feed AI enhancements into user interfaces. Most work so far has been on backend data. AI tools need to integrate into day-to-day workflows for their benefits to be realised. More sector specific case-studies, training materials, tools and models are needed that are appropriate to cultural heritage. See the Heritage Connector blog for more information.

‘AI and the Photoarchive‘ – John McQuaid (Frick Collection), Dr Vardan Papyan (University of Toronto), and X.Y. Han (Cornell University)

The Frick Collection have been using a deep neural network built with PyTorch to identify labels for their photo archive collection. They then compared the ML results, as a validation exercise, with internally crowdsourced data from their staff and curators captured by the Zooniverse software for the same photos.

Frick Collection workflow — Frick Collection ML workflow

They found that 67% of the ML labels matched with the crowdsource validations which they considered a good result. They concluded that at present ML is most useful for ‘curatorial amplification’, but much human effort is still needed. This auto-generation of metadata was their main use case so far.
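For anyone wanting to run a similar validation exercise, a match rate of this kind can be computed by comparing the ML label with the crowdsourced label for each photo. The sketch below is an assumption about how such a comparison might be done, not the Frick team's actual method: the photo ids, the labels and the case-insensitive exact match rule are all invented for illustration.

```python
def agreement_rate(ml_labels, crowd_labels):
    """Fraction of photos (present in both sets) where the two labels match.

    Labels are compared case-insensitively after trimming whitespace.
    """
    shared_ids = ml_labels.keys() & crowd_labels.keys()
    if not shared_ids:
        raise ValueError("no photos in common to compare")
    matches = sum(
        1
        for photo_id in shared_ids
        if ml_labels[photo_id].strip().lower() == crowd_labels[photo_id].strip().lower()
    )
    return matches / len(shared_ids)


if __name__ == "__main__":
    # Invented example data: two of the three labels agree.
    ml = {"photo-1": "Portrait", "photo-2": "Landscape", "photo-3": "Still life"}
    crowd = {"photo-1": "portrait", "photo-2": "Landscape", "photo-3": "Portrait"}
    print(f"{agreement_rate(ml, crowd):.0%}")
```

An exact match is a strict test. Where labels come from a hierarchy, counting a match on a broader term as partial agreement might give a fairer picture.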

‘Keep True: Three Strategies to Guide AI Engagement‘ – Thomas Padilla (Center for Research Libraries)

Thomas believes GLAMs have an opportunity to distinguish themselves in the AI space. He covered a number of themes, the first being the ‘Non-scalability imperative’. Scale is everywhere with AI. There’s a great deal of marketing language about scale, but we need to look at all the non-scalable processes that scale depends on. There’s a problematic dependency where scalability is made possible by non-scalable processes, resources and people. Heterogeneity and diversity can become a problem to be solved by ML. There’s little consideration that AI should be just and fair.

The second theme was ‘Neoliberal traps’ in AI. Who says ethical AI is ethical AI? GLAMs are trying to do the right thing with AI, but this is in the context of neoliberal moral regulation which is unfair and ineffective. He mentioned some of the good examples from the sector including from CILIP, Museums AI Network and his own ‘Responsible Operations‘ paper.

He credited Melissa Terras for asking the question “How are you going to advocate for this with legislation?”. The US doesn’t have any regulations at the moment to get the private sector to get better. I mentioned the UK AI Council who are looking at this in the UK context, and the recent CogX event where the need for AI regulation was discussed in many of the sessions.

The final theme was ‘Maintenance as Innovation’. Information maintenance is a Practice of Care. There is an asserted dichotomy between maintenance and innovation that’s false. Maintenance is sustained innovation and we must value the importance of maintenance to innovation. He appealed to the origin of the word ‘innovation’ which derives from the Latin ‘innovare’ which means “to alter, renew, restore, return to a thing, introduce changes in the way something is done or made”. It’s not about creating from new. At the Hub we wholeheartedly endorse this view. We feel there’s far too much focus on the latest technology meme and we’ve had tensions within our own organisation along these lines. There may appear to be some irony here given the topic of this post, but we have been doing AI for a while as noted above. He referred us to https://themaintainers.org/ for more on this.

Roundtable discussion with the AEOLIAN Project Team

Dr Lise Jaillant, Dr Annalina Caputo, Glen Worthey (University of Illinois), Prof. Claire Warwick (Durham University), Prof. J. Stephen Downie (University of Illinois), Dr Paul Gooding (Glasgow University), and Ryan Dubnicek (University of Illinois).

Stephen Downie talked about the need for standardisation of ML extracted features so we can re-use these across GLAMs in a consistent way. The ‘Datasheets for Datasets’ paper was mentioned, which proposes “a short document to accompany public datasets, commercial APIs, and pretrained models”. This reminded me of Yves Bernaert’s talk about the related need for standardisation of carbon consumption measures. Both are critical issues and possible areas for Jisc to be involved in providing leadership. Another point that Stephen made is that researchers are finding they can’t afford the bill for ML processing. Finding hardware and resources is a big problem. As noted by ML guru Andrew Ng, we have a considerable data issue with AI and ML work. It may be that we need to work more on the data rather than wasting time, electricity and money re-creating expensive ML models. A related piece of work, ‘Lessons from Archives‘ was also mentioned in this regard. There is a case for sharing model developments across the sector for efficiency and sustainability here.

Artificial Intelligence – Getting the Next Ten Years Right

June 23, 2021 / Adrian Stevenson

I attended the ‘CogX Global Leadership Summit and Festival of AI’ last week, my first ‘in-person’ event in quite a while. The CogX Festival “gathers the brightest minds in business, government and technology to celebrate innovation, discuss global topics and share the latest trends shaping the defining decade ahead”. Although the event wasn’t orientated towards archives or cultural heritage specifically, we are doing work behind the scenes on AI and machine learning with the Archives Hub that we’ll say more about in due course. Most of what’s described below is relevant to all sectors as AI is a very generalised technology in its application.

My attention was drawn to the event by my niece Laura Stevenson who works at Faculty and was presenting on ‘How the NHS is using AI to predict demand for services‘. Laura has led on Faculty’s AI driven ‘Early Warning System’ that forecasts covid patient admissions and bed usage for the NHS. The system can use data from one trust to help forecast care for a trust in another area, and can help with best and worst scenario planning with 95% confidence. It also incorporates expert knowledge into the modelling to forecast upticks more accurately than doubling rates can. Laura noted that embedding such a system into operational workflows is a considerable extra challenge to developing the technology.

Screenshot of Explainability Data — Example of AI explainability data from the Early Warning System (image *©Faculty.ai* )

The system includes an explainability feature showing various inputs and the degree to which they affect forecasting. To help users trust the tool, the interface has a model performance tab so users can see information on how accurate the tool has been with previous forecasts. The tool is continuing to help NHS operational managers make planning decisions with confidence and is expected to have lasting impact on NHS decision making.

‘Responsible leadership: The risks and the rewards of advancing the state of the art in AI’ – Lila Ibrahim

Lila works at DeepMind, who are looking to use AI to unlock whole new areas of science. Lila highlighted the role of the AI Council, who are providing guidance to UK Government in regard to UK AI research. She talked about AlphaFold, which has been addressing the 50 year old challenge of protein folding. This is a critical issue as being able to predict protein folding unlocks many possibilities including disease control and using enzymes to break down industrial waste. DeepMind have already created an AI system that can help predict how protein folding occurs and have a peer reviewed article coming out soon. They are trying to get closer to the great challenge of general intelligence.

‘Sustainable Technologies, Green IT & Cloud‘ – Yves Bernaert, Senior Managing Director, Accenture

Yves focussed on company and corporate responsibility, starting his session with some striking statistics:

100 companies produce 70% of global carbon emissions.
40% of water consumption is by companies.
40% of deforestation is by companies.
There is 80 times more industrial waste than consumer waste.
20% of the acidification of the ocean is produced by 20 companies only.

Yves therefore believes that companies have a great responsibility, and that technology can help to reduce climate impact. Data centres currently account for 2% of global electricity use, and this is growing exponentially, soon to be 8%. A single email produces on average 4g of carbon. Yves stressed that all companies have to accept that now is the time to come up with solutions, and that they must urgently get on with solving this problem. IT energy consumption needs to be seen as something to be fixed. If we use IT more efficiently, emissions can be reduced by 20-30%. The solution starts with measurement, which must be built into the IT design process.

We can also design software to be far more efficient. Yves gave the example of AI model accuracy. More accuracy requires more energy. If 96% accuracy is to be improved by just 2%, the cost will be 7 times more energy usage. To train a single neural network requires the equivalent of the full lifecycle energy consumption of five cars. These are massive considerations. Interpreted program code has much higher energy use than compiled code such as C++.
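The accuracy example can be turned into a rough piece of arithmetic. The sketch below is only an illustration under stated assumptions: it reads ‘7 times more energy’ as seven times the baseline total, reads ‘2%’ as two percentage points (96% to 98%), and uses an arbitrary energy unit. None of these readings were confirmed in the talk.

```python
# Assumptions (not from the talk): 7x is the new total, 2% means percentage points,
# and the baseline energy is an arbitrary unit of 1.0.
BASELINE_ACCURACY = 96.0   # percent
IMPROVED_ACCURACY = 98.0   # percent
BASELINE_ENERGY = 1.0      # arbitrary unit
IMPROVED_ENERGY = 7.0 * BASELINE_ENERGY


def marginal_energy_per_point():
    """Extra energy spent per extra percentage point of accuracy."""
    extra_energy = IMPROVED_ENERGY - BASELINE_ENERGY
    extra_points = IMPROVED_ACCURACY - BASELINE_ACCURACY
    return extra_energy / extra_points


def cost_ratio():
    """How many times more each new point costs than the average earlier point."""
    return marginal_energy_per_point() * BASELINE_ACCURACY / BASELINE_ENERGY


if __name__ == "__main__":
    print("energy per extra point:", marginal_energy_per_point())
    print("ratio to average cost of the first 96 points:", cost_ratio())
```

On these assumptions, each of the last two percentage points costs far more energy than the average point that came before, which is the diminishing return Yves was pointing to.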

A positive note is that 80% of the global IT workload is expected to move to the cloud in the next 3 years. This will reduce carbon emissions by 84%. Savings can be made with cloud efficiency measures such as scaling systems down and outwards so as not to unnecessarily provision for occasional workload spikes. Cloud migration can save 60 million tons of carbon per year, which is the equivalent of 20 million full lifecycle car emissions. We have to make this happen!

Asked where the big wins are, Yves said these are also in the IT area. Companies need to embed sustainability into their goals and strategy. We should go straight for the biggest spend. Make measurements and make the changes that will have the most effect. Allow departments and people to know their carbon footprint.

* Update 28th June 2021 * – It was remiss of me not to mention that I’m working on a number of initiatives relating to green sustainable computing at Jisc. We’re looking at assessing the carbon footprint of the Archives Hub using the Cloud Carbon Footprint tool to help us make optimisations. I’m also leading on efforts within my directorate, Digital Resources, to optimise our overall cloud infrastructure using some of the measures mentioned above, in conjunction with the Jisc Cloud Solutions team and our General Infrastructure team. Our Cloud CTO Andy Powell says more on this in his ‘AI, cloud and the environment’ blog post.

‘Future of Research’ – Prof. Dame Ottoline Leyser, CEO, UK Research and Innovation (UKRI)

Ottoline believes that pushing the boundaries of how we support research needs to happen. Research is now more holistic. We draw in what we need to create value. The lone genius is a big problem for research culture and it has to go. Research is insecure and needs connectivity.

Ottoline believes AI will change everything about how research is done. It’s initially replacing mundane tasks but will move on to some more complex tasks such as spotting correlations. Eventually AI will be used as a tool to help understanding in a fundamental way. In terms of the existential risk of AI, we need to embed research as a collective endeavour and share effort to mitigate and distribute this risk. It requires culture change, joining up education and entrepreneurship.

We need to fund research in places that are not the usual places. Ottoline likes a football analogy where people are excited and engaged at all levels of the endeavour, whether in the local park or at the stadium. She suggests research at the moment is more like elitist polo than football.

Ottoline mentioned that UKRI funding does allow for white spaces research. Anyone can apply. However, we need to create wider white spaces to allow research in areas not covered by the usual research categories. It will involve braided and micro careers, not just research careers. Funding is needed to support radical transitions. Ottoline agrees that the slow pace of publication and peer review is a big problem that undermines research. We need to broaden the ways we evaluate research. Peer review is helpful but mustn’t slow things down.

‘Ethics and Bias in AI’ – Rob Glaser, CEO & Founder of RealNetworks

Rob suggests we are in an era with AI where there are no clear rules of the road yet. The task for AI is to make it safe to ‘drive’ with regulations. We can’t stop facial recognition any more than we can stop gravity. We need datasets for governance so we can check accuracy against these for validation. Transparency is also required so we can validate algorithms. A big AI concern is the tribalism on social media.

‘AI and Healthcare’ – Rt. Hon. Matt Hancock

Matt Hancock believes we are at a key moment with healthcare and AI technology, where it’s now of vital importance. Data saves lives! The next thing is how to take things forward in the NHS. A clinical trials interoperability programme is starting that will agree standards to get more out of data use, and the Government will be updating its data strategy soon. He suggests we need to remove silos and commercial incentives (sic). On the use of GP data, he suggests we all agree on the use of data, but the question is how it’s used. The NHS technical architecture needs to improve to allow better use of data and to build data into the way the NHS works. GPs don’t own patient data; the citizen does.

He said a data lake is being built across the NHS. Citizen interaction with health data is now greater than ever before, and NHS data presents a great opportunity for research and an enormous opportunity to advance health care. He suggested we need to radically simplify the NHS information governance rules. On areas where not enough progress has been made, he mentioned that the lack of separation of data layers is currently a problem. So many applications silo their data. There has also been a culture of individual data with personal curation. The UK is going for a TRE-first approach: ‘Trusted Research Environment service for England’. Data is the preserve of the patient, who will allow accredited researchers to use the data through the TRE. The clear preference of citizens is to share data if they trust the sharing mechanism. Every person goes through a consent process for all data sharing. Acceptance requires motivating people with the lifesaving element of research. If there’s trust, the public will be on side. Researchers in this domain will have to abide by new rules to allow us to build on this data. He mentioned that Ben Goldacre will look at the line where open commons ends and NHS data ownership begins in the forthcoming Goldacre Review.