Data Trusts: Ethics, Architecture and Governance for Trustworthy Data Stewardship

Web Science Institute Paper by Kieron O’Hara: “In their report on the development of the UK AI industry, Wendy Hall and Jérôme Pesenti
recommend the establishment of data trusts, “proven and trusted frameworks and agreements” that will “ensure exchanges [of data] are secure and mutually beneficial” by promoting trust in the use of data for AI. Hall and Pesenti leave the structure of data trusts open, and the purpose of this paper is to explore the questions of (a) what existing structures can data trusts exploit, and (b) what relationship do data trusts have to
trusts as they are understood in law?

The paper defends the following thesis: A data trust works within the law to provide ethical, architectural and governance support for trustworthy data processing

Data trusts are therefore both constraining and liberating. They constrain: they respect current law, so they cannot render currently illegal actions legal. They are intended to increase trust, and so they will typically act as
further constraints on data processors, adding the constraints of trustworthiness to those of law. Yet they also liberate: if data processors
are perceived as trustworthy, they will get improved access to data.

Most work on data trusts has up to now focused on gaining and supporting the trust of data subjects in data processing. However, all actors involved in AI – data consumers, data providers and data subjects – have trust issues which data trusts need to address.

Furthermore, it is not only personal data that creates trust issues; the same may be true of any dataset whose release might involve an organisation risking competitive advantage. The paper addresses four areas….(More)”.

Big data needs big governance: best practices from Brain-CODE, the Ontario Brain Institute’s neuroinformatics platform

Shannon C. Lefaivre et al in Frontiers of Genetics: “The Ontario Brain Institute (OBI) has begun to catalyze scientific discovery in the field of neuroscience through its large-scale informatics platform, known as Brain-CODE. The platform supports the capture, storage, federation, sharing and analysis of different data types across several brain disorders. Underlying the platform is a robust and scalable data governance structure which allows for the flexibility to advance scientific understanding, while protecting the privacy of research participants.

Recognizing the value of an open science approach to enabling discovery, the governance structure was designed not only to support collaborative research programs, but also to support open science by making all data open and accessible in the future. OBI’s rigorous approach to data sharing maintains the accessibility of research data for big discoveries without compromising privacy and security. Taking a Privacy by Design approach to both data sharing and development of the platform has allowed OBI to establish some best practices related to large scale data sharing within Canada. The aim of this report is to highlight these best practices and develop a key open resource which may be referenced during the development of similar open science initiatives….(More)”.

Balancing information governance obligations when accessing social care data for collaborative research

Paper by Malkiat Thiarai, Sarunkorn Chotvijit and Stephen Jarvis: “There is significant national interest in tackling issues surrounding the needs of vulnerable children and adults. This paper aims to argue that much value can be gained from the application of new data-analytic approaches to assist with the care provided to vulnerable children. This paper highlights the ethical and information governance issues raised in the development of a research project that sought to access and analyse children’s social care data.

The paper documents the process involved in identifying, accessing and using data held in Birmingham City Council’s social care system for collaborative research with a partner organisation. This includes identifying the data, its structure and format; understanding the Data Protection Act 1998 and 2018 (DPA) exemptions that are relevant to ensure that legal obligations are met; data security and access management; the ethical and governance approval process.

The findings will include approaches to understanding the data, its structure and accessibility tasks involved in addressing ethical and legal obligations and requirements of the ethical and governance processes….(More)”.

The new ecosystem of trust: How data trusts, collaboratives and coops can help govern data for the maximum public benefit

Paper by Geoff Mulgan and Vincent Straub: The world is struggling to govern data. The challenge is to reduce abuses of all kinds, enhance accountability and improve ethical standards, while also ensuring that the maximum public and private value can also be derived from data.

Despite many predictions to the contrary the world of commercial data is dominated by powerful organisations. By contrast, there are few institutions to protect the public interest and those that do exist remain relatively weak. This paper argues that new institutions—an ecosystem of trust—are needed to ensure that uses of data are trusted and trustworthy. It advocates the creation of different kinds of data trust to fill this gap. It argues:

  • That we need, but currently lack, institutions that are good at thinking through, discussing, and explaining the often complex trade-offs that need to be made about data.
  • That the task of creating trust is different in different fields. Overly generic solutions will be likely to fail.
  • That trusts need to be accountable—in some cases to individual members where there is a direct relationship with individuals giving consent, in other cases to the broader public.
  • That we should expect a variety of types of data trust to form—some sharing data; some managing synthetic data; some providing a research capability; some using commercial data and so on. The best analogy is finance which over time has developed a very wide range of types of institution and governance.

This paper builds on a series of Nesta think pieces on data and knowledge commons published over the last decade and current practical projects that explore how data can be mobilised to improve healthcarepolicing, the jobs market and education. It aims to provide a framework for designing a new family of institutions under the umbrella title of data trusts, tailored to different conditions of consent, and different patterns of private and public value. It draws on the work of many others (including the work of GovLab and the Open Data Institute).


The governance of personal data of all kinds has recently moved from being a very marginal specialist issue to one of general concern. Too much data has been misused, lost, shared, sold or combined with little involvement of the people most affected, and little ethical awareness on the part of the organisations in charge.

The most visible responses have been general ones—like the EU’s GDPR. But these now need to be complemented by new institutions that can be generically described as ‘data trusts’.

In current practice the term ‘trust’ is used to describe a very wide range of institutions. These include private trusts, a type of legal structure that holds and makes decisions about assets, such as property or investments, and involves trustors, trustees, and beneficiaries. There are also public trusts in fields like education with a duty to provide a public benefit. Examples include the Nesta Trust and the National Trust. There are trusts in business (e.g. to manage pension funds). And there are trusts in the public sector, such as the BBC Trust and NHS Foundation Trusts with remits to protect the public interest, at arms length from political decisions.

It’s now over a decade since the first data trusts were set up as private initiatives in response to anxieties about abuse. These were important pioneers though none achieved much scale or traction.

Now a great deal of work is underway around the world to consider what other types of trust might be relevant to data, so as to fill the governance vacuum—handling everything from transport data to personalised health, the internet of things to school records, and recognising the very different uses of data—by the state for taxation or criminal justice etc.; by academia for research; by business for use and resale; and to guide individual choices. This paper aims to feed into that debate.

1. The twin problems: trust and value

Two main clusters of problem are coming to prominence. The first cluster of problems involve misuseand overuse of data; the second set of problems involves underuse of data.

1.1. Lack of control fuels distrust

The first problem is a lack of control and agency—individuals feel unable to control data about their own lives (from Facebook links and Google searches to retail behaviour and health) and communities are unable to control their own public data (as in Sidewalk labs and other smart city projects that attempted to privatise public data). Lack of control leads to the risk of abuses of privacy, and a wider problem of decreasing trust—which survey evidence from the Open Data Institute (ODI) shows is key in determining the likelihood consumers will share their personal data (although this varies across countries). The lack of transparency regarding how personal data is then used to train algorithms making decisions only adds to the mistrust.

1.2 Lack of trust leads to a deficit of public value

The second, mirror cluster of problems concern value. Flows of data promise a lot: better ways to assess problems, understand options, and make decisions. But current arrangements make it hard for individuals to realise the greatest value from their own data, and they make it even harder for communities to safely and effectively aggregate, analyse and link data to solve pressing problems, from health and crime to mobility. This is despite the fact that many consumers are prepared to make trade-offs: to share data if it benefits themselves and others—a 2018 Nesta poll found, for example, that 73 per cent of people said they would share their personal data in an effort to improve public services if there was a simple and secure way of doing it. A key reason for the failure to maximise public value is the lack of institutions that are sufficiently trusted to make judgements in the public interest.

Attempts to answer these problems sometimes point in opposite directions—the one towards less free flow, less linking of data, the other towards more linking and combination. But any credible policy responses have to address both simultaneously.

2. The current landscape

The governance field was largely empty earlier this decade. It is now full of activity, albeit at an early stage. Some is legislative—like GDPR and equivalents being considered around the world. Some is about standards—like Verify, IHAN and other standards intended to handle secure identity. Some is more entrepreneurial—like the many Personal Data Stores launched over the last decade, from Mydexto SOLID, Citizen-me to Some are experiments like the newly launched Amsterdam Data Exchange (Amdex) and the UK government’s recently announced efforts to fund data trust pilots to tackle wildlife conservation, working with the ODI. Finally, we are now beginning to see new institutions within government to guide and shape activity, notably the new Centre for Data Ethics and Innovation.

Many organisations have done pioneering work, including the ODI in the UK and NYU GovLab with its work on data collaboratives. At Nesta, as part of the Europe-wide DECODE consortium, we are helping to develop new tools to give people control of their personal data while the Next Generation Internet (NGI) initiative is focused on creating a more inclusive, human-centric and resilient internet—with transparency and privacy as two of the guiding pillars.

The task of governing data better brings together many elements, from law and regulation to ethics and standards. We are just beginning to see more serious discussion about tax and data—from the proposals to tax digital platforms turnover to more targeted taxes of data harvesting in public places or infrastructures—and more serious debate around regulation. This paper deals with just one part of this broader picture: the role of institutions dedicated to curating data in the public interest….(More)”.

Tomorrow’s Data Heroes

Article by Florian GrönePierre Péladeau, and Rawia Abdel Samad: “Telecom companies are struggling to find a profitable identity in today’s digital sphere. What about helping customers control their information?…

By 2025, Alex had had enough. There no longer seemed to be any distinction between her analog and digital lives. Everywhere she went, every purchase she completed, and just about every move she made, from exercising at the gym to idly surfing the Web, triggered a vast flow of data. That in turn meant she was bombarded with personalized advertising messages, targeted more and more eerily to her. As she walked down the street, messages appeared on her phone about the stores she was passing. Ads popped up on her all-purpose tablet–computer–phone pushing drugs for minor health problems she didn’t know she had — until the symptoms appeared the next day. Worse, she had recently learned that she was being reassigned at work. An AI machine had mastered her current job by analyzing her use of the firm’s productivity software.

It was as if the algorithms of global companies knew more about her than she knew herself — and they probably did. How was it that her every action and conversation, even her thoughts, added to the store of data held about her? After all, it was her data: her preferences, dislikes, interests, friendships, consumer choices, activities, and whereabouts — her very identity — that was being collected, analyzed, profited from, and even used to manage her. All these companies seemed to be making money buying and selling this information. Why shouldn’t she gain some control over the data she generated, and maybe earn some cash by selling it to the companies that had long collected it free of charge?

So Alex signed up for the “personal data manager,” a new service that promised to give her control over her privacy and identity. It was offered by her U.S.-based connectivity company (in this article, we’ll call it DigiLife, but it could be one of many former telephone companies providing Internet services in 2025). During the previous few years, DigiLife had transformed itself into a connectivity hub: a platform that made it easier for customers to join, manage, and track interactions with media and software entities across the online world. Thanks to recently passed laws regarding digital identity and data management, including the “right to be forgotten,” the DigiLife data manager was more than window dressing. It laid out easy-to-follow choices that all Web-based service providers were required by law to honor….

Today, in 2019, personal data management applications like the one Alex used exist only in nascent form, and consumers have yet to demonstrate that they trust these services. Nor can they yet profit by selling their data. But the need is great, and so is the opportunity for companies that fulfill it. By 2025, the total value of the data economy as currently structured will rise to more than US$400 billion, and by monetizing the vast amounts of data they produce, consumers can potentially recapture as much as a quarter of that total.

Given the critical role of telecom operating companies within the digital economy — the central position of their data networks, their networking capabilities, their customer relationships, and their experience in government affairs — they are in a good position to seize this business opportunity. They might not do it alone; they are likely to form consortia with software companies or other digital partners. Nonetheless, for legacy connectivity companies, providing this type of service may be the most sustainable business option. It may also be the best option for the rest of us, as we try to maintain control in a digital world flooded with our personal data….(More)”.

Open-Data: A Solution When Data Constitutes an Essential Facility?

Chapter by Claire Borsenberger, Mathilde Hoang and Denis Joram: “Thanks to appropriate data algorithms, firms, especially those on-line, are able to extract detailed knowledge about consumers and markets. This raises the question of the essential facility character of data. Moreover, the features of digital markets lead to a concentration of this core input in the hands of few big “superstars” and arouse legitimate economic and societal concerns. In a more and more data-driven society, one could ask if data openness is a solution to deal with power derived from data concentration. We conclude that only a case-by-case approach should be followed. Mandatory open data policy should be conditioned on an ex-ante cost-benefit analysis proving that the benefits of disclosure exceed its costs….(More)”.

Assessing the Legitimacy of “Open” and “Closed” Data Partnerships for Sustainable Development

Paper by Andreas Rasche, Mette Morsing and Erik Wetter in Business and Society: “This article examines the legitimacy attached to different types of multi-stakeholder data partnerships occurring in the context of sustainable development. We develop a framework to assess the democratic legitimacy of two types of data partnerships: open data partnerships (where data and insights are mainly freely available) and closed data partnerships (where data and insights are mainly shared within a network of organizations). Our framework specifies criteria for assessing the legitimacy of relevant partnerships with regard to their input legitimacy as well as their output legitimacy. We demonstrate which particular characteristics of open and closed partnerships can be expected to influence an analysis of their input and output legitimacy….(More)”.

Fact-Based Policy: How Do State and Local Governments Accomplish It?

Report and Proposal by Justine Hastings: “Fact-based policy is essential to making government more effective and more efficient, and many states could benefit from more extensive use of data and evidence when making policy. Private companies have taken advantage of declining computing costs and vast data resources to solve problems in a fact-based way, but state and local governments have not made as much progress….

Drawing on her experience in Rhode Island, Hastings proposes that states build secure, comprehensive, integrated databases, and that they transform those databases into data lakes that are optimized for developing insights. Policymakers can then use the insights from this work to sharpen policy goals, create policy solutions, and measure progress against those goals. Policymakers, computer scientists, engineers, and economists will work together to build the data lake and analyze the data to generate policy insights….(More)”.

Using Personal Informatics Data in Collaboration among People with Different Expertise

Dissertation by Chia-Fang Chung: “Many people collect and analyze data about themselves to improve their health and wellbeing. With the prevalence of smartphones and wearable sensors, people are able to collect detailed and complex data about their everyday behaviors, such as diet, exercise, and sleep. This everyday behavioral data can support individual health goals, help manage health conditions, and complement traditional medical examinations conducted in clinical visits. However, people often need support to interpret this self-tracked data. For example, many people share their data with health experts, hoping to use this data to support more personalized diagnosis and recommendations as well as to receive emotional support. However, when attempting to use this data in collaborations, people and their health experts often struggle to make sense of the data. My dissertation examines how to support collaborations between individuals and health experts using personal informatics data.

My research builds an empirical understanding of individual and collaboration goals around using personal informatics data, current practices of using this data to support collaboration, and challenges and expectations for integrating the use of this data into clinical workflows. These understandings help designers and researchers advance the design of personal informatics systems as well as the theoretical understandings of patient-provider collaboration.

Based on my formative work, I propose design and theoretical considerations regarding interactions between individuals and health experts mediated by personal informatics data. System designers and personal informatics researchers need to consider collaborations occurred throughout the personal tracking process. Patient-provider collaboration might influence individual decisions to track and to review, and systems supporting this collaboration need to consider individual and collaborative goals as well as support communication around these goals. Designers and researchers should also attend to individual privacy needs when personal informatics data is shared and used across different healthcare contexts. With these design guidelines in mind, I design and develop Foodprint, a photo-based food diary and visualization system. I also conduct field evaluations to understand the use of lightweight data collection and integration to support collaboration around personal informatics data. Findings from these field deployments indicate that photo-based visualizations allow both participants and health experts to easily understand eating patterns relevant to individual health goals. Participants and health experts can then focus on individual health goals and questions, exchange knowledge to support individualized diagnoses and recommendations, and develop actionable and feasible plans to accommodate individual routines….(More)”.

What Makes a City Street Smart?

Taxi and Limousine Commission’s (TLC): “Cities aren’t born smart. They become smart by understanding what is happening on their streets. Measurement is key to management, and amid the incomparable expansion of for-hire transportation service in New York City, measuring street activity is more important than ever. Between 2015 (when app companies first began reporting data) and June 2018, trips by app services increased more than 300%, now totaling over 20 million trips each month. That’s more cars, more drivers, and more mobility.

Taxi and Limousine Commission’s (TLC): “Cities aren’t born smart. They become smart by understanding what is happening on their streets. Measurement is key to management, and amid the incomparable expansion of for-hire transportation service in New York City, measuring street activity is more important than ever. Between 2015 (when app companies first began reporting data) and June 2018, trips by app services increased more than 300%, now totaling over 20 million trips each month. That’s more cars, more drivers, and more mobility.

We know the true scope of this transformation today only because of the New York City Taxi and Limousine Commission’s (TLC) pioneering regulatory actions. Unlike most cities in the country, app services cannot operate in NYC unless they give the City detailed information about every trip. This is mandated by TLC rules and is not contingent on companies voluntarily “sharing” only a self-selected portion of the large amount of data they collect. Major trends in the taxi and for-hire vehicle industry are highlighted in TLC’s 2018 Factbook.

What Transportation Data Does TLC Collect?

Notably, Uber, Lyft, and their competitors today must give the TLC granular data about each and every trip and request for service. TLC does not receive passenger information; we require only the data necessary to understand traffic patterns, working conditions, vehicle efficiency, service availability, and other important information.

One of the most important aspects of the data TLC collects is that they are stripped of identifying information and made available to the public. Through the City’s Open Data portal, TLC’s trip data help businesses distinguish new business opportunities from saturated markets, encourage competition, and help investors follow trends in both new app transportation and the traditional car service and hail taxi markets. As app companies contemplate going public, their investors have surely already bookmarked TLC’s Open Data site.

Using Data to Improve Mobility

With this information NYC now knows people are getting around the boroughs using app services and shared rides with greater frequency. These are the same NYC neighborhoods that traditionally were not served by yellow cabs and often have less robust public transportation options. We also know these services provide an increasing number of trips in congested areas like Manhattan and the inner rings of Brooklyn and Queens, where public transportation options are relatively plentiful….(More)”.