Bottom-up data Trusts: disturbing the ‘one size fits all’ approach to data governance


Sylvie Delacroix and Neil D Lawrence at International Data Privacy Law: “From the friends we make to the foods we like, via our shopping and sleeping habits, most aspects of our quotidian lives can now be turned into machine-readable data points. For those able to turn these data points into models predicting what we will do next, this data can be a source of wealth. For those keen to replace biased, fickle human decisions, this data—sometimes misleadingly—offers the promise of automated, increased accuracy. For those intent on modifying our behaviour, this data can help build a puppeteer’s strings. As we move from one way of framing data governance challenges to another, salient answers change accordingly. Just like the wealth redistribution way of framing those challenges tends to be met with a property-based, ‘it’s our data’ answer, when one frames the problem in terms of manipulation potential, dignity-based, human rights answers rightly prevail (via fairness and transparency-based answers to contestability concerns). Positive data-sharing aspirations tend to be raised within altogether different conversations from those aimed at addressing the above concerns. Our data Trusts proposal challenges these boundaries.

This article proceeds from an analysis of the very particular type of vulnerability concomitant with our ‘leaking’ data on a daily basis, to show that data ownership is both unlikely and inadequate as an answer to the problems at stake. We also argue that the current construction of top-down regulatory constraints on contractual freedom is both necessary and insufficient. To address the particular type of vulnerability at stake, bottom-up empowerment structures are needed. The latter aim to ‘give a voice’ to data subjects whose choices when it comes to data governance are often reduced to binary, ill-informed consent. While the rights granted by instruments like the GDPR can be used as tools in a bid to shape possible data-reliant futures—such as better use of natural resources, medical care, etc, their exercise is both demanding and unlikely to be as impactful when leveraged individually. As a bottom-up governance structure that is uniquely capable of taking into account the vulnerabilities outlined in the first section, we highlight the constructive potential inherent in data Trusts. This potential crosses the traditional boundaries between individualist protection concerns on one hand and collective empowerment aspirations on the other.

The second section explains how the Trust structure allows data subjects to choose to pool the rights they have over their personal data within the legal framework of a data Trust. It is important that there be a variety of data Trusts, arising out of a mix of publicly and privately funded initiatives. Each Trust will encapsulate a particular set of aspirations, reflected in the terms of the Trust. Bound by a fiduciary obligation of undivided loyalty, data trustees will exercise the data rights held under the Trust according to its particular terms. In contrast to a recently commissioned report, we explain why data can indeed be held in a Trust, and why the extent to which certain kinds of data may be said to give rise to property rights is neither here nor there as far as our proposal is concerned. What matters, instead, is the extent to which regulatory instruments such as the GDPR confer rights, and for what kind of data. The breadth of those rights will determine the possible scope of data Trusts in various jurisdictions.

Our ‘Case Studies’ aim to illustrate the complementarity of our data Trusts proposal with the legal provisions pertaining to different kinds of personal data, from medical, genetic, financial, and loyalty card data to social media feeds. The final section critically considers a variety of implementation challenges, which range from Trust Law’s cross-jurisdictional aspects to uptake and exit procedures, including issues related to data of shared provenance. We conclude by highlighting the way in which an ecosystem of data Trusts addresses ethical, legal, and political needs that are complementary to those within the reach of regulatory interventions such as the GDPR….(More)”.

Restricting data’s use: A spectrum of concerns in need of flexible approaches


Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.

Tracking the Labor Market with “Big Data”


Tomaz Cajner, Leland Crane, Ryan Decker, Adrian Hamins-Puertolas, and Christopher Kurz at FEDSNotes: “Payroll employment growth is one of the most reliable business cycle indicators. Each postwar recession in the United States has been characterized by a year-on-year drop in payroll employment as measured by the BLS Current Employment Statistics (CES) survey, and, outside of these recessionary declines, the year-on-year payroll employment growth has always been positive. Thus, it is not surprising that policymakers, financial markets, and the general public pay a great deal of attention to the CES payroll employment gains reported at the beginning of each month.

However, while the CES survey is one of the most carefully conducted measures of labor market activity and uses an extremely large sample, it is still subject to significant sampling error and nonsampling errors. For example, when the BLS first reported that private nonfarm payroll gains were 148,000 in July 2019, the associated 90 percent confidence interval was +/- 100,000 due to sampling error alone….

One such source of alternative labor market data is the payroll-processing company ADP, which covers 20 percent of the private workforce. These are the data that underlie ADP’s monthly National Employment Report (NER), which forecasts BLS payroll employment changes by using a combination of ADP-derived data and other publicly available data. In our research, we explore the information content of the ADP microdata alone by producing an estimate of employment changes independent from the BLS payroll series as well as from other data sources.

A potential concern when using the ADP data is that only the firms which hire ADP to manage their payrolls will appear in the data, and this may introduce sample selection issues….(More)”

The Economics of Social Data: An Introduction


Paper by Dirk Bergemann and Alessandro Bonatti: “Large internet platforms collect data from individual users in almost every interaction on the internet. Whenever an individual browses a news website, searches for a medical term or for a travel recommendation, or simply checks the weather forecast on an app, that individual generates data. A central feature of the data collected from the individuals is its social aspect. Namely, the data captured from an individual user is not only informative about this specific individual, but also about users in some metric similar to the individual. Thus, the individual data is really social data. The social nature of the data generates an informational externality that we investigate in this note….(More)”.

Open Cities | Open Data: Collaborative Cities in the Information Era


Book edited by Scott Hawken, Hoon Han and Chris Pettit: “Today the world’s largest economies and corporations trade in data and its products to generate value in new disruptive markets. Within these markets vast streams of data are often inaccessible or untapped and controlled by powerful monopolies. Counter to this exclusive use of data is a promising world-wide “open-data” movement, promoting freely accessible information to share, reuse and redistribute. The provision and application of open data has enormous potential to transform exclusive, technocratic “smart cities” into inclusive and responsive “open-cities”.


This book argues that those who contribute urban data should benefit from its production. Like the city itself, the information landscape is a public asset produced through collective effort, attention, and resources. People produce data through their engagement with the city, creating digital footprints through social media, mobility applications, and city sensors. By opening up data there is potential to generate greater value by supporting unforeseen collaborations, spontaneous urban innovations and solutions, and improved decision-making insights. Yet achieving more open cities is made challenging by conflicting desires for urban anonymity, sociability, privacy and transparency. This book engages with these issues through a variety of critical perspectives, and presents strategies, tools and case studies that enable this transformation….(More)”.

Mobility Data Sharing: Challenges and Policy Recommendations


Paper by Mollie D’Agostino, Paige Pellaton, and Austin Brown: “Dynamic and responsive transportation systems are a core pillar of equitable and sustainable communities. Achieving such systems requires comprehensive mobility data, or data that reports the movement of individuals and vehicles. Such data enable planners and policymakers to make informed decisions and enable researchers to model the effects of various transportation solutions. However, collecting mobility data also raises concerns about privacy and proprietary interests.

This issue paper provides an overview of the top needs and challenges surrounding mobility data sharing and presents four relevant policy strategies: (1) Foster voluntary agreement among mobility providers for a set of standardized data specifications; (2) Develop clear data-sharing requirements designed for transportation network companies and other mobility providers; (3) Establish publicly held big-data repositories, managed by third parties, to securely hold mobility data and provide structured access by states, cities, and researchers; (4) Leverage innovative land-use and transportation-planning tools….(More)”.

The Promise of Data-Driven Drug Development


Report by the Center for Data Innovation: “From screening chemical compounds to optimizing clinical trials to improving post-market surveillance of drugs, the increased use of data and better analytical tools such as artificial intelligence (AI) hold the potential to transform drug development, leading to new treatments, improved patient outcomes, and lower costs. However, achieving the full promise of data-driven drug development will require the U.S. federal government to address a number of obstacles. This should be a priority for policymakers for two main reasons. First, enabling data-driven drug development will accelerate access to more effective and affordable treatments. Second, the competitiveness of the U.S. biopharmaceutical industry is at risk so long as these obstacles exist. As other nations, particularly China, pursue data-driven innovation, especially greater use of AI, foreign life sciences firms could become more competitive at drug development….(More)”.

New York Report Studies Risks, Rewards of the Smart City


GovTech: “The New York state comptroller tasked his staff with analyzing the deployment of new technologies at the municipal level while cautioning local leaders and the public about cyberthreats.

New York Comptroller Thomas DiNapoli announced the report, Smart Solutions Across the State: Advanced Technology in Local Governments, during a press conference last week in Schenectady, which was featured in the 25-page document for its deployment of an advanced streetlight network.

“New technologies are reshaping how local government services are delivered,” DiNapoli said during the announcement. “Local officials are stepping up to meet the evolving expectations of residents who want their interactions with government to be easy and convenient.”

The report showcases online bill payment for people to resolve parking tickets, utilities and property taxes; bike-share programs using mobile apps to access bicycles in downtown areas; public Wi-Fi through partnerships with telecommunication companies; and more….The modernization of communities across New York could create possibilities for partnerships between municipalities, counties and the state, she said. The report details how a city might attempt to emulate some of the projects included. Martinez said local government leaders should collaborate and share best practices if they decide to innovate their jurisdictions in similar ways….(More)”.

‘Digital colonialism’: why some countries want to take control of their people’s data from Big Tech


Jacqueline Hicks at the Conversation: “There is a global standoff going on about who stores your data. At the close of June’s G20 summit in Japan, a number of developing countries refused to sign an international declaration on data flows – the so-called Osaka Track. Part of the reason why countries such as India, Indonesia and South Africa boycotted the declaration was because they had no opportunity to put their own interests about data into the document.

With 50 other signatories, the declaration still stands as a statement of future intent to negotiate further, but the boycott represents an ongoing struggle by some countries to assert their claim over the data generated by their own citizens.

Back in the dark ages of 2016, data was touted as the new oil. Although the metaphor was quickly debunked it’s still a helpful way to understand the global digital economy. Now, as international negotiations over data flows intensify, the oil comparison helps explain the economics of what’s called “data localisation” – the bid to keep citizens’ data within their own country.

Just as oil-producing nations pushed for oil refineries to add value to crude oil, so governments today want the world’s Big Tech companies to build data centres on their own soil. The cloud that powers much of the world’s tech industry is grounded in vast data centres located mainly around northern Europe and the US coasts. Yet, at the same time, US Big Tech companies are increasingly turning to markets in the global south for expansion as enormous numbers of young tech savvy populations come online….(More)”.

A fairer way forward for AI in health care


Linda Nordling at Nature: “When data scientists in Chicago, Illinois, set out to test whether a machine-learning algorithm could predict how long people would stay in hospital, they thought that they were doing everyone a favour. Keeping people in hospital is expensive, and if managers knew which patients were most likely to be eligible for discharge, they could move them to the top of doctors’ priority lists to avoid unnecessary delays. It would be a win–win situation: the hospital would save money and people could leave as soon as possible.

Starting their work at the end of 2017, the scientists trained their algorithm on patient data from the University of Chicago academic hospital system. Taking data from the previous three years, they crunched the numbers to see what combination of factors best predicted length of stay. At first they only looked at clinical data. But when they expanded their analysis to other patient information, they discovered that one of the best predictors for length of stay was the person’s postal code. This was puzzling. What did the duration of a person’s stay in hospital have to do with where they lived?

As the researchers dug deeper, they became increasingly concerned. The postal codes that correlated to longer hospital stays were in poor and predominantly African American neighbourhoods. People from these areas stayed in hospitals longer than did those from more affluent, predominantly white areas. The reason for this disparity evaded the team. Perhaps people from the poorer areas were admitted with more severe conditions. Or perhaps they were less likely to be prescribed the drugs they needed.

The finding threw up an ethical conundrum. If optimizing hospital resources was the sole aim of their programme, people’s postal codes would clearly be a powerful predictor for length of hospital stay. But using them would, in practice, divert hospital resources away from poor, black people towards wealthy white people, exacerbating existing biases in the system.

“The initial goal was efficiency, which in isolation is a worthy goal,” says Marshall Chin, who studies health-care ethics at University of Chicago Medicine and was one of the scientists who worked on the project. But fairness is also important, he says, and this was not explicitly considered in the algorithm’s design….(More)”.