Balancing Privacy With Data Sharing for the Public Good

David Deming at the New York Times: “Governments and technology companies are increasingly collecting vast amounts of personal data, prompting new laws, myriad investigations and calls for stricter regulation to protect individual privacy.

Yet despite these issues, economics tells us that society needs more data sharing rather than less, because the benefits of publicly available data often outweigh the costs. Public access to sensitive health records sped up the development of lifesaving medical treatments like the messenger-RNA coronavirus vaccines produced by Moderna and Pfizer. Better economic data could vastly improve policy responses to the next crisis.

Data increasingly powers innovation, and it needs to be used for the public good, while individual privacy is protected. This is new and unfamiliar terrain for policymaking, and it requires a careful approach.

The pandemic has brought the increasing dominance of big, data-gobbling tech companies into sharp focus. From online retail to home entertainment, digitally savvy businesses are collecting data and deploying it to anticipate product demand and set prices, lowering costs and outwitting more traditional competitors.

Data provides a record of what has already happened, but its main value comes from improving predictions. Companies like Amazon choose products and prices based on what you — and others like you — bought in the past. Your data improves their decision-making, boosting corporate profits.

Private companies also depend on public data to power their businesses. Redfin and Zillow disrupted the real estate industry thanks to access to public property databases. Investment banks and consulting firms make economic forecasts and sell insights to clients using unemployment and earnings data collected by the Department of Labor. By 2013, one study estimated, public data contributed at least $3 trillion per year to seven sectors of the economy worldwide.

The buzzy refrain of the digital age is that “data is the new oil,” but this metaphor is inaccurate. Data is indeed the fuel of the information economy, but it is more like solar energy than oil — a renewable resource that can benefit everyone at once, without being diminished….(More)”.

My Data, My Choice? – German Patient Organizations’ Attitudes towards Big Data-Driven Approaches in Personalized Medicine. An Empirical-Ethical Study

Paper by Carolin Martina Rauter, Sabine Wöhlke & Silke Schicktanz: “Personalized medicine (PM) operates with biological data to optimize therapy or prevention and to achieve cost reduction. Associated data may consist of large variations of informational subtypes, e.g. genetic characteristics and their epigenetic modifications, biomarkers or even individual lifestyle factors. Present innovations in the field of information technology have already enabled the processing of increasingly large amounts of such data (‘volume’) from various sources (‘variety’) and varying quality in terms of data accuracy (‘veracity’) to facilitate the generation and analysis of messy data sets within a short and highly efficient time period (‘velocity’) to provide insights into previously unknown connections and correlations between different items (‘value’). As such developments are characteristics of Big Data approaches, Big Data itself has become an important catchphrase that is closely linked to the emerging foundations and approaches of PM. However, as ethical concerns have been pointed out by experts in the debate already, moral concerns by stakeholders such as patient organizations (POs) need to be reflected in this context as well. We used an empirical-ethical approach including a website-analysis and 27 telephone-interviews for gaining in-depth insight into German POs’ perspectives on PM and Big Data. Our results show that not all POs are stakeholders in the same way. Comparing the perspectives and political engagement of the minority of POs that is currently actively involved in research around PM and Big Data-driven research led to four stakeholder sub-classifications: ‘mediators’ support research projects through facilitating researchers’ access to the patient community while simultaneously selecting projects they preferably support, while ‘cooperators’ tend to contribute more directly to research projects by providing and implementing patient perspectives. ‘Financers’ provide financial resources. ‘Independents’ keep control over their collected samples and associated patient-related information with a strong interest in making autonomous decisions about its scientific use. A more detailed terminology for the involvement of POs as stakeholders facilitates the addressing of their aims and goals. Based on our results, the ‘independents’ subgroup is a promising candidate for future collaborations in scientific research. Additionally, we identified gaps in POs’ knowledge about PM and Big Data. Based on these findings, approaches can be developed to increase data and statistical literacy. This way, the full potential of stakeholder involvement of POs can be made accessible in discourses around PM and Big Data….(More)”.

Designing Data Trusts. Why We Need to Test Consumer Data Trusts Now

Policy Brief by Aline Blankertz: “Data about individuals, about their preferences and behaviors, has become an increasingly important resource for companies, public agencies, and research institutions. Consumers carry the burden of having to decide which data about them is shared for which purpose. They want to make sure that data about them is not used to infer intimate details of their private life or to pursue other undesirable purposes. At the same time, they want to benefit from personalized products and innovation driven by the same data. The complexity of how data is collected and used overwhelms consumers, many of whom wearily accept privacy policies and lose trust that those who gain effective control over the data will use it for the consumers’ benefit.

At the same time, a few large companies accumulate and lock in vast amounts of data that enable them to use insights across markets and across consumers. In Europe, the General Data Protection Regulation (GDPR) has given data rights to consumers to assert their interests vis-a-vis those companies, but it gives consumers neither enough information nor enough power to make themselves heard. Other organizations, especially small businesses or start-ups, do not have access to the data (unless individual consumers laboriously exercise their right to portability), which often inhibits competition and innovation. Governments across Europe would like to tackle the challenge of reconciling productive data use with privacy. In recent months, data trusts have emerged as a promising solution to enable data-sharing for the benefit of consumers.

The concept has been endorsed by a broad range of stakeholders, including privacy advocates, companies and expert commissions. In Germany, for example, the data ethics commission and the commission competition law 4.0 have recommended further exploring data trusts, and the government is incorporating the concept into its data strategy.

There is no common understanding yet what consumer data trusts are and what they do. In order for them to address the problems mentioned, it is helpful to use as a working definition: consumer data trusts are intermediaries that aggregate consumers’ interests and represent them vis-à-vis data-using organizations. Data trusts use more technical and legal expertise, as well as greater bargaining power, to negotiate with organizations on the conditions of data use to achieve better outcomes than those that individual consumers can achieve. To achieve their consumer-oriented mission, data trusts should be able to assign access rights, audit data practices, and support enforcement. They may or may not need to hold data…(More)”.

Inside the ‘Wikipedia of Maps,’ Tensions Grow Over Corporate Influence

Corey Dickinson at Bloomberg: “What do Lyft, Facebook, the International Red Cross, the U.N., the government of Nepal and Pokémon Go have in common? They all use the same source of geospatial data: OpenStreetMap, a free, open-source online mapping service akin to Google Maps or Apple Maps. But unlike those corporate-owned mapping platforms, OSM is built on a network of mostly volunteer contributors. Researchers have described it as the “Wikipedia for maps.”

Since it launched in 2004, OpenStreetMap has become an essential part of the world’s technology infrastructure. Hundreds of millions of monthly users interact with services derived from its data, from ridehailing apps, to social media geotagging on Snapchat and Instagram, to humanitarian relief operations in the wake of natural disasters. 

But recently the map has been changing, due to the growing impact of private sector companies that rely on it. In a 2019 paper published in the ISPRS International Journal of Geo-Information, a cross-institutional team of researchers traced how Facebook, Apple, Microsoft and other companies have gained prominence as editors of the map. Their priorities, the researchers say, are driving significant change to what is being mapped compared to the past.

“OpenStreetMap’s data is crowdsourced, which has always made spectators to the project a bit wary about the quality of the data,” says Dipto Sarkar, a professor of geoscience at Carleton University in Ottawa, and one of the paper’s co-authors. “As the data becomes more valuable and is used for an ever-increasing list of projects, the integrity of the information has to be almost perfect. These companies need to make sure there’s a good map of the places they want to expand in, and nobody else is offering that, so they’ve decided to fill it in themselves.”…(More)”.

Collective bargaining on digital platforms and data stewardship

Paper by Astha Kapoor: “… there is a need to think of exploitation on platforms not only through the lens of labour rights but also that of data rights. In the current context, it is impossible to imagine well-being without more agency on the way data are collected, stored and used. It is imperative to envision structures through which worker communities and representatives can be more involved in determining their own data lives on platforms. There is a need to organize and mobilize workers on data rights.

One of the ways in which this can be done is through a mechanism of community data stewards who represent the needs and interests of workers to their platforms, thus negotiating and navigating the data-based decisions. This paper examines the need for data rights as a critical requirement for worker well-being in the platform economy and the ways in which it can be actualized. It argues, given that workers on platforms produce data through collective labour on and off the platform, that worker data are a community resource and should be governed by representatives of workers who can negotiate with platforms on the use of that data for workers and for the public interest. The paper analyses the opportunity for a community data steward mechanism that represents workers’ interests and intermediates on data issues, such as transparency and accountability, with offline support systems, and that also gives voice to online action to address some of the injustices of the data economy. Thus, a data steward is a tool through which workers can better control their data (consent, privacy and rights) and organize online. Essentially, it is a way forward for workers to mobilize collective bargaining on data rights.

The paper covers the impact of the COVID-19 pandemic on workers’ rights and well-being. It explores the idea of community data rights on the platform economy and why collective bargaining on data is imperative for any kind of meaningful negotiation with technology companies. The role of a community data steward in reclaiming workers’ power in the platform economy is explained, concluding with policy recommendations for a community data steward structure in the Indian context….(More)”.

Public-Private Partnerships: Compound and Data Sharing in Drug Discovery and Development

Paper by Andrew M. Davis et al: “Collaborative efforts between public and private entities such as academic institutions, governments, and pharmaceutical companies form an integral part of scientific research, and notable instances of such initiatives have been created within the life science community. Several examples of alliances exist with the broad goal of collaborating toward scientific advancement and improved public welfare. Such collaborations can be essential in catalyzing breaking areas of science within high-risk or global public health strategies that may have otherwise not progressed. A common term used to describe these alliances is public-private partnership (PPP). This review discusses different aspects of such partnerships in drug discovery/development and provides example applications as well as successful case studies. Specific areas that are covered include PPPs for sharing compounds at various phases of the drug discovery process—from compound collections for hit identification to sharing clinical candidates. Instances of PPPs to support better data integration and build better machine learning models are also discussed. The review also provides examples of PPPs that address the gap in knowledge or resources among involved parties and advance drug discovery, especially in disease areas with unfulfilled and/or social needs, like neurological disorders, cancer, and neglected and rare diseases….(More)”.

Governance for Innovation and Privacy: The Promise of Data Trusts and Regulatory Sandboxes

Essay by Chantal Bernier: “Innovation feeds on data, both personal, identified data and de-identified data. To protect the data from increasing privacy risks, governance structures emerge to allow the use and sharing of data as necessary for innovation while addressing privacy risks. Two frameworks proposed to fulfill this purpose are data trusts and regulatory sandboxes.

The Government of Canada introduced the concept of “data trust” into the Canadian privacy law modernization discussion through Canada’s Digital Charter in Action: A Plan by Canadians, for Canadians, to “enable responsible innovation.” At a high level, a data trust may be defined, according to the Open Data Institute, as a legal structure that is appropriate to the data sharing it is meant to govern and that provides independent stewardship of data.

Bill C-11, known as the Digital Charter Implementation Act, 2020, and tabled on November 17, 2020, lays the groundwork for the possibility of creating data trusts for private organizations to disclose de-identified data to specific public institutions for “socially beneficial purposes.” In her recent article “Replacing Canada’s 20-Year-Old Data Protection Law,” Teresa Scassa provides a superb overview and analysis of the bill.

Another instrument for privacy protective innovation is referred to as the “regulatory sandbox.” The United Kingdom’s Information Commissioner’s Office (ICO) provides a regulatory sandbox service that encourages organizations to submit innovative initiatives without fear of enforcement action. From there, the ICO sandbox team provides advice related to privacy risks and how to embed privacy protection.

Both governance measures may hold the future of privacy and innovation, provided that we accept this equation: De-identified data may no longer be considered irrevocably anonymous and therefore should not be released unconditionally, but the risk of re-identification is so remote that the data may be released under a governance structure that mitigates the residual privacy risk.  

Innovation Needs Identified Personal Data and De-identified Data   

The role of data in innovation does not need to be explained. Innovation requires a full understanding of what is, to project toward what could be. The need for personal data, however, calls for far more than an explanation. Its use must be justified. Applications abound, and they may not be obvious to the layperson. Researchers and statisticians, however, underline the critical role of personal data with one word: reliability.

Processing data that can be traced, either through identifiers or through pseudonyms, allows superior machine learning, longitudinal studies and essential correlations, which provide, in turn, better data in which to ground innovation. Statistics Canada has developed a “Continuum of Microdata Access” to its databases on the premise that “researchers require access to microdata at the individual business, household or person level for research purposes. To preserve the privacy and confidentiality of respondents, and to encourage the use of microdata, Statistics Canada offers a wide range of options through a series of online channels, facilities and programs.”

Since the first national census in 1871, Canada has put data — derived from personal data collected through the census and surveys — to good use in the public and private sectors alike. Now, new privacy risks emerge, as the unprecedented volume of data collection and the power of analytics bring into question the notion that the de-identification of data — and therefore its anonymization — is irreversible.

And yet, data to inform innovation for the good of humanity cannot exclude data about humans. So, we must look to governance measures to release de-identified data for innovation in a privacy-protective manner. …(More)”.
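The residual re-identification risk the essay describes can be made concrete with a small check. The sketch below is an illustration only: it uses k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values), a measure the essay does not name, and the field names and records are invented for the example.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for every quasi-identifier. A k of 1 means at least
    one record is unique on those fields and is easier to re-identify."""
    if not records:
        return 0
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(counts.values())

# Hypothetical de-identified records (no names, but indirect identifiers remain).
records = [
    {"age_band": "30-39", "postcode_prefix": "K1A", "diagnosis": "A"},
    {"age_band": "30-39", "postcode_prefix": "K1A", "diagnosis": "B"},
    {"age_band": "40-49", "postcode_prefix": "K1A", "diagnosis": "A"},
    {"age_band": "40-49", "postcode_prefix": "K1A", "diagnosis": "C"},
    {"age_band": "40-49", "postcode_prefix": "M5V", "diagnosis": "B"},
]

k_fine = smallest_group(records, ["age_band", "postcode_prefix"])
k_coarse = smallest_group(records, ["age_band"])
print(k_fine, k_coarse)
```

Releasing fewer or coarser fields raises k, which is one way a governance structure might trade detail against re-identification risk before data is shared.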

Fostering trustworthy data sharing: Establishing data foundations in practice

Paper by Sophie Stalla-Bourdillon, Laura Carmichael and Alexsis Wintour: “Independent data stewardship remains a core component of good data governance practice. Yet, there is a need for more robust independent data stewardship models that are able to oversee data-driven, multi-party data sharing, usage and re-usage, which can better incorporate citizen representation, especially in relation to personal data. We propose that data foundations—inspired by Channel Islands’ foundations laws—provide a workable model for good data governance not only in the Channel Islands, but also elsewhere. A key advantage of this model—in addition to leveraging existing legislation and building on established precedent—is the statutory role of the guardian that is a unique requirement in the Channel Islands, and when interpreted in a data governance model provides the independent data steward. The principal purpose for this paper, therefore, is to demonstrate why data foundations are well suited to the needs of data sharing initiatives. We further examine how data foundations could be established in practice—and provide key design principles that should be used to guide the design and development of any data foundation….(More)”.

Tracking COVID-19 using online search

Paper by Vasileios Lampos et al: “Previous research has demonstrated that various properties of infectious diseases can be inferred from online search behaviour. In this work we use time series of online search query frequencies to gain insights about the prevalence of COVID-19 in multiple countries. We first develop unsupervised modelling techniques based on associated symptom categories identified by the United Kingdom’s National Health Service and Public Health England. We then attempt to minimise an expected bias in these signals caused by public interest—as opposed to infections—using the proportion of news media coverage devoted to COVID-19 as a proxy indicator. Our analysis indicates that models based on online searches precede the reported confirmed cases and deaths by 16.7 (10.2–23.2) and 22.1 (17.4–26.9) days, respectively. We also investigate transfer learning techniques for mapping supervised models from countries where the spread of the disease has progressed extensively to countries that are in earlier phases of their respective epidemic curves. Furthermore, we compare time series of online search activity against confirmed COVID-19 cases or deaths jointly across multiple countries, uncovering interesting querying patterns, including the finding that rarer symptoms are better predictors than common ones. Finally, we show that web searches improve the short-term forecasting accuracy of autoregressive models for COVID-19 deaths. Our work provides evidence that online search data can be used to develop complementary public health surveillance methods to help inform the COVID-19 response in conjunction with more established approaches….(More)”.
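The idea of a search signal preceding reported cases by some number of days can be illustrated with a lagged correlation. The sketch below is not the authors' method: it assumes a plain Pearson correlation over shifted series, uses synthetic numbers, and the function names `pearson` and `best_lead` are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_lead(search, cases, max_lag):
    """Return (lag, r): the number of days by which `search` leads `cases`
    that gives the highest correlation, trying lags 0..max_lag."""
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Pair search[t] with cases[t + lag].
        y = cases[lag:]
        n = min(len(search), len(y))
        if n < 3:
            break  # too few overlapping points to correlate
        r = pearson(search[:n], y[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic example: case counts are the search signal delayed by 5 days.
search = [0, 1, 3, 7, 12, 18, 22, 20, 15, 9, 5, 2, 1, 0, 0, 0, 0, 0, 0, 0]
cases = [0] * 5 + search[:-5]

lag, r = best_lead(search, cases, max_lag=10)
print(lag, round(r, 3))
```

On real data the peak would be less sharp, and the paper additionally corrects for media-driven search interest, which this sketch ignores.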

Robot census: Gathering data to improve policymaking on new technologies

Essay by Robert Seamans: “There is understandable excitement about the impact that new technologies like artificial intelligence (AI) and robotics will have on our economy. In our everyday lives, we already see the benefits of these technologies: when we use our smartphones to navigate from one location to another using the fastest available route or when a predictive typing algorithm helps us finish a sentence in our email. At the same time, there are concerns about possible negative effects of these new technologies on labor. The Council of Economic Advisers of the past two Administrations have addressed these issues in the annual Economic Report of the President (ERP). For example, the 2016 ERP included a chapter on technology and innovation that linked robotics to productivity and growth, and the 2019 ERP included a chapter on artificial intelligence that discussed the uneven effects of technological change. Both these chapters used data at highly aggregated levels, in part because that is the data that is available. As I’ve noted elsewhere, AI and robots are everywhere, except, as it turns out, in the data.

To date, there have been no large-scale, systematic studies in the U.S. on how robots and AI affect productivity and labor in individual firms or establishments (a firm could own one or more establishments, which for example could be a plant in a manufacturing setting or a storefront in a retail setting). This is because the data are scarce. Academic researchers interested in the effects of AI and robotics on economic outcomes have mostly used aggregate country and industry-level data. Very recently, some have studied these issues at the firm level using data on robot imports to France, Spain, and other countries. I review a few of these academic papers in both categories below, which provide early findings on the nuanced role these new technologies have on labor. Thanks to some excellent work being done by the U.S. Census Bureau, however, we may soon have more data to work with. This includes new questions on robot purchases in the Annual Survey of Manufactures and Annual Capital Expenditures Survey and new questions on other technologies including cloud computing and machine learning in the Annual Business Survey….(More)”.