Contracts for Data Collaboration


The GovLab: “The road to achieving the Sustainable Development Goals is complex and challenging. Policymakers around the world need both new solutions and new ways to become more innovative. This includes evidence-based policy and program design, as well as improved monitoring of progress made.

Unlocking privately processed data through data collaboratives — a new form of public-private partnership in which private industry, government and civil society work together to release previously siloed data — has become essential to address the challenges of our era.

Yet while research has proven its promise and value, several barriers to scaling data collaboration exist.

Ensuring trust and shared responsibility in how the data will be handled and used proves particularly challenging, because of the high transaction costs involved in drafting contracts and agreements of sharing.

Ensuring Trust in Data Collaboration

The goal of the Contracts for Data Collaboration (C4DC) initiative is to address the inefficiencies of developing contractual agreements for public-private data collaboration.

The intent is to inform and guide those seeking to establish a data collaborative by developing and making available a shared repository of contractual clauses (taken from existing data sharing agreements) that covers a host of issues, including (non-exclusive):

  • The provenance, quality and purpose of data;
  • Security and privacy concerns;
  • Roles and responsibilities of participants;
  • Access provisions and use limitations;
  • Governance mechanisms;
  • Other contextual mechanisms

In addition to the searchable library of contractual clauses, the repository will house use cases, guides and other information that analyse common patterns, language and best practices.

Help Us Scale Data Collaboration

Contracts for Data Collaboration builds on efforts from member organizations that have experience in developing and managing data collaboratives and have documented the legal challenges and opportunities of data collaboration.

The initiative is an open collaborative with charter members from the GovLab at NYU, UN SDSN Thematic Research Network on Data and Statistics (TReNDS), University of Washington and the World Economic Forum.

Organizations interested in joining the initiative should contact the individuals noted below or share any agreements they have used for data sharing activities (without any sensitive or identifiable information): Stefaan Verhulst, GovLab ([email protected]) …(More)

Looking after and using data for public benefit


Heather Savory at the Office for National Statistics (UK): “Official Statistics are for the benefit of society and the economy and help Britain to make better decisions. They allow the formulation of better public policy and the effective measurement of those policies. They inform the direction of economic and commercial activities. They provide valuable information for analysts, researchers, public and voluntary bodies. They enable the public to hold organisations that spend public money to account, thus informing democratic debate.

The ability to harness the power of data is critical in enabling official statistics to support the most important decisions facing the country.

Under the new powers in the Digital Economy Act, ONS can now gain access to new and different sources of data including ‘administrative’ data from government departments and commercial data. Alongside the availability of these new data sources ONS is experiencing a strong demand for ad hoc insights alongside our traditional statistics.

We need to deliver more, faster, finer-grained insights into the economy and society. We need to deliver high quality, trustworthy information, on a faster timescale, to help decision-making. We will increasingly develop innovative data analysis methods, for example using images to gain insight from the work we’ve recently announced on Urban Forests….

I should explain here that our data is not held in one big linked database; we’re architecting our Data Access Platform so that data can be linked in different ways for different purposes. This is designed to preserve data confidentiality, so only the necessary subset of data is accessible by authorised people, for a certain purpose. To avoid compromising their effectiveness, we do not make public the specific details of the security measures we have in place, but our recently tightened security regime, which is independently assured by trusted external bodies, includes:

  • physical measures to restrict who can access places where data is stored;
  • protective measures for all data-related IT services;
  • measures to restrict who can access systems and data held by ONS;
  • controls to guard against staff or contractors misusing their legitimate access to data; including vetting to an appropriate level for the sensitivity of data to which they might have access.

One of the things I love about working in the public sector is that our work can be shared openly.

We live in a rapidly changing and developing digital world and we will continue to monitor and assess the data standards and security measures in place to ensure they remain strong and effective. So, as well as sharing this work openly to reassure all our data suppliers that we’re taking good care of their data, we’re also seeking feedback on our revised data policies.

The same data can provide different insights when viewed through different lenses or in different combinations. The more data is shared – with the appropriate safeguards of course – the more it has to give.

If you work with data, you’ll know that collaborating with others in this space is key and that we need to be able to share data more easily when it makes sense to do so. So, the second reason for sharing this work openly is that, if you’re in the technical space, we’d value your feedback on our approach and if you’re in the data space and would like to adopt the same approach, we’d love to support you with that – so that we can all share data more easily in the future….(More)

ONS’s revised policies on the use, management and security of data can be found here.

All of Us Research Program Expands Data Collection Efforts with Fitbit


NIH Press Release: “The All of Us Research Program has launched the Fitbit Bring-Your-Own-Device (BYOD) project. Now, in addition to providing health information through surveys, electronic health records, and biosamples, participants can choose to share data from their Fitbit accounts to help researchers make discoveries. The project is a key step for the program in integrating digital health technologies for data collection.

Digital health technologies, like mobile apps and wearable devices, can gather data outside of a hospital or clinic. This data includes information about physical activity, sleep, weight, heart rate, nutrition, and water intake, which can give researchers a more complete picture of participants’ health. The All of Us Research Program is now gathering this data in addition to surveys, electronic health record information, physical measurements, and blood and urine samples, working to make the All of Us resource one of the largest and most diverse data sets of its kind for health research.

“Collecting real-world, real-time data through digital technologies will become a fundamental part of the program,” said Eric Dishman, director of the All of Us Research Program. “This information, in combination with many other data types, will give us an unprecedented ability to better understand the impact of lifestyle and environment on health outcomes and, ultimately, develop better strategies for keeping people healthy in a very precise, individualized way.”…

All of Us is developing additional plans to incorporate digital health technologies. A second project with Fitbit is expected to launch later in the year. It will include providing devices to a limited number of All of Us participants who will be randomly invited to take part, to enable them to share wearable data with the program. And All of Us will add connections to other devices and apps in the future to further expand data collection efforts and engage participants in new ways….(More)”.

The promises — and challenges — of data collaboratives for the SDGs


Paula Hidalgo-Sanchis and Stefaan G. Verhulst at Devex: “As the road to achieving the Sustainable Development Goals becomes more complex and challenging, policymakers around the world need both new solutions and new ways to become more innovative. This includes better policy and program design based on evidence to solve problems at scale. The use of big data — the vast majority of which is collected, processed, and analyzed by the private sector — is key.

In the past few months, we at UN Global Pulse and The GovLab have sought to understand pathways to make policymaking more evidence-based and data-driven with the use of big data. Working in parallel at both local and global scale, we have conducted extensive desk research, held a series of workshops, and conducted in-depth conversations and interviews with key stakeholders, including government, civil society, and private sector representatives.

Our work is driven by a recognition of the potential of use of privately processed data through data collaboratives — a new form of public-private partnership in which government, private industry, and civil society work together to release previously siloed data, making it available to address the challenges of our era.

Research suggests that data collaboratives offer tremendous potential when implemented strategically under the appropriate policy and ethical frameworks. Nonetheless, this remains a nascent field, and we have summarized some of the barriers that continue to confront data collaboratives, with an eye toward ultimately proposing solutions to make them more effective, scalable, sustainable, and responsible.

Here are seven challenges…(More)”.

Data Policy in the Fourth Industrial Revolution: Insights on personal data


Report by the World Economic Forum: “Development of comprehensive data policy necessarily involves trade-offs. Cross-border data flows are crucial to the digital economy. The use of data is critical to innovation and technology. However, to engender trust, we need to have appropriate levels of protection in place to ensure privacy, security and safety. Over 120 laws in effect across the globe today provide differing levels of protection for data but few anticipated 

Data Policy in the Fourth Industrial Revolution: Insights on personal data, a paper by the World Economic Forum in collaboration with the Ministry of Cabinet Affairs and the Future, United Arab Emirates, examines the relationship between risk and benefit, recognizing the impact of culture, values and social norms. This work is a start toward developing a comprehensive data policy toolkit and knowledge repository of case studies for policy makers and data policy leaders globally….(More)”.

A Research Roadmap to Advance Data Collaboratives Practice as a Novel Research Direction


Iryna Susha, Theresa A. Pardo, Marijn Janssen, Natalia Adler, Stefaan G. Verhulst and Todd Harbour in the International Journal of Electronic Government Research (IJEGR): “An increasing number of initiatives have emerged around the world to help facilitate data sharing and collaborations to leverage different sources of data to address societal problems. They are called “data collaboratives”. Data collaboratives are seen as a novel way to match real life problems with relevant expertise and data from across the sectors. Despite its significance and growing experimentation by practitioners, there has been limited research in this field. In this article, the authors report on the outcomes of a panel discussing critical issues facing data collaboratives and develop a research and development agenda. The panel included participants from the government, academics, and practitioners and was held in June 2017 during the 18th International Conference on Digital Government Research at City University of New York (Staten Island, New York, USA). The article begins by discussing the concept of data collaboratives. Then the authors formulate research questions and topics for the research roadmap based on the panel discussions. The research roadmap poses questions across nine different topics: conceptualizing data collaboratives, value of data, matching data to problems, impact analysis, incentives, capabilities, governance, data management, and interoperability. Finally, the authors discuss how digital government research can contribute to answering some of the identified research questions….(More)”. See also: http://datacollaboratives.org/

Google Searches Could Predict Heroin Overdoses


Rod McCullom at Scientific American: “About 115 people nationwide die every day from opioid overdoses, according to the U.S. Centers for Disease Control and Prevention. A lack of timely, granular data exacerbates the crisis; one study showed opioid deaths were undercounted by as many as 70,000 between 1999 and 2015, making it difficult for governments to respond. But now Internet searches have emerged as a data source to predict overdose clusters in cities or even specific neighborhoods—information that could aid local interventions that save lives. 

The working hypothesis was that some people searching for information on heroin and other opioids might overdose in the near future. To test this, a researcher at the University of California Institute for Prediction Technology (UCIPT) and his colleagues developed several statistical models to forecast overdoses based on opioid-related keywords, metropolitan income inequality and total number of emergency room visits. They discovered regional differences (graphic) in where and how people searched for such information and found that more overdoses were associated with a greater number of searches per keyword. The best-fitting model, the researchers say, explained about 72 percent of the relation between the most popular search terms and heroin-related E.R. visits. The authors say their study, published in the September issue of Drug and Alcohol Dependence, is the first report of using Google searches in this way. 

To develop their models, the researchers obtained search data for 12 prescription and nonprescription opioids between 2005 and 2011 in nine U.S. metropolitan areas. They compared these with Substance Abuse and Mental Health Services Administration records of heroin-related E.R. admissions during the same period. The models can be modified to predict overdoses of other opioids or narrow searches to specific zip codes, says lead study author Sean D. Young, a behavioral psychologist and UCIPT executive director. That could provide early warnings of overdose clusters and help to decide where to distribute the overdose reversal medication Naloxone….(More)”.
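The "about 72 percent" figure above reads like a coefficient of determination (R²), the share of variation in the outcome that a fitted model accounts for. As a rough illustration only, the sketch below fits a one-predictor least-squares line and computes R². It is not the researchers' model, which combined several predictors, and the numbers are made-up toy values rather than data from the study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(x, y, a, b):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy values: keyword search volume and E.R. visits per area.
searches = [10, 20, 30, 40, 50]
er_visits = [12, 19, 33, 38, 52]

a, b = fit_line(searches, er_visits)
r2 = r_squared(searches, er_visits, a, b)
print(f"intercept={a:.2f} slope={b:.2f} R^2={r2:.3f}")
```

An R² near 1 means the line tracks the toy points closely; it says nothing about whether searches cause overdoses, only how well one series predicts the other.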

It’s time for a Bill of Data Rights


Article by Martin Tisne: “…The proliferation of data in recent decades has led some reformers to a rallying cry: “You own your data!” Eric Posner of the University of Chicago, Eric Weyl of Microsoft Research, and virtual-reality guru Jaron Lanier, among others, argue that data should be treated as a possession. Mark Zuckerberg, the founder and head of Facebook, says so as well. Facebook now says that you “own all of the content and information you post on Facebook” and “can control how it is shared.” The Financial Times argues that “a key part of the answer lies in giving consumers ownership of their own personal data.” In a recent speech, Tim Cook, Apple’s CEO, agreed, saying, “Companies should recognize that data belongs to users.”

This essay argues that “data ownership” is a flawed, counterproductive way of thinking about data. It not only does not fix existing problems; it creates new ones. Instead, we need a framework that gives people rights to stipulate how their data is used without requiring them to take ownership of it themselves….

The notion of “ownership” is appealing because it suggests giving you power and control over your data. But owning and “renting” out data is a bad analogy. Control over how particular bits of data are used is only one problem among many. The real questions are questions about how data shapes society and individuals. Rachel’s story will show us why data rights are important and how they might work to protect not just Rachel as an individual, but society as a whole.

Tomorrow never knows

To see why data ownership is a flawed concept, first think about this article you’re reading. The very act of opening it on an electronic device created data—an entry in your browser’s history, cookies the website sent to your browser, an entry in the website’s server log to record a visit from your IP address. It’s virtually impossible to do anything online—reading, shopping, or even just going somewhere with an internet-connected phone in your pocket—without leaving a “digital shadow” behind. These shadows cannot be owned—the way you own, say, a bicycle—any more than can the ephemeral patches of shade that follow you around on sunny days.

Your data on its own is not very useful to a marketer or an insurer. Analyzed in conjunction with similar data from thousands of other people, however, it feeds algorithms and bucketizes you (e.g., “heavy smoker with a drink habit” or “healthy runner, always on time”). If an algorithm is unfair—if, for example, it wrongly classifies you as a health risk because it was trained on a skewed data set or simply because you’re an outlier—then letting you “own” your data won’t make it fair. The only way to avoid being affected by the algorithm would be to never, ever give anyone access to your data. But even if you tried to hoard data that pertains to you, corporations and governments with access to large amounts of data about other people could use that data to make inferences about you. Data is not a neutral impression of reality. The creation and consumption of data reflects how power is distributed in society. …(More)”.

Creating value through data collaboratives


Paper by Klievink, Bram, van der Voort, Haiko and Veeneman, Wijnand: “Driven by the technological capabilities that ICTs offer, data enable new ways to generate value for both society and the parties that own or offer the data. This article looks at the idea of data collaboratives as a form of cross-sector partnership to exchange and integrate data and data use to generate public value. The concept thereby bridges data-driven value creation and collaboration, both current themes in the field.

To understand how data collaboratives can add value in a public governance context, we exploratively studied the qualitative longitudinal case of an infomobility platform. We investigated the ability of a data collaborative to produce results while facing significant challenges and tensions between the goals of parties, each having the conflicting objectives of simultaneously retaining control whilst allowing for generativity. Taken together, the literature and case study findings help us to understand the emergence and viability of data collaboratives. Although limited by this study’s explorative nature, we find that conditions such as prior history of collaboration and supportive rules of the game are key to the emergence of collaboration. Positive feedback between trust and the collaboration process can institutionalise the collaborative, which helps it survive if conditions change for the worse….(More)”.

On the privacy-conscientious use of mobile phone data


Yves-Alexandre de Montjoye et al in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.

While there is little doubt on the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.

Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy which considers de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
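To make the unicity idea above concrete, here is a minimal sketch that estimates the fraction of users whose trace is singled out by p randomly chosen points. It assumes each trace is a set of (location, hour) pairs; the function names, the sampling scheme, and the three-user toy dataset are illustrative assumptions, not the procedure or data used in the studies the authors cite.

```python
import random


def is_unique(traces, user, points):
    """True if `user` is the only one whose trace contains all `points`."""
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]


def estimate_unicity(traces, p, trials=1000, seed=0):
    """Monte Carlo estimate of how often p known points single out one user."""
    rng = random.Random(seed)
    eligible = [u for u, trace in traces.items() if len(trace) >= p]
    if not eligible:
        raise ValueError("no trace has at least p points")
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # sorted() gives a sequence, which random.sample requires.
        points = set(rng.sample(sorted(traces[user]), p))
        if is_unique(traces, user, points):
            unique += 1
    return unique / trials


# Hypothetical toy traces: sets of (cell, hour) observations per user.
traces = {
    "a": {("cell1", 8), ("cell2", 9), ("cell3", 18)},
    "b": {("cell1", 8), ("cell2", 9), ("cell4", 18)},
    "c": {("cell5", 8), ("cell6", 12), ("cell7", 20)},
}

print(estimate_unicity(traces, p=1))
print(estimate_unicity(traces, p=3))
```

In the toy data, users "a" and "b" share two of their three points, so a single known point often fails to separate them, while all three points always do. The same effect at scale is why a handful of points can be enough to re-identify most people in a large dataset.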

The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data on time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.

These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed upon framework for the privacy-conscientious use of mobile phone data by third-parties especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed upon set of models has been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data. This has often resulted in suboptimal tradeoffs if any.

In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third-party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregated information is ultimately needed by the third-party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed such as targeted advertising or loans based on behavioral data.

Figure 1: Matrix of the four models for the privacy-conscientious use of mobile phone data.
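The excerpt does not describe the four models in detail, so the following is only a generic sketch of what "statistical, aggregate information" can look like in practice: counting distinct users per region and hour, then withholding cells with too few users. Small-count suppression is a common disclosure-control practice and is an assumption here, not something taken from the paper; the record layout and the `min_count` threshold are hypothetical.

```python
from collections import defaultdict


def aggregate_counts(records, min_count=10):
    """Count distinct users per (region, hour) and drop small cells.

    `records` is an iterable of (user_id, region, hour) tuples.
    Cells with fewer than `min_count` distinct users are withheld.
    """
    users_per_cell = defaultdict(set)
    for user_id, region, hour in records:
        users_per_cell[(region, hour)].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= min_count
    }


# Hypothetical toy records; the duplicate row for u1 is counted once.
records = [
    ("u1", "north", 8),
    ("u1", "north", 8),
    ("u2", "north", 8),
    ("u3", "north", 8),
    ("u4", "south", 8),
]

print(aggregate_counts(records, min_count=2))
```

Only the aggregated table would leave the operator in a setup like this; the individual-level records stay behind. Suppression alone does not guarantee anonymity, which fits the authors' caution that no single model is a silver bullet.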

First, it is important to insist that none of these models is a silver bullet…(More)”.