The promises — and challenges — of data collaboratives for the SDGs


Paula Hidalgo-Sanchis and Stefaan G. Verhulst at Devex: “As the road to achieving the Sustainable Development Goals becomes more complex and challenging, policymakers around the world need both new solutions and new ways to become more innovative. This includes better policy and program design based on evidence to solve problems at scale. The use of big data — the vast majority of which is collected, processed, and analyzed by the private sector — is key.

In the past few months, we at UN Global Pulse and The GovLab have sought to understand pathways to make policymaking more evidence-based and data-driven with the use of big data. Working in parallel at both local and global scale, we have conducted extensive desk research, held a series of workshops, and conducted in-depth conversations and interviews with key stakeholders, including government, civil society, and private sector representatives.

Our work is driven by a recognition of the potential of use of privately processed data through data collaboratives — a new form of public-private partnership in which government, private industry, and civil society work together to release previously siloed data, making it available to address the challenges of our era.

Research suggests that data collaboratives offer tremendous potential when implemented strategically under the appropriate policy and ethical frameworks. Nonetheless, this remains a nascent field, and we have summarized some of the barriers that continue to confront data collaboratives, with an eye toward ultimately proposing solutions to make them more effective, scalable, sustainable, and responsible.

Here are seven challenges…(More)”.

Data Policy in the Fourth Industrial Revolution: Insights on personal data


Report by the World Economic Forum: “Development of comprehensive data policy necessarily involves trade-offs. Cross-border data flows are crucial to the digital economy. The use of data is critical to innovation and technology. However, to engender trust, we need to have appropriate levels of protection in place to ensure privacy, security and safety. Over 120 laws in effect across the globe today provide differing levels of protection for data but few anticipated 

Data Policy in the Fourth Industrial Revolution: Insights on personal data, a paper by the World Economic Forum in collaboration with the Ministry of Cabinet Affairs and the Future, United Arab Emirates, examines the relationship between risk and benefit, recognizing the impact of culture, values and social norms. This work is a start toward developing a comprehensive data policy toolkit and knowledge repository of case studies for policy makers and data policy leaders globally….(More)”.

A Research Roadmap to Advance Data Collaboratives Practice as a Novel Research Direction


Iryna Susha, Theresa A. Pardo, Marijn Janssen, Natalia Adler, Stefaan G. Verhulst and Todd Harbour in the International Journal of Electronic Government Research (IJEGR): “An increasing number of initiatives have emerged around the world to help facilitate data sharing and collaborations to leverage different sources of data to address societal problems. They are called “data collaboratives”. Data collaboratives are seen as a novel way to match real life problems with relevant expertise and data from across the sectors. Despite its significance and growing experimentation by practitioners, there has been limited research in this field. In this article, the authors report on the outcomes of a panel discussing critical issues facing data collaboratives and develop a research and development agenda. The panel included participants from the government, academics, and practitioners and was held in June 2017 during the 18th International Conference on Digital Government Research at City University of New York (Staten Island, New York, USA). The article begins by discussing the concept of data collaboratives. Then the authors formulate research questions and topics for the research roadmap based on the panel discussions. The research roadmap poses questions across nine different topics: conceptualizing data collaboratives, value of data, matching data to problems, impact analysis, incentives, capabilities, governance, data management, and interoperability. Finally, the authors discuss how digital government research can contribute to answering some of the identified research questions….(More)”. See also: http://datacollaboratives.org/

Google Searches Could Predict Heroin Overdoses


Rod McCullom at Scientific American: “About 115 people nationwide die every day from opioid overdoses, according to the U.S. Centers for Disease Control and Prevention. A lack of timely, granular data exacerbates the crisis; one study showed opioid deaths were undercounted by as many as 70,000 between 1999 and 2015, making it difficult for governments to respond. But now Internet searches have emerged as a data source to predict overdose clusters in cities or even specific neighborhoods—information that could aid local interventions that save lives. 

The working hypothesis was that some people searching for information on heroin and other opioids might overdose in the near future. To test this, a researcher at the University of California Institute for Prediction Technology (UCIPT) and his colleagues developed several statistical models to forecast overdoses based on opioid-related keywords, metropolitan income inequality and total number of emergency room visits. They discovered regional differences (graphic) in where and how people searched for such information and found that more overdoses were associated with a greater number of searches per keyword. The best-fitting model, the researchers say, explained about 72 percent of the relation between the most popular search terms and heroin-related E.R. visits. The authors say their study, published in the September issue of Drug and Alcohol Dependence, is the first report of using Google searches in this way. 

To develop their models, the researchers obtained search data for 12 prescription and nonprescription opioids between 2005 and 2011 in nine U.S. metropolitan areas. They compared these with Substance Abuse and Mental Health Services Administration records of heroin-related E.R. admissions during the same period. The models can be modified to predict overdoses of other opioids or narrow searches to specific zip codes, says lead study author Sean D. Young, a behavioral psychologist and UCIPT executive director. That could provide early warnings of overdose clusters and help to decide where to distribute the overdose reversal medication Naloxone….(More)”.

It’s time for a Bill of Data Rights


Article by Martin Tisne: “…The proliferation of data in recent decades has led some reformers to a rallying cry: “You own your data!” Eric Posner of the University of Chicago, Eric Weyl of Microsoft Research, and virtual-reality guru Jaron Lanier, among others, argue that data should be treated as a possession. Mark Zuckerberg, the founder and head of Facebook, says so as well. Facebook now says that you “own all of the content and information you post on Facebook” and “can control how it is shared.” The Financial Times argues that “a key part of the answer lies in giving consumers ownership of their own personal data.” In a recent speech, Tim Cook, Apple’s CEO, agreed, saying, “Companies should recognize that data belongs to users.”

This essay argues that “data ownership” is a flawed, counterproductive way of thinking about data. It not only does not fix existing problems; it creates new ones. Instead, we need a framework that gives people rights to stipulate how their data is used without requiring them to take ownership of it themselves….

The notion of “ownership” is appealing because it suggests giving you power and control over your data. But owning and “renting” out data is a bad analogy. Control over how particular bits of data are used is only one problem among many. The real questions are questions about how data shapes society and individuals. Rachel’s story will show us why data rights are important and how they might work to protect not just Rachel as an individual, but society as a whole.

Tomorrow never knows

To see why data ownership is a flawed concept, first think about this article you’re reading. The very act of opening it on an electronic device created data—an entry in your browser’s history, cookies the website sent to your browser, an entry in the website’s server log to record a visit from your IP address. It’s virtually impossible to do anything online—reading, shopping, or even just going somewhere with an internet-connected phone in your pocket—without leaving a “digital shadow” behind. These shadows cannot be owned—the way you own, say, a bicycle—any more than can the ephemeral patches of shade that follow you around on sunny days.

Your data on its own is not very useful to a marketer or an insurer. Analyzed in conjunction with similar data from thousands of other people, however, it feeds algorithms and bucketizes you (e.g., “heavy smoker with a drink habit” or “healthy runner, always on time”). If an algorithm is unfair—if, for example, it wrongly classifies you as a health risk because it was trained on a skewed data set or simply because you’re an outlier—then letting you “own” your data won’t make it fair. The only way to avoid being affected by the algorithm would be to never, ever give anyone access to your data. But even if you tried to hoard data that pertains to you, corporations and governments with access to large amounts of data about other people could use that data to make inferences about you. Data is not a neutral impression of reality. The creation and consumption of data reflects how power is distributed in society. …(More)”.

Creating value through data collaboratives


Paper by Klievink, Bram, van der Voort, Haiko and Veeneman, Wijnand: “Driven by the technological capabilities that ICTs offer, data enable new ways to generate value for both society and the parties that own or offer the data. This article looks at the idea of data collaboratives as a form of cross-sector partnership to exchange and integrate data and data use to generate public value. The concept thereby bridges data-driven value creation and collaboration, both current themes in the field.

To understand how data collaboratives can add value in a public governance context, we exploratively studied the qualitative longitudinal case of an infomobility platform. We investigated the ability of a data collaborative to produce results while facing significant challenges and tensions between the goals of parties, each having the conflicting objectives of simultaneously retaining control whilst allowing for generativity. Taken together, the literature and case study findings help us to understand the emergence and viability of data collaboratives. Although limited by this study’s explorative nature, we find that conditions such as prior history of collaboration and supportive rules of the game are key to the emergence of collaboration. Positive feedback between trust and the collaboration process can institutionalise the collaborative, which helps it survive if conditions change for the worse….(More)”.

On the privacy-conscientious use of mobile phone data


Yves-Alexandre de Montjoye et al in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.

While there is little doubt on the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.

Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy which considers de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
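
[Editor's illustration] The unicity idea in the passage above can be made concrete with a small sketch. It is not the authors' implementation: the function name `unicity`, the representation of a trace as a set of (place, hour) pairs, and the toy data are all assumptions made for illustration. The sketch estimates how often p points drawn at random from one user's trace match that user and nobody else in the dataset.

```python
import random
from typing import Dict, Set, Tuple

# Assumed representation: one point is (approximate place, hour bucket).
Point = Tuple[str, int]


def unicity(traces: Dict[str, Set[Point]], p: int,
            trials: int = 1000, seed: int = 0) -> float:
    """Estimate the share of draws in which p known points single out one user."""
    rng = random.Random(seed)
    # Only users with at least p recorded points can be sampled.
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        raise ValueError("no user has at least p points")

    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # Points an attacker is assumed to know about this user.
        known = set(rng.sample(sorted(traces[user]), p))
        # Count every trace in the dataset that contains all known points.
        matches = sum(1 for trace in traces.values() if known <= trace)
        if matches == 1:
            unique += 1
    return unique / trials


# Toy, hand-made traces (hypothetical, not real mobile phone data).
toy_traces = {
    "a": {("tower1", 8), ("tower2", 9), ("tower3", 18)},
    "b": {("tower1", 8), ("tower2", 9), ("tower4", 18)},
    "c": {("tower5", 8), ("tower6", 9), ("tower7", 18)},
}

print(unicity(toy_traces, p=1))  # some single points are shared by "a" and "b"
print(unicity(toy_traces, p=3))  # all three points together identify every user
```

On this toy data, knowing more points can only narrow the set of matching traces, which is the intuition behind the finding that a handful of points suffices in a real dataset. The sketch says nothing about the actual 95% figure, which depends on the real data and its spatial and temporal resolution.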

The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data on time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.

These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed upon framework for the privacy-conscientious use of mobile phone data by third-parties especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed upon set of models has been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data. This has often resulted in suboptimal tradeoffs if any.

In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third-party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregated information is ultimately needed by the third-party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed such as targeted advertising or loans based on behavioral data.

Figure 1: Matrix of the four models for the privacy-conscientious use of mobile phone data.

First, it is important to insist that none of these models is a silver bullet…(More)”.

Data Flow in the Smart City: Open Data Versus the Commons


Chapter by Richard Beckwith, John Sherry and David Prendergast in The Hackable City: “Much of the recent excitement around data, especially ‘Big Data,’ focuses on the potential commercial or economic value of data. How that data will affect people isn’t much discussed. People know that smart cities will deploy Internet-based monitoring and that flows of the collected data promise to produce new values. Less considered is that smart cities will be sites of new forms of citizen action—enabled by an ‘economy’ of data that will lead to new methods of collectivization, accountability, and control which, themselves, can provide both positive and negative values to the citizenry. Therefore, smart city design needs to consider not just measurement and publication of data but also the implications of city-wide deployment, data openness, and the possibility of unintended consequences if data leave the city….(More)”.

Data Collaboration, Pooling and Hoarding under Competition Law


Paper by Bjorn Lundqvist: “In the Internet of Things era devices will monitor and collect data, whilst device producing firms will store, distribute, analyse and re-use data on a grand scale. A great deal of data analytics will be used to enable firms to understand and make use of the collected data. The infrastructure around the collected data is controlled and access to the data flow is thus restricted on technical, but also on legal grounds. Legally, the data are being obscured behind a thicket of property rights, including intellectual property rights. Therefore, there is no general “data commons” for everyone to enjoy.

If firms would like to combine data, they need to give each other access either by sharing, trading, or pooling the data. On the one hand, industry-wide pooling of data could increase efficiency of certain services, and contribute to the innovation of other services, e.g., think about self-driven cars or personalized medicine. On the other hand, firms combining business data may use the data, not to advance their services or products, but to collude, to exclude competitors or to abuse their market position. Indeed, by combining their data in a pool, they can gain market power, and, hence, the ability to violate competition law. Moreover, we also see firms hoarding data from various sources, creating de facto data pools. This article will discuss what implications combining data in data pools by firms might have on competition, and when competition law should be applicable. It develops the idea that data pools harbour great opportunities, whilst acknowledging that there are still risks to take into consideration, and to regulate….(More)”.

Using Mobile Network Data for Development: How it works


Blog by Derval Usher and Darren Hanniffy: “…We aim to equip decision makers with data tools so that they have access to the analysis on the fly. But to help this scale we need progress in three areas:

1. The framework to support Shared Value partnerships.

2. Shared understanding of The Proposition and the benefits for all parties.

3. Access to finance and a funding strategy, designing-in innovation.

1. Any Public-Private Partnership should be aligned to achieve impact centered on the SDGs through a Shared Value / Inclusive Business approach. Mobile network operators are consumed with the challenge of maintaining or upgrading their infrastructure, driving device sales and sustaining their agent networks to reach the last mile. Measuring impact against the SDGs has not been a priority. Mobile network operators tend not to seek out partnerships with traditional development donors or development implementers. But there is a growing realisation of the potential and the need to partner. It’s important to move from a service level transactional relationship to a strategic partnership approach.

Private sector partners have been fundamental to the success of UN Global Pulse as these companies are often the custodians of the big data sets from which we develop valuable development and humanitarian insights. Although in previous years our private sector partners were framed primarily as data philanthropists, we are beginning to see a shift in the relationship to one of shared value. Our work generates public value and also insights that can enhance business operations. This shared value model is attracting more private enterprises to engage and to explore their own data, and more broadly to investigate the value of their networks and data as part of the data innovation ecosystem, which the Global Pulse lab network will build on as we move forward.

2. Partners need to be more propositional and less charitable. They need to recognise the fact that earning profit may help ensure the sustainability of digital platforms and services that offer developmental impact. Through partnership we can attract innovative finance, deliver mobile for development programmes, measure impact and create affordable commercial solutions to development challenges that become sustainable by design. Pulse Lab Jakarta and Digicel have been flexible with one another which is important as this partnership has not always been a priority for either side all the time. But we believe in unlocking the power of mobile data for development and therefore continue to make progress.

3. Development and commercial strategies should be more aligned to create an enabling environment. Currently they are not. Private sector needs to become a strategic partner to development where multi-annual development funds align with commercial strategy. Mobile network operators continue to invest in their network particularly in developing countries and the digital platform is coming into being in the markets where Digicel operates. But the platform is new and experience is limited within governments, the development community and indeed even within mobile network operators.

We need to see donors actively engage during the development of multi-annual funding facilities….(More)”.