Data Collaboratives as a New Frontier of Cross-Sector Partnerships in the Age of Open Data: Taxonomy Development

Paper by Iryna Susha, Marijn Janssen and Stefaan Verhulst: “Data collaboratives present a new form of cross-sector and public-private partnership to leverage (often corporate) data for addressing a societal challenge. They can be seen as the latest attempt to make data accessible to solve public problems. Although an increasing number of initiatives can be found, there is hardly any analysis of these emerging practices. This paper seeks to develop a taxonomy of forms of data collaboratives. The taxonomy consists of six dimensions related to data sharing and eight dimensions related to data use. Our analysis shows that data collaboratives exist in a variety of models. The taxonomy can help organizations to find a suitable form when shaping their efforts to create public value from corporate and other data. The use of data is not only dependent on the organizational arrangement, but also on aspects like the type of policy problem, incentives for use, and the expected outcome of data collaborative….(More)”

The Centre for Humanitarian Data

Centre for HumData: “The United Nations Office for the Coordination of Humanitarian Affairs (OCHA) is establishing a Centre for Humanitarian Data in the Netherlands. It will be operational by early 2017 for an initial three years.

The Centre’s mission is to increase the use and impact of data in the humanitarian sector. The vision is to create a future where all people involved in a humanitarian situation have access to the data they need, when and how they need it, to make responsible and informed decisions.

The Centre will support humanitarian partners and OCHA staff in the field and at headquarters with their data efforts. It will be part of the city of The Hague’s Humanity Hub, a dedicated building for organizations working on data and innovation in the social sector. The location offers OCHA and partners a new, neutral setting where a hybrid culture can be created around data collaboration.

The Centre is a key contribution towards the Secretary-General’s Agenda for Humanity under core commitment four — changing the way we work to end need. The Centre’s activities will accelerate the changes required for the humanitarian system to become data driven….(More)”

The ethical impact of data science

Theme issue of Phil. Trans. R. Soc. A compiled and edited by Mariarosaria Taddeo and Luciano Floridi: “This theme issue has the founding ambition of landscaping data ethics as a new branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values). Data ethics builds on the foundation provided by computer and information ethics but, at the same time, it refines the approach endorsed so far in this research field, by shifting the level of abstraction of ethical enquiries, from being information-centric to being data-centric. This shift brings into focus the different moral dimensions of all kinds of data, even data that never translate directly into information but can be used to support actions or generate behaviours, for example. It highlights the need for ethical analyses to concentrate on the content and nature of computational operations—the interactions among hardware, software and data—rather than on the variety of digital technologies that enable them. And it emphasizes the complexity of the ethical challenges posed by data science. Because of such complexity, data ethics should be developed from the start as a macroethics, that is, as an overall framework that avoids narrow, ad hoc approaches and addresses the ethical impact and implications of data science and its applications within a consistent, holistic and inclusive framework. Only as a macroethics will data ethics provide solutions that can maximize the value of data science for our societies, for all of us and for our environments….(More)”

Table of Contents:

  • The dynamics of big data and human rights: the case of scientific research; Effy Vayena, John Tasioulas
  • Facilitating the ethical use of health data for the benefit of society: electronic health records, consent and the duty of easy rescue; Sebastian Porsdam Mann, Julian Savulescu, Barbara J. Sahakian
  • Faultless responsibility: on the nature and allocation of moral responsibility for distributed moral actions; Luciano Floridi
  • Compelling truth: legal protection of the infosphere against big data spills; Burkhard Schafer
  • Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems; Sabina Leonelli
  • Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy; Deirdre K. Mulligan, Colin Koopman, Nick Doty
  • Beyond privacy and exposure: ethical issues within citizen-facing analytics; Peter Grindrod
  • The ethics of smart cities and urban science; Rob Kitchin
  • The ethics of big data as a public good: which public? Whose good?; Linnet Taylor
  • Data philanthropy and the design of the infraethics for information societies; Mariarosaria Taddeo
  • The opportunities and ethics of big data: practical priorities for a national Council of Data Ethics; Olivia Varley-Winter, Hetan Shah
  • Data science ethics in government; Cat Drew
  • The ethics of data and of data science: an economist’s perspective; Jonathan Cave
  • What’s the good of a science platform?; John Gallacher


How Companies Can Help Cities Close the Data Gap

Shamina Singh in Governing: “Recent advances in data analytics have revolutionized the way many companies do business. Starbucks, for example, rolls out new beverages and chooses its store locations by analyzing customer, economic and other data. And as Amazon’s customers know so well, the company makes purchase recommendations to them in real time based on items they’ve viewed or bought. So why aren’t more of our cities leveraging data in the same way to improve services for their residents?

According to a recent report by Bloomberg Philanthropies’ What Works Cities initiative, city officials say they simply lack the capacity to do so. Nearly half pointed to a shortage of staff and financial resources dedicated to gathering and evaluating data.

This gap between companies’ and cities’ ability to use data is not surprising. Businesses have invested heavily in data and analytics in recent years, and they are spending an average of $7 million annually per company on data-related activities. These investments are made with the understanding that they will improve the companies’ bottom line, and they have started paying off.

City halls, on the other hand, find themselves hamstrung when it comes to investing in data and analytics. Despite recent growth, city revenues remain below pre-recession levels, with spending demands on the rise. Furthermore, many cities face the need to balance long-term opportunity with real short-term needs. Do you hire a data scientist — who may command a salary north of $200,000 — to research strategies to reduce crime in the long run, or do you hire more police officers to keep neighborhoods safe today?….

One way companies can help is through data philanthropy, leveraging their data analytics and capabilities to advance social progress. A step beyond conventional philanthropy and traditional corporate social-responsibility initiatives, data philanthropy is a new kind of response to social issues.

There are a number of ways cities could employ data philanthropy. For starters, they could partner with relevant apps to help ameliorate deteriorating roads. In Oklahoma City, for example, potholes are a particularly serious problem. Data from Waze, the community-based mapping and navigation app, could be leveraged to build a system through which residents could report potholes, allowing city services to efficiently fill them in.

Some data-philanthropy projects are already underway. Uber, for example, recently partnered with the city of Boston in the hopes that its data could help the city improve traffic congestion and community planning. Uber donates anonymized trip data by Zip code, allowing city officials to see the date and time of a trip, its duration and distance traveled. Boston’s transportation, neighborhood development and redevelopment agencies will have access to the data, equipping them with a new tool for more-effective policymaking.

While there is demonstrated enthusiasm from cities for more effective use of data to improve their residents’ lives, cities won’t be able to close the data gap on their own. Private-sector companies must answer the call. Helped in part by the better use of data, cities can create improved, more inclusive and stronger business environments. Who would argue with that goal?…(More)”

How Technology is Crowd-Sourcing the Fight Against Hunger

Beth Noveck at Media Planet: “There is more than enough food produced to feed everyone alive today. Yet access to nutritious food is a challenge everywhere and depends on getting every citizen involved, not just large organizations. Technology is helping to democratize and distribute the job of tackling the problem of hunger in America and around the world.

Real-time research

One of the hardest problems is the difficulty of gaining real-time insight into food prices and shortages. Enter technology. We no longer have to rely on professional inspectors slowly collecting information face-to-face. The UN World Food Programme, which provides food assistance to 80 million people each year, together with Nielsen is conducting mobile phone surveys in 15 countries (with plans to expand to 30), asking people by voice and text about what they are eating. Formerly blank maps are now filled in with information provided quickly and directly by the most affected people, making it easy to prioritize the allocation of resources.

Technology helps the information flow in both directions, enabling those in need to reach out, but also to become more effective at helping themselves. The Indian Ministry of Agriculture, in collaboration with Reuters Market Light, provides information services in nine Indian languages to 1.4 million registered farmers in 50,000 villages across 17 Indian states via text and voice messages.

“In the United States, 40 percent of the food produced here is wasted, and yet 1 in 4 American children (and 1 in 6 adults) remain food insecure…”

Data to the people

New open data laws and policies that encourage more transparent publication of public information complement data collection and dissemination technologies such as phones and tablets. About 70 countries and hundreds of regions and cities have adopted open data policies, which guarantee that the information these public institutions collect be available for free use by the public. As a result, there are millions of open datasets now online on websites such as the Humanitarian Data Exchange, which hosts 4,000 datasets such as country-by-country stats on food prices and undernourishment around the world.

Companies are compiling and sharing data to combat food insecurity, too. Anyone can dig into the data on the Global Open Data for Agriculture and Nutrition platform, a data collaborative where 300 private and public partners are sharing information.

Importantly, this vast quantity of open data is available to anyone, not only to governments. As a result, large and small entrepreneurs are able to create new apps and programs to combat food insecurity, such as Plantwise, which uses government data to offer a knowledge bank and run “plant clinics” that help farmers lose less of what they grow to pests. Google uses open government data to show people the location of farmers markets near their homes.

Students, too, can learn to play a role. For the second summer in a row, the Governance Lab at New York University, in partnership with the United States Department of Agriculture (USDA), mounted a two-week open data summer camp for 40 middle and high school students. The next generation of problem solvers is learning new data science skills by working on food safety and other projects using USDA open data.

Enhancing connection

Ultimately, technology enables greater communication and collaboration among the public, social service organizations, restaurants, farmers and other food producers who must work together to avoid food crises. The European Food Safety Authority in Italy has begun exploring how to use internet-based collaboration (often called citizen science or crowdsourcing) to get more people involved in food and feed risk assessment.

In the United States, 40 percent of the food produced here is wasted, and yet 1 in 4 American children (and 1 in 6 adults) remain food insecure, according to the Rockefeller Foundation. Copia, a San Francisco-based smartphone app, facilitates donations and deliveries from those with excess food in six cities in the Bay Area. Zero Percent in Chicago similarly attacks the distribution problem by connecting restaurants to charities to donate their excess food. Full Harvest is a tech platform that facilitates the selling of surplus produce that otherwise would not have a market.

Mobilizing the world

Prize-backed challenges create the incentives for more people to collaborate online and get involved in the fight against hunger….(More)”

Selected Readings on Data Collaboratives

By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.

The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges — from addressing climate change to public health to job creation to improving the lives of children — require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.

Selected Reading List (in alphabetical order)

Annotated Selected Readings List (in alphabetical order)

Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014.

  • This white paper, produced by “a group of activists, researchers and data experts,” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building;”
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions;”
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action;” and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process.
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g., a higher level of trust indicates a higher probability of success).

Ballivian, A. and Hoffman, W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from:

  • This World Bank report provides a background document on forming public-private partnerships for data with the private sector in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed for “build[ing] the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from:

  • The Chatham House report provides an overview on public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity in adapting technical solutions from other sectors for public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from:

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving uptake of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014.

  • This paper proposes a decision model for data sharing of public and private data based on literature review and three case studies in the logistics sector.
  • The authors identify five categories of the barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.
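The privacy interventions Eckartz et al. describe — filtering out sensitive fields and aggregating data — can be sketched as a minimal k-anonymity-style routine. This is an illustrative sketch, not code from the paper; the field names, the logistics-flavored example records, and the threshold `k` are all assumptions chosen for demonstration:

```python
from collections import Counter

def anonymize(records, sensitive_fields, quasi_identifier, k=5):
    """Drop sensitive fields, then suppress any group of records whose
    quasi-identifier value occurs fewer than k times."""
    # Filter: strip directly sensitive attributes from every record.
    filtered = [
        {key: value for key, value in rec.items() if key not in sensitive_fields}
        for rec in records
    ]
    # Aggregate: count how many records share each quasi-identifier value.
    group_sizes = Counter(rec[quasi_identifier] for rec in filtered)
    # Suppress records in groups too small to hide an individual.
    return [rec for rec in filtered if group_sizes[rec[quasi_identifier]] >= k]

# Hypothetical logistics data: three shippers, three shipments each.
shipments = [
    {"shipper": f"Co{i % 3}", "route": "A-B", "invoice_value": 100 + i}
    for i in range(9)
]
safe = anonymize(shipments, sensitive_fields={"invoice_value"},
                 quasi_identifier="shipper", k=3)
```

With `k=3` every shipper's group is large enough to survive; raising `k` to 4 would suppress all nine records, which is the trade-off between privacy and data utility the paper's decision model weighs.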

Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013.

  • This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that biomedical databases have many potential uses, with those likely to benefit including: “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise based on:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504.

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
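The middle ground Mann describes between "complete privacy or complete disclosure" is what differential privacy formalizes: a data holder can release aggregate statistics with calibrated noise rather than raw records. A minimal sketch of the classic Laplace mechanism follows; the function name, the epsilon values, and the count of 1000 are illustrative assumptions, not anything from Mann's post:

```python
import math
import random

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count perturbed with Laplace(0, sensitivity/epsilon) noise,
    the standard mechanism for epsilon-differentially-private counting queries."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Sample Laplace noise via the inverse CDF of a uniform draw on (-0.5, 0.5).
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Smaller epsilon means stronger privacy and noisier answers; repeated
# queries average out toward the true count, which is why real deployments
# must also track a cumulative privacy budget.
rng = random.Random(42)
noisy = [dp_count(1000, epsilon=0.5, rng=rng) for _ in range(10000)]
```

Each individual release hides any single person's contribution (sensitivity 1 for a count), while the aggregate signal remains usable — the kind of flexible framework Mann points to for data collaboratives.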

Pastor Escuredo, D., Morales-Guzmán, A. et al, “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from:

  • This report describes the use of mobile data to understand the impact of disasters and improve disaster management. The study was conducted in the Mexican state of Tabasco in 2009 by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, the Technical University of Madrid (UPM), the Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, a division of the major Latin American telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”

Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015.

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges for private firms to enter open data or participate in data collaboratives with the academic research community that could be addressed through more involvement from boundary organizations:
    • First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • Second is the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data, etc.) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for a national statistics office (NSO); transfer of data sets to NSOs from private entities; transfer of data to a third-party provider to manage the NSO and private-entity data; and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation), suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts, and emphasizes both the trade-off between data specificity and accessibility and the importance of pricing mechanisms that reflect the capacity and capability of national statistical offices.
  • Case studies referenced in the paper include:
    • A mobile network operator’s (MNO Telefonica) in house analysis of call detail records;
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from:

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions proposed in this report focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.

Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • They focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for LMICs.
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC public health 14, no. 1 (2014): 1144. Available from:

  • The authors of this report provide a “systematic literature review of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17.

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from:

  • The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: Matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties; and
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of Data Collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria as well as their experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights gained from its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Paper presented at the Procedia Computer Science. 2015.

  • The paper presents a research study conducted on the basis of the mobile call records shared with researchers in the framework of the Data for Development Challenge by the mobile operator Orange.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that the use of their population, mobility, and simulation models provide more accurate simulation details in comparison to high-level analytical predictions and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard to reach locations.

Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016.

  • This research has developed a monitoring framework to assess the effects of open (private) data using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework to describe what the program is about, description of the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum.

  • This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.

Open Data Impact: When Demand and Supply Meet

Stefaan Verhulst and Andrew Young at the GovLab: “Today, in “Open Data Impact: When Demand and Supply Meet,” the GovLab and Omidyar Network release key findings about the social, economic, cultural and political impact of open data. The findings are based on 19 detailed case studies of open data projects from around the world. These case studies were prepared in order to address an important shortcoming in our understanding of when, and how, open data works. While there is no shortage of enthusiasm for open data’s potential, nor of conjectural estimates of its hypothetical impact, few rigorous, systematic analyses exist of its concrete, real-world impact…. The 19 case studies that inform this report, all of which can be found at Open Data’s Impact, a website specially set up for this project, were chosen for their geographic and sectoral representativeness. They seek to go beyond the descriptive (what happened) to the explanatory (why it happened, and what is the wider relevance or impact)….

In order to achieve the potential of open data and scale the impact of the individual projects discussed in our report, we need a better – and more granular – understanding of the enabling conditions that lead to success. We found 4 central conditions (“4Ps”) that play an important role in ensuring success:


  • Partnerships: Intermediaries and data collaboratives play an important role in ensuring success, allowing for enhanced matching of supply and demand of data.
  • Public infrastructure: Developing open data as a public infrastructure, open to all, enables wider participation, and a broader impact across issues and sectors.
  • Policies: Clear policies regarding open data, including those promoting regular assessments of open data projects, are also critical for success.
  • Problem definition: Open data initiatives that have a clear target or problem definition have more impact and are more likely to succeed than those with vaguely worded statements of intent or unclear reasons for existence. 

Core Challenges

Finally, the success of a project is also determined by the obstacles and challenges it confronts. Our research uncovered 4 major challenges (“4Rs”) confronting open data initiatives across the globe:


  • Readiness: A lack of readiness or capacity (evident, for example, in low Internet penetration or technical literacy rates) can severely limit the impact of open data.
  • Responsiveness: Open data projects are significantly more likely to be successful when they remain agile and responsive—adapting, for instance, to user feedback or early indications of success and failure.
  • Risks: For all its potential, open data does pose certain risks, notably to privacy and security; a greater, more nuanced understanding of these risks will be necessary to address and mitigate them.
  • Resource Allocation: While open data projects can often be launched cheaply, those projects that receive generous, sustained and committed funding have a better chance of success over the medium and long term.

Toward a Next Generation Open Data Roadmap

The report we release today concludes with ten recommendations for policymakers, advocates, users, funders and other stakeholders in the open data community. For each step, we include a few concrete methods of implementation – ways to translate the broader recommendation into meaningful impact.

Together, these 10 recommendations and their means of implementation amount to what we call a “Next Generation Open Data Roadmap.” This roadmap is just a start, and we plan to continue fleshing it out in the near future. For now, it offers a way forward. It is our hope that this roadmap will help guide future research and experimentation so that we can continue to better understand how the potential of open data can be fulfilled across geographies, sectors and demographics.

Additional Resources

In conjunction with the release of our key findings paper, we also launch today an “Additional Resources” section on the Open Data’s Impact website. The goal of that section is to provide context on our case studies, and to point in the direction of other, complementary research. It includes the following elements:

  • A “repository of repositories,” including other compendiums of open data case studies and sources;
  • A compilation of some popular open data glossaries;
  • A number of open data research publications and reports, with a particular focus on impact;
  • A collection of open data definitions and a matrix of analysis to help assess those definitions….(More)

Data Collaboratives: Matching Demand with Supply of (Corporate) Data to solve Public Problems

Blog by Stefaan G. Verhulst, Iryna Susha and Alexander Kostura: “Data Collaboratives refer to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (private companies, research institutions, and government agencies) share data to help solve public problems. Several of society’s greatest challenges — from climate change to poverty — require greater access to big (but not always open) data sets, more cross-sector collaboration, and increased capacity for data analysis. Participants at the workshop and breakout session explored the various ways in which data collaboratives can help meet these needs.

Matching supply and demand of data emerged as one of the most important and overarching issues facing the big and open data communities. Participants agreed that more experimentation is needed so that new, innovative and more successful models of data sharing can be identified.

How to discover and enable such models? When asked how the international community might foster greater experimentation, participants indicated the need to develop the following:

· A responsible data framework that serves to build trust in data sharing. Such a framework would build upon existing frameworks but also accommodate emerging technologies and practices. It would also need to be sensitive to public opinion and perception.

· Increased insight into different business models that may facilitate the sharing of data. As experimentation continues, the data community should map emerging practices and models of sharing so that successful cases can be replicated.

· Capacity to tap into the potential value of data. On the demand side, capacity refers to the ability to pose good questions, understand current data limitations, and seek new data sets responsibly. On the supply side, this means seeking shared value in collaboration, thinking creatively about public use of private data, and establishing norms of responsibility around security, privacy, and anonymity.

· Transparent stock of available data supply, including an inventory of what corporate data exist that can match multiple demands and that is shared through established networks and new collaborative institutional structures.

· Mapping emerging practices and models of sharing. Corporate data offers value not only for humanitarian action (which was a particular focus at the conference) but also for a variety of other domains, including science, agriculture, health care, urban development, environment, media and arts, and others. Gaining insight into the practices that emerge across sectors could broaden the spectrum of what is feasible and how.

In general, it was felt that understanding the business models underlying data collaboratives is of utmost importance in order to achieve win-win outcomes for both private and public sector players. Moreover, issues of public perception and trust were raised as important concerns of government organizations participating in data collaboratives….(More)”

6 lessons from sharing humanitarian data

Francis Irving at LLRX: “The Humanitarian Data Exchange (HDX) is an unusual data hub. It’s made by the UN, and is successfully used by agencies, NGOs, companies, Governments and academics to share data.

They’re doing this during crises such as the Ebola epidemic and the Nepal earthquakes, and every day to build up information in between crises.

There are lots of data hubs which are used by one organisation to publish data, far fewer which are used by lots of organisations to share data. The HDX project did a bunch of things right. What were they?

Here are six lessons…

1) Do good design

HDX started with user needs research. This was expensive, and was immediately worth it because it stopped a large part of the project which wasn’t needed.

The user needs led to design work which has made the website seem simple and beautiful – particularly unusual for something from a large bureaucracy like the UN.

HDX front page

2) Build on existing software

When making a hub for sharing data, there’s no need to make something from scratch. Open Knowledge’s CKAN software is open source; this stuff is a commodity. HDX has developers who modify and improve it for the specific needs of humanitarian data.


3) Use experts

HDX is a great international team – the leader is in New York, most of the developers are in Romania, there’s a data lab in Nairobi. Crucially, they bring in specific outside expertise: frog design do the user research and design work; ScraperWiki, experts in data collaboration, provide operational management.

ScraperWiki logo

4) Measure the right things

HDX’s metrics are about both sides of its two-sided network. Are users who visit the site actually finding and downloading data they want? Are new organisations joining to share data? They’re avoiding “vanity metrics”, taking inspiration from tech startup concepts like “pirate metrics”.
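
As a toy sketch of what such measurement might look like (the event names and counts below are invented for illustration, not HDX’s actual metrics), funnel-style conversion rates can be computed from counts of concrete user actions on each side of the network:

```python
# Hypothetical "pirate metrics"-style funnel for a data hub. All counts are
# made up; the point is to measure actions (searches, downloads, new
# data-sharing organisations) rather than vanity metrics like raw page views.
events = {
    "visitors": 10_000,    # people who reached the site
    "searchers": 4_000,    # ran at least one dataset search
    "downloaders": 1_200,  # downloaded a dataset (demand side)
    "new_orgs": 15,        # organisations that joined to share data (supply side)
}

def rate(numerator: int, denominator: int) -> float:
    """Conversion rate as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

print("search rate:", rate(events["searchers"], events["visitors"]), "%")
print("download rate:", rate(events["downloaders"], events["searchers"]), "%")
```

Tracking the drop-off at each step, rather than a single headline number, shows whether visitors are actually finding and downloading the data they came for.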

HDX metrics

5) Add features specific to your community

There are endless features you can add to data hubs – most add no value, and end up a cost to maintain. HDX add specific things valuable to its community.

For example, much humanitarian data is in “shape files”, a standard for geographical information. HDX automatically renders a beautiful map of these – essential for users who don’t have ArcGIS, and a good check for those that do.

Syrian border crossing

6) Trust in the data

The early user research showed that trust in the data was vital. For this reason, not just anyone can come along and add data to it. New organisations have to apply – proving either that they’re known in humanitarian circles, or have quality data to share. Applications are checked by hand. It’s important to get this kind of balance right – being too ideologically open or closed doesn’t work.

Apply HDX


The detail of how a data sharing project is run really matters….(More)”

Citizen-Generated Data and Governments: Towards a Collaborative Model

Civicus: “…we’re very happy today to launch “Citizen-Generated Data and Governments: Towards a Collaborative Model”.

This piece explores the idea that governments could host and publish citizen-generated data (CGD) themselves, and whether this could mean that data is applied more widely and in a more sustainable way. It was inspired by a recent meeting in Buenos Aires with Argentine civil society organizations and government representatives, hosted by the City of Buenos Aires Innovation and Open Government Lab (Laboratorio de innovación y Gobierno Abierto de la Ciudad de Buenos Aires).


The meeting was organized to explore how people within government think about citizen-generated data, and discuss what would be needed for them to consider it as a valid method of data generation. One of the most novel and exciting ideas that surfaced was the potential for government open data portals, such as that managed by the Buenos Aires Innovation Lab, to host and publish CGD.

We wrote this report to explore this issue further, looking at existing models of data collaboration and outlining our first thoughts on the benefits and obstacles this kind of model might face. We welcome feedback from those with deeper expertise into different aspects of citizen-generated data, and look forward to refining these thoughts in the future together with the broader community…(More)”