Selected Readings on Data Collaboratives


By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.

The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges (from addressing climate change to public health to job creation to improving the lives of children) require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.

Annotated Selected Readings List (in alphabetical order)

Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014. http://bit.ly/25RRC6N.

  • This white paper, produced by “a group of activists, researchers and data experts” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action,” and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008. http://bit.ly/1RZgsI5.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process.
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g., a higher level of trust indicates a higher probability of success).

Ballivian A, Hoffman W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from: http://bit.ly/1ENvmRJ

  • This World Bank report serves as a background document on forming public-private partnerships for data, intended to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed to “build[ing] the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from: http://bit.ly/1DHFGVl

  • The Chatham House report provides an overview on public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity in adapting technical solutions from other sectors for public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from: http://brook.gs/1JxVpxp

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving uptake of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. http://bit.ly/21cGWfw.

  • This paper proposes a decision model for sharing public and private data, based on a literature review and three case studies in the logistics sector.
  • The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.

Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013. http://bit.ly/1syMS7J.

  • This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that biomedical databases have many potential uses, with those likely to benefit including: “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise because:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504. http://1.usa.gov/1J6q7KJ

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016. http://bit.ly/1OgOY68.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.

Pastor Escuredo, D., Morales-Guzmán, A. et al, “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from: http://bit.ly/1OzK2bK

  • This report describes how mobile data can be used to understand the impact of disasters and improve disaster management. The study was conducted in the Mexican state of Tabasco in 2009 by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, a division of the major Latin American telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”

Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. http://bit.ly/25RRJ2c

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges facing private firms that want to open their data or participate in data collaboratives with the academic research community, both of which could be addressed through more involvement from boundary organizations:
    • First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • Second is the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016. http://bit.ly/24FLYlD.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for an NSO, transfer of datasets to NSOs from private entities, transfer of data to a third-party provider to manage the NSO and private entity data, and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation); suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts; and emphasizes both the trade-off between data specificity and accessibility of such data and the importance of pricing mechanisms that reflect the capacity and capability of NSOs.
  • Case studies referenced in the paper include:
    • A mobile network operator’s (MNO Telefonica) in-house analysis of call detail records;
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from: http://bit.ly/1bPKOjw

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions proposed in this report from the Overseas Development Institute focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.

Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518. http://bit.ly/1RZgSy4.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • They focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for LMICs.
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC public health 14, no. 1 (2014): 1144. Available from: http://bit.ly/1JOBruO

  • The authors of this report provide a “systematic literature review of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17. http://bit.ly/1GC12a2

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from: http://bit.ly/1JOBDdy

  • The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: Matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016. http://bit.ly/1ZEp2Sr.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties;
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of data collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria as well as their experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights gained by leveraging its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Paper presented at the Procedia Computer Science. 2015. http://bit.ly/1TZDroF.

  • The paper presents a research study based on the mobile call records that the mobile operator Orange shared with researchers in the framework of the Data for Development (D4D) Challenge.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that the use of their population, mobility, and simulation models provides more accurate simulation details in comparison to high-level analytical predictions and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard-to-reach locations.

Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016. http://bit.ly/22YtugY.

  • This research develops a monitoring framework to assess the effects of open (private) data, using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework to describe what the program is about, description of the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum. http://bit.ly/1JOBS8u

  • This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.

Open Data For Social Good: The Case For Better Transport Services


 at TechWeek Europe: “The growing focus on data protection, driven partly by stronger legislation and partly by consumer pressure, has put the debate on the benefits of open data somewhat on the back burner.

The continuing spate of high-profile data breaches and the abuse of public trust in the form of constant bombardment of automated calls, spam emails and clumsily ‘personalised’ advertising has done little to further the open data agenda. In fact it left many consumers feeling lukewarm about the prospects of organisations opening up their data feeds, even at a promise of a better service in return.

That’s a worrying trend. In many industries effective use of open data can lead to development of solutions that address some of the major challenges populations are faced with today, allowing for faster innovation and adaptability to change. There are significant ways in which individuals, and society as a whole could benefit from open data, if organisations and governments get data sharing right.

Open data for transport

A good example is city transportation. Many metropolises face a major challenge – growing populations are placing pressure on current infrastructure systems, leading to congestion and inefficiency.

An open data system, where commuters use a single travel account for all travel transactions and information – whether that’s public transport, walking, using the bike, using Uber, and so on, would give the city unprecedented insight into how people commute and what’s behind their travel choices.

The key to engaging the public with this is the condition that data is used responsibly and for the greater good. Currently, Transport for London (TfL) operates a meet-in-the-middle model. Consumers can travel anonymously on the TfL network, with only the point of entry and point of exit being recorded, and the company provides that anonymised data to third-party app developers who can then use it to release useful travel applications.

TfL doesn’t profit from sharing consumer data but it does enjoy the benefits that come with it. Third-party travel applications make it easier for commuters to use TfL’s network and make the service itself appear more efficient – in short, everyone benefits.

Mutual benefit

Let’s now imagine a scenario that takes this mutually beneficial relationship a step forward, with consumers willingly giving up some information about themselves to the responsible parties (in this case, the city) and receiving personalised service in return. In this scenario, the more information commuters can provide to the system, the more useful the system can be to them.

Apart from providing personalised travel information and recommendations, such a system would have one more important benefit – it would enable cities to encourage greater social responsibility, extending the benefits from the individual to the community as a whole….(More)”

Big Data Quality: a Roadmap for Open Data


Paper by Paolo Ciancarini, Francesco Poggi and Daniel Russo: “Open Data (OD) is one of the most discussed issues of Big Data, and has raised the joint interest of public institutions, citizens and private companies since 2009. In addition to transparency in public administrations, another key objective of these initiatives is to allow the development of innovative services for solving real world problems, creating value in some positive and constructive way. However, the massive amount of freely available data has not yet brought the expected effects: as of today, there is no application that has exploited the potential provided by large and distributed information sources in a non-trivial way, nor has any service substantially changed for the better the lives of people. The era of a new generation of applications based on open data is still far off. In this context, we observe that OD quality is one of the major threats to achieving the goals of the OD movement. The starting point of this study is the quality of the OD released by the five Constitutional offices of Italy. W3C standards about OD are widely known and accepted in Italy by the Italian Digital Agency (AgID). According to the most recent Italian Laws the Public Administration may release OD according to the AgID standards. Our exploratory study aims to assess the quality of such releases and the real implementations of OD. The outcome suggests the need for a drastic improvement in OD quality. Finally we highlight some key quality principles for OD, and propose a roadmap for further research….(more)”

Soon Your City Will Know Everything About You


Currently, the biggest users of these sensor arrays are in cities, where city governments use them to collect large amounts of policy-relevant data. In Los Angeles, the crowdsourced traffic and navigation app Waze collects data that helps residents navigate the city’s choked highway networks. In Chicago, an ambitious program makes public data available to startups eager to build apps for residents. The city’s 49th ward has been experimenting with participatory budgeting and online voting to take the pulse of the community on policy issues. Chicago has also been developing the “Array of Things,” a network of sensors that track, among other things, the urban conditions that affect bronchitis.

Edmonton uses the cloud to track the condition of playground equipment. And a growing number of countries have purpose-built smart cities, like South Korea’s high tech utopia city of Songdo, where pervasive sensor networks and ubiquitous computing generate immense amounts of civic data for public services.

The drive for smart cities isn’t restricted to the developed world. Rio de Janeiro coordinates the information flows of 30 different city agencies. In Beijing and Da Nang (Vietnam), mobile phone data is actively tracked in the name of real-time traffic management. Urban sensor networks, in other words, are also developing in countries with few legal protections governing the usage of data.

These services are promising and useful. But you don’t have to look far to see why the Internet of Things has serious privacy implications. Public data is used for “predictive policing” in at least 75 cities across the U.S., including New York City, where critics maintain that using social media or traffic data to help officers evaluate probable cause is a form of digital stop-and-frisk. In Los Angeles, the security firm Palantir scoops up publicly generated data on car movements, merges it with license plate information collected by the city’s traffic cameras, and sells analytics back to the city so that police officers can decide whether or not to search a car. In Chicago, concern is growing about discriminatory profiling because so much information is collected and managed by the police department — an agency with a poor reputation for handling data in consistent and sensitive ways. In 2015, video surveillance of the police shooting Laquan McDonald outside a Burger King was erased by a police employee who ironically did not know his activities were being digitally recorded by cameras inside the restaurant.

Since most national governments have bungled privacy policy, cities — which have a reputation for being better with administrative innovations — will need to fill this gap. A few countries, such as Canada and the U.K., have independent “privacy commissioners” who are responsible for advocating for the public when bureaucracies must decide how to use or give out data. It is pretty clear that cities need such advocates too.

What would Urban Privacy Commissioners do? They would teach the public — and other government staff — about how policy algorithms work. They would evaluate the political context in which city agencies make big data investments. They would help a city negotiate contracts that protect residents’ privacy while providing effective analysis to policy makers and ensuring that open data is consistently serving the public good….(more)”.

While governments talk about smart cities, it’s citizens who create them


Carlo Ratti at the Conversation: “The Australian government recently released an ambitious Smart Cities Plan, which suggests that cities should be first and foremost for people:

If our cities are to continue to meet their residents’ needs, it is essential for people to engage and participate in planning and policy decisions that have an impact on their lives.

Such statements are a good starting point – and should probably become central to Australia’s implementation efforts. A lot of knowledge has been collected over the past decade from successful and failed smart cities experiments all over the world; reflecting on them could provide useful information for the Australian government as it launches its national plan.

What is a smart city?

But, before embarking on such a review, it would help to start from a definition of “smart city”.

The term has been used and abused in recent years, so much so that today it has lost meaning. It is often used to encompass disparate applications: we hear people talk and write about “smart city” when they refer to anything from citizen engagement to Zipcar, from open data to Airbnb, from smart biking to broadband.

Where to start with a definition? It is a truism to say the internet has transformed our lives over the past 20 years. Everything in the way we work, meet, mate and so on is very different today than it was just a few decades ago, thanks to a network of connectivity that now encompasses most people on the planet.

In a similar way, we are today at the beginning of a new technological revolution: the internet is entering physical space – the very space of our cities – and is becoming the Internet of Things; it is opening the door to a new world of applications that, as with the first wave of the internet, can incorporate many domains….

What should governments do?

In the above technological context, what should governments do? Over the past few years, the first wave of smart city applications followed technological excitement.

For instance, some of Korea’s early experiments such as Songdo City were engineered by the likes of Cisco, with technology deployment assisted by top-down policy directives.

In a similar way, in 2010, Rio de Janeiro launched the Integrated Centre of Command and Control, engineered by IBM. It’s a large control room for the city, which collects real-time information from cameras and myriad sensors suffused in the urban fabric.

Such approaches revealed many shortcomings, most notably the lack of civic engagement. It is as if they thought of the city simply as a “computer in open air”. These approaches led to several backlashes in the research and academic community.

A more interesting lesson can come from the US, where the focus is more on developing a rich Internet of Things innovation ecosystem. There are many initiatives fostering spaces – digital and physical – for people to come together and collaborate on urban and civic innovations….

That isn’t to say that governments should take a completely hands-off approach to urban development. Governments certainly have an important role to play. This includes supporting academic research and promoting applications in fields that might be less appealing to venture capital – unglamorous but nonetheless crucial domains such as municipal waste or water services.

The public sector can also promote the use of open platforms and standards in such projects, which would speed up adoption in cities worldwide.

Still, the overarching goal should always be to focus on citizens. They are in the best position to determine how to transform their cities and to make decisions that will have – as the Australian Smart Cities Plan puts it – “an impact on their lives”….(more)”

Foundation Transparency: Game Over?


Brad Smith at Glass Pockets (Foundation Center): “The tranquil world of America’s foundations is about to be shaken, but if you read the Center for Effective Philanthropy’s (CEP) new study — Sharing What Matters, Foundation Transparency — you would never know it.

Don’t get me wrong. That study, like everything CEP produces, is carefully researched, insightful and thoroughly professional. But it misses the single biggest change in foundation transparency in decades: the imminent release by the Internal Revenue Service of foundation 990-PF (and 990) tax returns as machine-readable open data.

Clara Miller, President of the Heron Foundation, writes eloquently in her manifesto, Building a Foundation for the 21st Century: “…the private foundation model was designed to be protective and separate, much like a terrarium.”

Terrariums, of course, are highly “curated” environments over which their creators have complete control. The CEP study proves that point, to the extent that much of the study consists of interviews with foundation leaders and reviews of their websites, as if transparency were a kind of optional endeavor in which foundations may choose to participate, if at all, and to what degree.

To be fair, CEP also interviewed the grantees of various foundations (sometimes referred to as “partners”), which helps convey the reality that foundations have stakeholders beyond their four walls. However, the terrarium metaphor is about to become far more relevant as the release of 990 tax returns as open data will literally make it possible for anyone to look right through those glass walls to the curated foundation world within.

What Is Open Data?

It is safe to say that most foundation leaders and a fair majority of their staff do not understand what open data really is. Open data is free, yes, but more importantly it is digital and machine-readable. This means it can be consumed in enormous volumes at lightning speed, directly by computers.

Once consumed, open data can be tagged, sorted, indexed and searched using statistical methods to make obvious comparisons while discovering previously undetected correlations. Anyone with a computer, some coding skills and a hard drive or cloud storage can access open data. In today’s world, a lot of people meet those requirements, and they are free to do whatever they please with your information once it is, as open data enthusiasts like to say, “in the wild.”

What is the Internal Revenue Service Releasing?

Thanks to the Aspen Institute’s leadership of a joint effort – funded by foundations and including Foundation Center, GuideStar, the National Center for Charitable Statistics, the Johns Hopkins Center for Civil Society Studies, and others – the IRS has started to make some 1,000,000 Form 990s and 40,000 Form 990PF available as machine-readable open data.

Previously, all Form 990s had been released as image (TIFF) files, essentially a picture, making it both time-consuming and expensive to extract useful data from them. Credit where credit is due; a kick in the butt in the form of a lawsuit from open data crusader Carl Malamud helped speed the process along.

The current test phase includes only those tax returns that were digitally filed by nonprofits and community foundations (990s) and private foundations (990PFs). Over time, the IRS will phase in a mandatory digital filing requirement for all Form 990s, and the intent is to release them all as open data. In other words, that which is born digital will be opened up to the public in digital form. Because of variations in the 990 forms, getting the information from them into a database will still require some technical expertise, but will be far more feasible and faster than ever before.
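To give a sense of what “getting the information into a database” can involve, here is a minimal sketch that parses machine-readable filings and loads them into SQLite. It is an illustration only: the XML layout, element names (`EIN`, `Name`, `TaxYear`, `TotalGrantsPaid`) and sample values are hypothetical and far simpler than the real e-file schema, whose element names differ between form versions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified filings. The second one omits the grants
# field to mimic the variation between forms mentioned above.
SAMPLE_FILINGS = [
    "<Return><EIN>000000001</EIN><Name>Example Foundation</Name>"
    "<TaxYear>2014</TaxYear><TotalGrantsPaid>250000</TotalGrantsPaid></Return>",
    "<Return><EIN>000000002</EIN><Name>Sample Trust</Name>"
    "<TaxYear>2014</TaxYear></Return>",
]


def parse_filing(xml_text):
    """Extract one database row from a single filing, tolerating a missing field."""
    root = ET.fromstring(xml_text)
    grants = root.findtext("TotalGrantsPaid")  # None when the element is absent
    return (
        root.findtext("EIN"),
        root.findtext("Name"),
        int(root.findtext("TaxYear")),
        int(grants) if grants is not None else None,
    )


def load_filings(conn, filings):
    """Create the table if needed and insert one row per filing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings "
        "(ein TEXT, name TEXT, tax_year INTEGER, grants_paid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO filings VALUES (?, ?, ?, ?)",
        [parse_filing(f) for f in filings],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_filings(conn, SAMPLE_FILINGS)
rows = conn.execute(
    "SELECT name, grants_paid FROM filings ORDER BY ein"
).fetchall()
print(rows)
```

Most of the real effort would go into the step this sketch glosses over: mapping each form version's fields onto one consistent set of columns.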

The Good

The work of organizations like Foundation Center (who have built expensive infrastructure in order to turn years of 990 tax returns into information that can be used by nonprofits looking for funding, researchers trying to understand the role of foundations, and foundations themselves seeking to benchmark themselves against peers) will be transformed.

Work will shift away from the mechanics of capturing and processing the data to higher level analysis and visualization to stimulate the generation and sharing of new insights and knowledge. This will fuel greater collaboration between peer organizations, innovation, the merging of previously disparate bodies of data, better philanthropy, and a stronger social sector… (more)”

How Open Data Is Creating New Opportunities in the Public Sector


Martin Yan at GovTech: “Increased availability of open data in turn increases the ease with which citizens and their governments can collaborate, as well as equipping citizens to be active in identifying and addressing issues themselves. Technology developers are able to explore innovative uses of open data in combination with digital tools, new apps or other products that can tackle recognized inefficiencies. Currently, both the public and private sectors are teeming with such apps and projects….

Open data has proven to be a catalyst for the creation of new tools across industries and public-sector uses. Examples of a few successful projects include:

  • Citymapper — The popular real-time public transport app uses open data from Apple, Google, Cyclestreets, OpenStreetMaps and more sources to help citizens navigate cities. Features include A-to-B trip planning with ETA, real-time departures, bike routing, transit maps, public transport line status, real-time disruption alerts and integration with Uber.
  • Dataverse Project — This project from Harvard’s Institute for Quantitative Social Science makes it easy to share, explore and analyze research data. By simplifying access to this data, the project allows researchers to replicate others’ work to the benefit of all.
  • Liveplasma — An interactive search engine, Liveplasma lets users listen to music and view a web-like visualization of similar songs and artists, seeing how they are related and enabling discovery. Content from YouTube is streamed into the data visualizations.
  • Provenance — The England-based online platform lets users trace the origin and history of a product, also providing its manufacturing information. The mission is to encourage transparency in the practices of the corporations that produce the products we all use.

These examples demonstrate open data’s reach, value and impact well beyond the public sector. As open data continues to be put to wider use, the results will not be limited to increased efficiency and reduced wasteful spending in government, but will also create economic growth and jobs due to the products and services using the information as a foundation.

However, in the end, it won’t be the data alone that solves issues. Rather, it will be dependent on individual citizens, developers and organizations to see the possibilities, take up the call to arms and use this available data to introduce changes that make our world better….(More)”

Time for sharing data to become routine: the seven excuses for not doing so are all invalid


Paper by Richard Smith and Ian Roberts: “Data are more valuable than scientific papers but researchers are incentivised to publish papers not share data. Patients are the main beneficiaries of data sharing but researchers have several incentives not to share: others might use their data to get ahead in the academic rat race; they might be scooped; their results might not be replicable; competitors may reach different conclusions; their data management might be exposed as poor; patient confidentiality might be breached; and technical difficulties make sharing impossible. All of these barriers can be overcome and researchers should be rewarded for sharing data. Data sharing must become routine….(More)”

If you build it… will they come?


Laura Bacon at Omidyar Network: “What do datasets on Danish addresses, Indonesian elections, Singapore Dengue Fever, Slovakian contracts, Uruguayan health service provision, and Global weather systems have in common? Read on to learn more…

On May 12, 2016, more than 40 nations’ leaders gathered in London for an Anti-Corruption Summit, convened by UK Prime Minister David Cameron. Among the commitments made, 40 countries pledged to make their procurement processes open by default, with 14 countries specifically committing to publish to the Open Contracting Data Standard.

This conference and these commitments can be seen as part of a larger global norm toward openness and transparency, also embodied by the Open Government Partnership, Open Data Charter, and increasing numbers of Open Data Portals.

As government data is increasingly published openly in the public domain, valid questions have been raised about what impact the data will have: As governments release this data, will it be accessed and used? Will it ultimately improve lives, root out corruption, hold answers to seemingly intractable problems, and lead to economic growth?

Omidyar Network — having supported several Open Data organizations and platforms such as Open Data Institute, Open Knowledge, and Web Foundation — sought data-driven answers to these questions. After a public call for proposals, we selected NYU’s GovLab to conduct research on the impact open data has already had. Not the potential or prospect of impact, but past proven impact. The GovLab research team, led by Stefaan Verhulst, investigated a variety of sectors — health, education, elections, budgets, contracts, etc. — in a variety of locations, spanning five continents.

Their findings are promising and exciting, demonstrating that open data is changing the world by empowering people, improving governance, solving public problems, and leading to innovation. A summary is contained in this Key Findings report, and is accompanied by many open data case studies posted in this Open Data Impact Repository.

Of course, stories such as this are not 100% rosy, and the report is clear about the challenges ahead. There are plenty of cases in which open data has had minimal impact. There are cases where there was negative impact. And there are obstacles to open data reaching its full potential: namely, open data projects that don’t respond to citizens’ questions and needs, a lack of technical capacity on either the data provider or the data user side, inadequate protections for privacy and security, and a shortage of resources.

But this research holds good news: Danish addresses, Indonesian elections, Singapore Dengue Fever, Slovakian contracts, Uruguayan health service provision, Global weather systems, and others were all opened up. And all changed the world by empowering citizens, improving governance, solving public problems, and leading to innovation. Please see this report for more….(More)”

See also odimpact.org

Open data + increased disclosure = better public-private partnerships


David Bloomgarden and Georg Neumann at Fomin Blog: “The benefits of open and participatory public procurement are increasingly being recognized by international bodies such as the Group of 20 major economies, the Organisation for Economic Co-operation and Development, and multilateral development banks. Value for money, more competition, and better goods and services for citizens all result from increased disclosure of contract data. Greater openness is also an effective tool to fight fraud and corruption.

However, because public-private partnerships (PPPs) are planned over a long timeframe and involve a large number of groups, implementing greater levels of openness in disclosure is complicated. This complexity can be a challenge to good design. Finding a structured and transparent approach to managing PPP contract data is fundamental for a project to be accepted and used by its local community….

In open contracting, all data is disclosed during the public procurement process, from the planning stage, to the bidding and awarding of the contract, to the monitoring of the implementation. A global open source data standard is used to publish that data, which is already being implemented in countries as diverse as Canada, Paraguay, and Ukraine. Using open data throughout the contracting process provides opportunities to innovate in managing bids, fixing problems, and integrating feedback as needed. Open contracting contributes to the overall social and environmental sustainability of infrastructure investments.

In the case of Mexico’s airport, the project publishes details of awarded contracts, including visualizing the flow of funds and detailing the full amounts of awarded contracts and renewable agreements. Standardized, timely, and open data that follow global standards such as the Open Contracting Data Standard will make this information useful for analysis of value for money, cost-benefit, sustainability, and monitoring performance. Crucially, open contracting will shift the focus from the inputs into a PPP, to the outputs: the goods and services being delivered.
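A small sketch can show why standardized contract data lends itself to this kind of analysis. The records below are loosely modeled on the release structure of the Open Contracting Data Standard (an `ocid`, plus `awards` carrying a `value` and `suppliers`), but they are trimmed down and entirely invented: the identifiers, supplier names and amounts are assumptions, not data from the Mexican airport project.

```python
from collections import defaultdict

# Invented, simplified releases. Real publications carry many more fields.
releases = [
    {
        "ocid": "ocds-example-0001",
        "awards": [
            {"value": {"amount": 1200000, "currency": "MXN"},
             "suppliers": [{"name": "Constructora A"}]},
            {"value": {"amount": 300000, "currency": "MXN"},
             "suppliers": [{"name": "Servicios B"}]},
        ],
    },
    {
        "ocid": "ocds-example-0002",
        "awards": [
            {"value": {"amount": 500000, "currency": "MXN"},
             "suppliers": [{"name": "Constructora A"}]},
        ],
    },
]


def total_awarded_by_supplier(releases):
    """Sum award amounts per (supplier name, currency).

    Simplification: when an award lists several suppliers, the full amount
    is attributed to each of them.
    """
    totals = defaultdict(float)
    for release in releases:
        for award in release.get("awards", []):
            value = award.get("value", {})
            for supplier in award.get("suppliers", []):
                key = (supplier["name"], value.get("currency"))
                totals[key] += value.get("amount", 0)
    return dict(totals)


totals = total_awarded_by_supplier(releases)
for (name, currency), amount in sorted(totals.items()):
    print(f"{name}: {amount:,.0f} {currency}")
```

Keeping the currency in the grouping key avoids silently adding amounts in different currencies, which matters once data from several publishers is combined.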

Benefits of open data for PPPs

We think that better and open data will lead to better PPPs. Here’s how:

1. Using user feedback to fix problems

The Brazilian state of Minas Gerais has been a leader in transparent PPP contracts with full proactive disclosure of the contract terms, as well as of other relevant project information—a practice that puts a government under more scrutiny but makes for better projects in the long run.

According to Marcos Siqueira, former head of the PPP Unit in Minas Gerais, “An adequate transparency policy can provide enough information to users so they can become contract watchdogs themselves.”

For example, a public-private contract was signed in 2014 to build a $300 million waste treatment plant for 2.5 million people in the metropolitan area of Belo Horizonte, the capital of Minas Gerais. As the team members conducted appraisals, they disclosed them on the Internet. In addition, the team held around 20 public meetings and identified all the stakeholders in the project. One notable result of the sharing and discussion of this information was the relocation of the facility to a less-populated area. When the project went to the bidding phase, it was much closer to the expectations of its various stakeholders.

2. Making better decisions on contracts and performance

Chile has been a leader in developing PPPs (which it refers to as concessions) for several decades, in a range of sectors: urban and inter-urban roads, seaports, airports, hospitals, and prisons. The country tops the list for the best enabling environment for PPPs in Latin America and the Caribbean, as measured by Infrascope, an index produced by the Economist Intelligence Unit and the Multilateral Investment Fund of the IDB Group.

Chile’s distinction is that it discloses information on performance of PPPs that are underway. The government’s Concessions Unit regularly publishes summaries of the projects during their different phases, including construction and operation. The reports are non-technical, yet include all the necessary information to understand the scope of the project…(More)”