Data Collaboratives: Sharing Public Data in Private Hands for Social Good

Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research;” it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”

(US) Administration Announces New “Smart Cities” Initiative to Help Communities Tackle Local Challenges and Improve City Services

Factsheet from the White House: “Today, the Administration is announcing a new “Smart Cities” Initiative that will invest over $160 million in federal research and leverage more than 25 new technology collaborations to help local communities tackle key challenges such as reducing traffic congestion, fighting crime, fostering economic growth, managing the effects of a changing climate, and improving the delivery of city services. The new initiative is part of this Administration’s overall commitment to target federal resources to meet local needs and support community-led solutions.

Over the past six years, the Administration has pursued a place-based approach to working with communities as they tackle a wide range of challenges, from investing in infrastructure and filling open technology jobs to bolstering community policing. Advances in science and technology have the potential to accelerate these efforts. An emerging community of civic leaders, data scientists, technologists, and companies are joining forces to build “Smart Cities” – communities that are building an infrastructure to continuously improve the collection, aggregation, and use of data to improve the life of their residents – by harnessing the growing data revolution, low-cost sensors, and research collaborations, and doing so securely to protect safety and privacy.

As part of the initiative, the Administration is announcing:

  • More than $35 million in new grants and over $10 million in proposed investments to build a research infrastructure for Smart Cities by the National Science Foundation and National Institute of Standards and Technology.
  • Nearly $70 million in new spending and over $45 million in proposed investments to unlock new solutions in safety, energy, climate preparedness, transportation, health and more, by the Department of Homeland Security, Department of Transportation, Department of Energy, Department of Commerce, and the Environmental Protection Agency.
  • More than 20 cities participating in major new multi-city collaborations that will help city leaders effectively collaborate with universities and industry.

Today, the Administration is also hosting a White House Smart Cities Forum, coinciding with Smart Cities Week hosted by the Smart Cities Council, to highlight new steps and brainstorm additional ways that science and technology can support municipal efforts.

The Administration’s Smart Cities Initiative will begin with a focus on key strategies:

  • Creating test beds for “Internet of Things” applications and developing new multi-sector collaborative models: Technological advancements and the diminishing cost of IT infrastructure have created the potential for an “Internet of Things,” a ubiquitous network of connected devices, smart sensors, and big data analytics. The United States has the opportunity to be a global leader in this field, and cities represent strong potential test beds for development and deployment of Internet of Things applications. Successfully deploying these and other new approaches often depends on new regional collaborations among a diverse array of public and private actors, including industry, academia, and various public entities.
  • Collaborating with the civic tech movement and forging intercity collaborations: There is a growing community of individuals, entrepreneurs, and nonprofits interested in harnessing IT to tackle local problems and work directly with city governments. These efforts can help cities leverage their data to develop new capabilities. Collaborations across communities are likewise indispensable for replicating what works in new places.
  • Leveraging existing Federal activity: From research on sensor networks and cybersecurity to investments in broadband infrastructure and intelligent transportation systems, the Federal government has an existing portfolio of activities that can provide a strong foundation for a Smart Cities effort.
  • Pursuing international collaboration: Fifty-four percent of the world’s population live in urban areas. Continued population growth and urbanization will add 2.5 billion people to the world’s urban population by 2050. The associated climate and resource challenges demand innovative approaches. Products and services associated with this market present a significant export opportunity for the U.S., since almost 90 percent of this increase will occur in Africa and Asia.

Complementing this effort, the President’s Council of Advisors on Science and Technology is examining how a variety of technologies can enhance the future of cities and the quality of life for urban residents. The Networking and Information Technology Research and Development (NITRD) Program is also announcing the release of a new framework to help coordinate Federal agency investments and outside collaborations that will guide foundational research and accelerate the transition into scalable and replicable Smart City approaches. Finally, the Administration’s growing work in this area is reflected in the Science and Technology Priorities Memo, issued by the Office of Management and Budget and Office of Science and Technology Policy in preparation for the President’s 2017 budget proposal, which includes a focus on cyber-physical systems and Smart Cities….(More)”

The impact of Open Data

GovLab/Omidyar Network: “…share insights gained from our current collaboration with Omidyar Network on a series of open data case studies. These case studies – 19, in total – are designed to provide a detailed examination of the various ways open data is being used around the world, across geographies and sectors, and to draw some over-arching lessons. The case studies are built from extensive research, including in-depth interviews with key participants in the various open data projects under study….

Ways in which open data impacts lives

Broadly, we have identified four main ways in which open data is transforming economic, social, cultural and political life, and hence improving people’s lives.

  • First, open data is improving government, primarily by helping tackle corruption, improving transparency, and enhancing public services and resource allocation.
  • Open data is also empowering citizens to take control of their lives and demand change; this dimension of impact is mediated by more informed decision making and new forms of social mobilization, both facilitated by new ways of communicating and accessing information.
  • Open data is also creating new opportunities for citizens and groups, by stimulating innovation and promoting economic growth and development.
  • Finally, open data is playing an increasingly important role in solving big public problems, primarily by allowing citizens and policymakers to engage in new forms of data-driven assessment and data-driven engagement.


Enabling Conditions

While these are the four main ways in which open data is driving change, we have seen wide variability in the amount and nature of impact across our case studies. Put simply, some projects are more successful than others; or some projects might be more successful in a particular dimension of impact, and less successful in others.

As part of our research, we have therefore tried to identify some enabling conditions that maximize the positive impact of open data projects. These four stand out:

  • Open data projects are most successful when they are built not from the efforts of single organizations or government agencies, but when they emerge from partnerships across sectors (and even borders). The roles of intermediaries (e.g., the media and civil society groups) and “data collaboratives” are particularly important.
  • Several of the projects we have seen have emerged on the back of what we might think of as an open data public infrastructure – i.e., the technical backend and organizational processes necessary to enable the regular release of potentially impactful data to the public.
  • Clear open data policies, including well-defined performance metrics, are also essential; policymakers and political leaders have an important role in creating an enabling (yet flexible) legal environment that includes mechanisms for project assessments and accountability, as well as providing the type of high-level political buy-in that can empower practitioners to work with open data.
  • We have also seen that the most successful open data projects tend to be those that target a well-defined problem or issue. In other words, projects with maximum impact often meet a genuine citizen need.



Impact is also determined by the obstacles and challenges that a project confronts. Some regions and some projects face a greater number of hurdles. These also vary, but we have found four challenges that appear most often in our case studies:

  • Projects in countries or regions with low capacity or “readiness” (indicated, for instance, by low Internet penetration rates or hostile political environments) typically fare less well.
  • Projects that are unresponsive to feedback and user needs are less likely to succeed than those that are flexible and able to adapt to what their users want.
  • Open data often exists in tension with risks such as privacy and security; often, the impact of a project is limited or harmed when it fails to take into account and mitigate these risks.
  • Although open data projects are often “hackable” and cheap to get off the ground, the most successful do require investments – of time and money – after their launch; inadequate resource allocation is one of the most common reasons for a project to fail.

These lists of impacts, enabling factors and challenges are, of course, preliminary. We continue to refine our research and will include a final set of findings along with our final report….(More)”

On the Farm: Startups Put Data in Farmers’ Hands

Jacob Bunge at the Wall Street Journal: “Farmers and entrepreneurs are starting to compete with agribusiness giants over the newest commodity being harvested on U.S. farms—one measured in bytes, not bushels.

Startups including Farmobile LLC, Granular Inc. and Grower Information Services Cooperative are developing computer systems that will enable farmers to capture data streaming from their tractors and combines, store it in digital silos and market it to agriculture companies or futures traders. Such platforms could allow farmers to reap larger profits from a technology revolution sweeping the U.S. Farm Belt and give them more control over the information generated on their fields.

The efforts in some cases would challenge a wave of data-analysis tools from big agricultural companies such as Monsanto Co., DuPont Co., Deere & Co. and Cargill Inc. Those systems harness modern planters, combines and other machinery outfitted with sensors to track planting, spraying and harvesting, then crunch that data to provide farm-management guidance that these firms say can help farmers curb costs and grow larger crops. The companies say farmers own their data, and it won’t be sold to third parties.

Some farmers and entrepreneurs say crop producers can get the most from their data by compiling and analyzing it themselves—for instance, to determine the best time to apply fertilizer to their soil and how much. Then, farmers could profit further by selling data to seed, pesticide and equipment makers seeking a glimpse into how and when farmers use machinery and crop supplies.

The new ventures come as farmers weigh the potential benefits of sharing their data with large agricultural firms against privacy concerns and fears that agribusinesses could leverage farm-level information to charge higher rates for seeds, pesticides and other supplies.

“We need to get farmers involved in this because it’s their information,” said Dewey Hukill, board president of Grower Information Services Cooperative, or GISC, a farmer-owned cooperative that is building a platform to collect its members’ data. The cooperative has signed up about 1,500 members across 37 states….

Companies developing markets for farm data say it’s not their intention to displace big seed and machinery suppliers but to give farmers a platform that would enable them to manage their own information. Storing and selling their own data wouldn’t necessarily bar a farmer from sharing information with a seed company to get a planting recommendation, they say….(More)”


A data revolution is underway. Will NGOs miss the boat?

Opinion by Sophia Ayele at Oxfam: “The data revolution has arrived. … The UN has even launched a Data Revolution Group (to ensure that the revolution penetrates into international development). The Group’s 2014 report suggests that harnessing the power of newly available data could ultimately lead to, “more empowered people, better policies, better decisions and greater participation and accountability, leading to better outcomes for people and the planet.”

But where do NGOs fit in?

Over the last two decades, NGOs have been collecting increasing amounts of research and evaluation data, largely driven by donor demands for more rigorous evaluations of programs. The quality and efficiency of data collection has also been enhanced by mobile data collection. However, a quick scan of UK development NGOs reveals that few, if any, are sharing the data that they collect. This means that NGOs are generating dozens (if not hundreds) of datasets every year that aren’t being fully exploited and analysed. Working on tight budgets, with limited capacity, it’s not surprising that NGOs often shy away from sharing data without a clear mandate.

But change is in the air. Several donors have begun requiring NGOs to publicise data and others appear to be moving in that direction. Last year, USAID launched its Open Data Policy which requires that grantees “submit any dataset created or collected with USAID funding…” Not only does USAID stipulate this requirement, it also hosts this data on its Development Data Library (DDL) and provides guidance on anonymisation to depositors. Similarly, Gates Foundation’s 2015 Open Access Policy stipulates that, “Data underlying published research results will be accessible and open immediately.” However, they are allowing a two-year transition period…. Here at Oxfam, we have been exploring ways to begin sharing research and evaluation data. We aren’t being required to do this – yet – but we realise that the data that we collect is a public good with the potential to improve lives through more effective development programmes and to raise the voices of those with whom we work. Moreover, organizations like Oxfam can play a crucial role in highlighting issues facing women and other marginalized communities that aren’t always captured in national statistics. Sharing data is also good practice and would increase our transparency and accountability as an organization.

However, Oxfam also bears a huge responsibility to protect the rights of the communities that we work with. This involves ensuring informed consent when gathering data, so that communities are fully aware that their data may be shared, and de-identifying data to a level where individuals and households cannot be easily identified.

As Oxfam has outlined in our recently adopted Responsible Data Policy, “Using data responsibly is not just an issue of technical security and encryption but also of safeguarding the rights of people to be counted and heard, ensuring their dignity, respect and privacy, enabling them to make an informed decision and protecting their right to not be put at risk… (More)”

Anonymization and Risk

Paper by Ira Rubinstein and Woodrow Hartzog: “Perfect anonymization of data sets has failed. But the process of protecting data subjects in shared information remains integral to privacy practice and policy. While the deidentification debate has been vigorous and productive, there is no clear direction for policy. As a result, the law has been slow to adopt a holistic approach to protecting data subjects when data sets are released to others. Currently, the law is focused on whether an individual can be identified within a given set. We argue that the better locus of data release policy is on the process of minimizing the risk of reidentification and sensitive attribute disclosure. Process-based data release policy, which resembles the law of data security, will help us move past the limitations of focusing on whether data sets have been “anonymized.” It draws upon different tactics to protect the privacy of data subjects, including accurate deidentification rhetoric, contracts prohibiting reidentification and sensitive attribute disclosure, data enclaves, and query-based strategies to match required protections with the level of risk. By focusing on process, data release policy can better balance privacy and utility where nearly all data exchanges carry some risk….(More)”

Meaningful Consent: The Economics of Privity in Networked Environments

Paper by Jonathan Cave: “Recent work on privacy (e.g. WEIS 2013/4, Meaningful Consent in the Digital Economy project) recognises the unanticipated consequences of data-centred legal protections in a world of shifting relations between data and human actors. But the rules have not caught up with these changes, and the irreversible consequences of ‘make do and mend’ are not often taken into account when changing policy.

Many of the most-protected ‘personal’ data are not personal at all, but are created to facilitate the operation of larger (e.g. administrative, economic, transport) systems or inadvertently generated by using such systems. The protection given to such data typically rests on notions of informed consent even in circumstances where such consent may be difficult to define, harder to give and nearly impossible to certify in meaningful ways. Such protections typically involve a mix of data collection, access and processing rules that are either imposed on behalf of individuals or are to be exercised by them. This approach adequately protects some personal interests, but not all – and is definitely not future-proof. Boundaries between allowing individuals to discover and pursue their interests on one side and behavioural manipulation on the other are often blurred. The costs (psychological and behavioural as well as economic and practical) of exercising control over one’s data are rarely taken into account as some instances of the Right to be Forgotten illustrate. The purposes for which privacy rights were constructed are often forgotten, or have not been reinterpreted in a world of ubiquitous monitoring data, multi-person ‘private exchanges,’ and multiple pathways through which data can be used to create and to capture value. Moreover, the parties who should be involved in making decisions – those connected by a network of informational relationships – are often not in contractual, practical or legal contact. These developments, associated with e.g. the Internet of Things, Cloud computing and big data analytics, should be recognised as challenging privacy rules and, more fundamentally, the adequacy of informed consent (e.g. to access specified data for specified purposes) as a means of managing innovative, flexible, and complex informational architectures.

This paper presents a framework for organising these challenges and using them to evaluate proposed policies, specifically in relation to complex, automated, automatic or autonomous data collection, processing and use. It argues for a movement away from a system of property rights based on individual consent to a values-based ‘privity’ regime – a collection of differentiated (relational as well as property) rights and consents that may be better able to accommodate innovations. Privity regimes (see deFillipis 2006) bundle together rights regarding e.g. confidential disclosure with ‘standing’ or voice options in relation to informational linkages.

The impacts are examined through a game-theoretic comparison between the proposed privity regime and existing privacy rights in personal data markets that include: conventional ‘behavioural profiling’ and search; situations where third parties may have complementary roles or conflicting interests in such data and where data have value in relation both to specific individuals and to larger groups (e.g. ‘real-world’ health data); n-sided markets on data platforms (including social and crowd-sourcing platforms with long and short memories); and the use of ‘privity-like’ rights inherited by data objects and by autonomous systems whose ownership may be shared among many people….(More)”

Journal of Technology Science

Technology Science is an open access forum for any original material dealing primarily with a social, political, personal, or organizational benefit or adverse consequence of technology. Studies that characterize a technology-society clash or present an approach to better harmonize technology and society are especially welcomed. Papers can come from anywhere in the world.

Technology Science is interested in reviews of research, experiments, surveys, tutorials, and analyses. Writings may propose solutions or describe unsolved problems. Technology Science may also publish letters, short communications, and relevant news items. All submissions are peer-reviewed.

The scientific study of technology-society clashes is a cross-disciplinary pursuit, so papers in Technology Science may come from any of many possible disciplinary traditions, including but not limited to social science, computer science, political science, law, economics, policy, or statistics.

The Data Privacy Lab at Harvard University publishes Technology Science and its affiliated subset of papers called the Journal of Technology Science and maintains them online. Technology Science is available free of charge over the Internet. While it is possible that bound paper copies of Technology Science content may be produced for a fee, all content will continue to be offered online at no charge….(More)”


Open Data: A 21st Century Asset for Small and Medium Sized Enterprises

“The economic and social potential of open data is widely acknowledged. In particular, the business opportunities have received much attention. But for all the excitement, we still know very little about how and under what conditions open data really works.

To broaden our understanding of the use and impact of open data, the GovLab has a variety of initiatives and studies underway. Today, we share publicly our findings on how Small and Medium Sized Enterprises (SMEs) are leveraging open data for a variety of purposes. Our paper “Open Data: A 21st Century Asset for Small and Medium Sized Enterprises” seeks to build a portrait of the lifecycle of open data—how it is collected, stored and used. It outlines some of the most important parameters of an open data business model for SMEs….

The paper analyzes ten aspects of open data and establishes ten principles for its effective use by SMEs. Taken together, these offer a roadmap for any SME considering greater use or adoption of open data in its business.

Among the key findings included in the paper:

  • SMEs, which often lack access to data or sophisticated analytical tools to process large datasets, are likely to be one of the chief beneficiaries of open data.
  • Government data is the main category of open data being used by SMEs. A number of SMEs are also using open scientific and shared corporate data.
  • Open data is used primarily to serve the Business-to-Business (B2B) markets, followed by the Business-to-Consumer (B2C) markets. A number of the companies studied serve two or three market segments simultaneously.
  • Open data is usually a free resource, but SMEs are monetizing their open-data-driven services to build viable businesses. The most common revenue models include subscription-based services, advertising, fees for products and services, freemium models, licensing fees, lead generation and philanthropic grants.
  • The most significant challenges SMEs face in using open data include those concerning data quality and consistency, insufficient financial and human resources, and issues surrounding privacy.

This is just a sampling of findings and observations. The paper includes a number of additional observations concerning business and revenue models, product development, customer acquisition, and other subjects of relevance to any company considering an open data strategy.”

Can big databases be kept both anonymous and useful?

The Economist: “….The anonymisation of a data record typically means the removal from it of personally identifiable information. Names, obviously. But also phone numbers, addresses and various intimate details like dates of birth. Such a record is then deemed safe for release to researchers, and even to the public, to make of it what they will. Many people volunteer information, for example to medical trials, on the understanding that this will happen.

But the ability to compare databases threatens to make a mockery of such protections. Participants in genomics projects, promised anonymity in exchange for their DNA, have been identified by simple comparison with electoral rolls and other publicly available information. The health records of a governor of Massachusetts were plucked from a database, again supposedly anonymous, of state-employee hospital visits using the same trick. Reporters sifting through a public database of web searches were able to correlate them in order to track down one, rather embarrassed, woman who had been idly searching for single men. And so on.
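The "simple comparison" described above can be sketched in a few lines of Python. This is an illustration only, not the method used in any of the cases mentioned: both tables are made-up toy data, and the choice of postcode, birth year and sex as the attributes shared by the two tables is an assumption made for the example.

```python
# Illustrative sketch of a linkage (re-identification) attack.
# All records, names and field choices below are hypothetical toy data.

def link_records(anonymised, public, keys):
    """Pair each anonymised record with a named public record when
    exactly one public record shares the same values for `keys`."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for record in anonymised:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], record))
    return matches


# "Anonymous" release: names removed, other attributes left in place.
hospital_visits = [
    {"postcode": "00001", "birth_year": 1950, "sex": "M", "diagnosis": "A"},
    {"postcode": "00002", "birth_year": 1980, "sex": "F", "diagnosis": "B"},
]

# Public register that still carries names.
electoral_roll = [
    {"name": "Person One", "postcode": "00001", "birth_year": 1950, "sex": "M"},
    {"name": "Person Two", "postcode": "00002", "birth_year": 1980, "sex": "F"},
    {"name": "Person Three", "postcode": "00002", "birth_year": 1980, "sex": "F"},
]

SHARED_ATTRIBUTES = ("postcode", "birth_year", "sex")
reidentified = link_records(hospital_visits, electoral_roll, SHARED_ATTRIBUTES)

for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

In this toy data only the first visit is linked to a name, because the second matches two people on the roll. The more attributes the two tables share, the fewer such ambiguous matches remain.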

Each of these headline-generating stories creates a demand for more controls. But that, in turn, deals a blow to the idea of open data—that the electronic “data exhaust” people exhale more or less every time they do anything in the modern world is actually useful stuff which, were it freely available for analysis, might make that world a better place.

Of cake, and eating it

Modern cars, for example, record in their computers much about how, when and where the vehicle has been used. Comparing the records of many vehicles, says Viktor Mayer-Schönberger of the Oxford Internet Institute, could provide a solid basis for, say, spotting dangerous stretches of road. Similarly, an opening of health records, particularly in a country like Britain, which has a national health service, and cross-fertilising them with other personal data, might help reveal the multifarious causes of diseases like Alzheimer’s.

This is a true dilemma. People want both perfect privacy and all the benefits of openness. But they cannot have both. The stripping of a few details as the only means of assuring anonymity, in a world choked with data exhaust, cannot work. Poorly anonymised data are only part of the problem. What may be worse is that there is no standard for anonymisation. Every American state, for example, has its own prescription for what constitutes an adequate standard.

Worse still, devising a comprehensive standard may be impossible. Paul Ohm of Georgetown University, in Washington, DC, thinks that this is partly because the availability of new data constantly shifts the goalposts. “If we could pick an industry standard today, it would be obsolete in short order,” he says. Some data, such as those about medical conditions, are more sensitive than others. Some data sets provide great precision in time or place, others merely a year or a postcode. Each set presents its own dangers and requirements.

Fortunately, there are a few easy fixes. Thanks in part to the headlines, many now agree that public release of anonymised data is a bad move. Data could instead be released piecemeal, or kept in-house and accessible by researchers through a question-and-answer mechanism. Or some users could be granted access to raw data, but only in strictly controlled conditions.
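A question-and-answer mechanism of the kind mentioned above might look roughly like the following Python sketch. The article does not specify how such a mechanism works; the small-count suppression rule used here, the function name `answer_count` and the threshold of 5 are all assumptions chosen for illustration, and the records are toy data.

```python
# Illustrative sketch of query-based access: researchers ask counting
# questions and never see individual rows. The suppression rule and the
# threshold are assumptions, not taken from the article.

MIN_CELL_SIZE = 5  # hypothetical threshold below which answers are withheld


def answer_count(records, predicate, min_cell=MIN_CELL_SIZE):
    """Return how many records satisfy `predicate`, or None when the
    count is too small to release without singling people out."""
    count = sum(1 for record in records if predicate(record))
    if count < min_cell:
        return None  # answer suppressed
    return count


# Toy in-house data set that is never handed out directly.
records = (
    [{"region": "north", "condition": "X"}] * 6
    + [{"region": "south", "condition": "X"}] * 2
    + [{"region": "south", "condition": "Y"}] * 7
)

broad = answer_count(records, lambda r: r["condition"] == "X")
narrow = answer_count(
    records, lambda r: r["region"] == "south" and r["condition"] == "X"
)
print(broad, narrow)
```

The broad question is answered, while the narrow one, which would describe only two people, is refused. A rule this simple does not stop someone from subtracting the answers to two overlapping questions, which is one reason such mechanisms are usually combined with other controls.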

All these approaches, though, are anathema to the open-data movement, because they limit the scope of studies. “If we’re making it so hard to share that only a few have access,” says Tim Althoff, a data scientist at Stanford University, “that has profound implications for science, for people being able to replicate and advance your work.”

Purely legal approaches might mitigate that. Data might come with what have been called “downstream contractual obligations”, outlining what can be done with a given data set and holding any onward recipients to the same standards. One perhaps draconian idea, suggested by Daniel Barth-Jones, an epidemiologist at Columbia University, in New York, is to make it illegal even to attempt re-identification….(More).”