From Idea to Reality: Why We Need an Open Data Policy Lab


Stefaan G. Verhulst at Open Data Policy Lab: “The belief that we are living in a data age — one characterized by unprecedented amounts of data, with unprecedented potential — has become mainstream. We regularly read phrases such as “data is the most valuable commodity in the global economy” or that data provides decision-makers with an “ever-swelling flood of information.”

Without a doubt, there is truth in such statements. But they also leave out a major shortcoming: much of the most useful data remains inaccessible, hidden in silos, behind digital walls, and in untapped “treasuries.”

For close to a decade, the technology and public interest communities have pushed the idea of open data. At its core, open data represents a new paradigm of data availability and access. The movement borrows from the language of open source and is rooted in notions of a “knowledge commons”, a concept developed by scholars including Nobel Prize winner Elinor Ostrom.

Milestones and Limitations in Open Data

Significant milestones have been achieved in the short history of the open data movement. Around the world, an ever-increasing number of governments at the local, state and national levels now release large datasets for the public’s benefit. For example, New York City requires that all public data be published on a single web portal. The current portal site contains thousands of datasets that fuel projects on topics as diverse as school bullying, sanitation, and police conduct. In California, the Forest Practice Watershed Mapper allows users to track the impact of timber harvesting on aquatic life through the use of the state’s open data. Similarly, Denmark’s Building and Dwelling Register releases address data to the public free of charge, making property assessment more transparent for all interested parties.

A growing number of private companies have also initiated or engaged in “data collaborative” projects to leverage their private data toward the public interest. For example, Valassis, a direct-mail marketing company, shared its massive address database with community groups in New Orleans to visualize and track block-by-block repopulation rates after Hurricane Katrina. A wide range of data collaboratives are also currently being launched to respond to the COVID-19 pandemic. Through its COVID-19 Data Collaborative Program, the location-intelligence company Cuebiq is providing researchers access to the company’s data to study, for instance, the impacts of social distancing policies in Italy and New York City. The health technology company Kinsa Health’s US Health Weather initiative is likewise visualizing the rate of fever across the United States using data from its network of smart thermometers, thereby providing early indications regarding the location of likely COVID-19 outbreaks.

Yet despite such initiatives, many open data projects (and data collaboratives) remain fledgling — especially those at the state and local level.

Among other issues, the field has trouble scaling projects beyond initial pilots, and many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain skeptical of open data’s value. In addition, terabytes of potentially transformative data remain inaccessible for re-use. It is absolutely imperative that we continue to make the case to all stakeholders regarding the importance of open data, and of moving it from an interesting idea to an impactful reality. In order to do this, we need a new resource — one that can inform the public and data owners, and that would guide decision-makers on how to achieve open data in a responsible manner, without undermining privacy and other rights.

Purpose of the Open Data Policy Lab

Today, with support from Microsoft and under the counsel of a global advisory board of open data leaders, The GovLab is launching an initiative designed precisely to build such a resource.

Our Open Data Policy Lab will draw on lessons and experiences from around the world to conduct analysis, provide guidance, build community, and take action to accelerate the responsible re-use and opening of data for the benefit of society and the equitable spread of economic opportunity…(More)”.

‘For good measure’: data gaps in a big data world


Paper by Sarah Giest & Annemarie Samuels: “Policy and data scientists have paid ample attention to the amount of data being collected and the challenge for policymakers to use it. However, far less attention has been paid to the quality and coverage of this data, specifically as it pertains to minority groups. The paper makes the argument that while there is seemingly more data for policymakers to draw on, the quality of the data, in combination with potential known or unknown data gaps, limits governments’ ability to create inclusive policies. In this context, the paper defines primary, secondary, and unknown data gaps that cover scenarios of knowingly or unknowingly missing data and how such gaps may be compensated for through alternative measures.

Based on the review of the literature from various fields and a variety of examples highlighted throughout the paper, we conclude that the big data movement combined with more sophisticated methods in recent years has opened up new opportunities for government to use existing data in different ways as well as fill data gaps through innovative techniques. Focusing specifically on the representativeness of such data, however, shows that data gaps affect the economic opportunities, social mobility, and democratic participation of marginalized groups. The big data movement in policy may thus create new forms of inequality that are harder to detect and whose impact is more difficult to predict….(More)“.

Tear down this wall: Microsoft embraces open data


The Economist: “Two decades ago Microsoft was a byword for a technological walled garden. One of its bosses called free open-source programs a “cancer”. That was then. On April 21st the world’s most valuable tech firm joined a fledgling movement to liberate the world’s data. Among other things, the company plans to launch 20 data-sharing groups by 2022 and give away some of its digital information, including data it has aggregated on covid-19.

Microsoft is not alone in its newfound fondness for sharing in the age of the coronavirus. “The world has faced pandemics before, but this time we have a new superpower: the ability to gather and share data for good,” Mark Zuckerberg, the boss of Facebook, a social-media conglomerate, wrote in the Washington Post on April 20th. Despite the EU’s strict privacy rules, some Eurocrats now argue that data-sharing could speed up efforts to fight the coronavirus. 

But the argument for sharing data is much older than the virus. The OECD, a club mostly of rich countries, reckons that if data were more widely exchanged, many countries could enjoy gains worth between 1% and 2.5% of GDP. The estimate is based on heroic assumptions (such as putting a number on business opportunities created for startups). But economists agree that readier access to data is broadly beneficial, because data are “non-rivalrous”: unlike oil, say, they can be used and re-used without being depleted, for instance to power various artificial-intelligence algorithms at once. 

Many governments have recognised the potential. Cities from Berlin to San Francisco have “open data” initiatives. Companies have been cagier, says Stefaan Verhulst, who heads the Governance Lab at New York University, which studies such things. Firms worry about losing intellectual property, imperilling users’ privacy and hitting technical obstacles. Standard data formats (eg, JPEG images) can be shared easily, but much that a Facebook collects with its software would be meaningless to a Microsoft, even after reformatting. Fewer than half of the 113 “data collaboratives” identified by the lab involve corporations. Those that do, including initiatives by BBVA, a Spanish bank, and GlaxoSmithKline, a British drugmaker, have been small or limited in scope. 

Microsoft’s campaign is the most consequential by far. Besides encouraging more non-commercial sharing, the firm is developing software, licences and (with the Governance Lab and others) governance frameworks that permit firms to trade data or provide access to them without losing control. Optimists believe that the giant’s move could be to data what IBM’s late-1990s embrace of the Linux operating system was to open-source software. Linux went on to become a serious challenger to Microsoft’s own Windows and today underpins Google’s Android mobile software and much of cloud computing…(More)”.

The Atlas of Inequality and Cuebiq’s Data for Good Initiative


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “The Atlas of Inequality is a research initiative led by scientists at the MIT Media Lab and Universidad Carlos III de Madrid. It is a project within the larger Human Dynamics research initiative at the MIT Media Lab, which investigates how computational social science can improve society, government, and companies. Using multiple big data sources, MIT Media Lab researchers seek to understand how people move in urban spaces and how that movement influences or is influenced by income. Among the datasets used in this initiative was location data provided by Cuebiq, through its Data for Good initiative. Cuebiq offers location-intelligence services to approved research and nonprofit organizations seeking to address public problems. To date, the Atlas has published maps of inequality in eleven cities in the United States. Through the Atlas, the researchers hope to raise public awareness about segregation of social mobility in United States cities resulting from economic inequality and support evidence-based policymaking to address the issue.

Data Collaborative Model: Based on the typology of data collaborative practice areas developed by The GovLab, the use of Cuebiq’s location data by MIT Media Lab researchers for the Atlas of Inequality initiative is an example of the research and analysis partnership model of data collaboration, specifically a data transfer approach. In this approach, companies provide data to partners for analysis, sometimes under the banner of “data philanthropy.” Access to data remains highly restrictive, with only specific partners able to analyze the assets provided. Approved uses are also determined in a somewhat cooperative manner, often with some agreement outlining how and why parties requesting access to data will put it to use….(More)”.

The Economics of Maps


Abhishek Nagaraj and Scott Stern in the Journal of Economic Perspectives: “For centuries, maps have codified the extent of human geographic knowledge and shaped discovery and economic decision-making. Economists across many fields, including urban economics, public finance, political economy, and economic geography, have long employed maps, yet have largely abstracted away from exploring the economic determinants and consequences of maps as a subject of independent study. In this essay, we first review and unify recent literature in a variety of different fields that highlights the economic and social consequences of maps, along with an overview of the modern geospatial industry. We then outline our economic framework in which a given map is the result of economic choices around map data and designs, resulting in variations in private and social returns to mapmaking. We highlight five important economic and institutional factors shaping mapmakers’ data and design choices. Our essay ends by proposing that economists pay more attention to the endogeneity of mapmaking and the resulting consequences for economic and social welfare…(More)”.

The many perks of using critical consumer user data for social benefit


Sushant Kumar at LiveMint: “Business models that thrive on user data have created profitable global technology companies. For comparison, the combined market capitalization of just three tech companies, Google (Alphabet), Facebook and Amazon, is higher than the total market capitalization of all listed firms in India. Almost 98% of Facebook’s revenue and 84% of Alphabet’s come from serving targeted advertising powered by data collected from users. No doubt, these tech companies provide valuable services to consumers. It is also true that profits are concentrated in private corporations, whereas the potential societal value for the contributors of data, that is, the users, could be far more significant….

In the existing economic construct, private firms are able to deploy top scientists and sophisticated analytical tools to collect data, derive value and monetize the insights.

Imagine if personalization at this scale were available for more meaningful outcomes, such as administering personalized treatment for diabetes, recommending crop patterns, optimizing water management and providing access to credit to the unbanked. These socially beneficial applications of data could generate indisputably massive value.

However, handling critical data with accountability to prevent misuse is a complex and expensive task. What’s more, private sector players do not have any incentives to share the data they collect. These challenges can be resolved by setting up specialized entities that can manage data—collect, analyse, provide insights, manage consent and access rights. These entities would function as a trusted intermediary with public purpose, and may be named “data stewards”….(More)”.

See also: http://datastewards.net/ and https://datacollaboratives.org/

Urban Poverty Alleviation Endeavor Through E-Warong Program: Smart City (Smart People) Concept Initiative in Yogyakarta


Paper by Djaka Marwasta and Farid Suprianto: “In the era of Industrial Revolution 4.0, technology has become a factor that can contribute significantly to improving the quality of life and welfare of a nation’s people. The disruptive penetration of Information and Communication Technology (ICT) through the Internet of Things (IoT), Big Data, and Artificial Intelligence (AI) has led to fundamental advances in civilization. The expansion of Industrial Revolution 4.0 has also changed the pattern of relations between government and citizens, with implications for policy governance and internal government transformation. One of these is a change in social welfare development policies, where government officials are required to be responsive to social dynamics, which in turn increase demands for public accountability and transparency.

This paper aims to elaborate on the e-Warong program as one of the breakthroughs to reduce poverty by utilizing digital technology. E-Warong (electronic mutual cooperation shop) is an Indonesian government program based on Grass Root Innovation (GRI) to empower the poor, with an approach that builds group awareness to encourage the independence of the poor in developing joint ventures through mutual cooperation, taking advantage of ICT. This program is an implementation of the Smart City concept, especially Smart Economy, within the Sustainable Development Goals framework….(More)”.

Reuse of open data in Quebec: from economic development to government transparency


Paper by Christian Boudreau: “Based on the history of open data in Quebec, this article discusses the reuse of these data by various actors within society, with the aim of securing desired economic, administrative and democratic benefits. Drawing on an analysis of government measures and community practices in the field of data reuse, the study shows that the benefits of open data appear to be inconclusive in terms of economic growth. On the other hand, their benefits seem promising from the point of view of government transparency, in that they allow various civil society actors to monitor the integrity and performance of government activities. In the age of digital data and networks, the state must be seen not only as a platform conducive to innovation, but also as a rich field of study that is closely monitored by various actors driven by political and social goals….

Although the economic benefits of open data have been inconclusive so far, governments, at least in Quebec, must not stop investing in opening up their data. In terms of transparency, the results of the study suggest that the benefits of open data are sufficiently promising to continue releasing government data, if only to support the evaluation and planning activities of public programmes and services….(More)”.

Data as infrastructure? A study of data sharing legal regimes


Paper by Charlotte Ducuing: “The article discusses the concept of infrastructure in the digital environment, through a study of three data sharing legal regimes: the Public Sector Information Directive (PSI Directive), the discussions on in-vehicle data governance and the freshly adopted data sharing legal regime in the Electricity Directive.

While aiming to contribute to the scholarship on data governance, the article deliberately focuses on network industries. Characterised by the existence of physical infrastructure, they have a special relationship to digitisation and ‘platformisation’ and are exposed to specific risks. Adopting an explanatory methodology, the article shows that these regimes are based on two close but different sources of inspiration, which are intertwined and left unclear. By targeting entities deemed ‘monopolist’ with regard to the data they create and hold, data sharing obligations are inspired by competition law and especially the essential facility doctrine. On the other hand, beneficiaries appear to include both operators in related markets needing data to conduct their business (except for the PSI Directive), and third parties at large to foster innovation. The latter rationale illustrates what is called here a purposive view of data as infrastructure. The underlying understanding of ‘raw’ data (management) as infrastructure for all to use may run counter to the ability of the regulated entities to get a fair remuneration for ‘their’ data.

Finally, the article pleads for more granularity when mandating data sharing obligations depending upon the purpose. Shifting away from a ‘one-size-fits-all’ solution, the regulation of data could also extend to the ensuing context-specific data governance regime, subject to further research…(More)”.

What is My Data Worth?


Ruoxi Jia at Berkeley Artificial Intelligence Research: “People give massive amounts of their personal data to companies every day, and these data are used to generate tremendous business value. Some economists and politicians argue that people should be paid for their contributions, but the million-dollar question is: by how much?

This article discusses methods proposed in our recent AISTATS and VLDB papers that attempt to answer this question in the machine learning context. This is joint work with David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Nick Hynes, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song, as well as a collaborative effort between UC Berkeley, ETH Zurich, and UIUC.

What are the existing approaches to data valuation?

Various ad hoc data valuation schemes have been studied in the literature, and some of them have been deployed in existing data marketplaces. From a practitioner’s point of view, they can be grouped into three categories:

  • Query-based pricing attaches values to user-initiated queries. One simple example is to set the price based on the number of queries allowed during a time window. Other, more sophisticated examples attempt to adjust the price according to specific criteria, such as arbitrage avoidance.
  • Data attribute-based pricing constructs a price model that takes into account various parameters, such as data age, credibility, potential benefits, etc. The model is trained to match market prices released in public registries.
  • Auction-based pricing designs auctions that dynamically set the price based on bids offered by buyers and sellers.
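To make the query-based category concrete, here is a minimal Python sketch under stated assumptions. A buyer purchases a plan allowing a fixed number of queries per time window; the price schedule `plan_price` (a hypothetical base fee plus a sublinear per-query charge) is invented for illustration. `is_arbitrage_free` checks one simple formalisation of arbitrage avoidance for such plans: prices never fall as the quota grows, and one large plan never costs more than two smaller plans that add up to it. `QueryQuota` enforces the quota with a sliding window. None of these names or numbers come from the article or from any real data marketplace.

```python
from collections import deque


def plan_price(quota: int, base_fee: float = 5.0, per_query: float = 0.01) -> float:
    """Hypothetical price for a plan allowing `quota` queries per window.

    A fixed fee plus a concave (sublinear) per-query charge; both choices are
    illustrative assumptions, not figures from the article.
    """
    if quota <= 0:
        return 0.0
    return base_fee + per_query * quota ** 0.9


def is_arbitrage_free(price, max_quota: int) -> bool:
    """Check monotonicity and subadditivity of `price` for quotas 0..max_quota.

    If price(a + b) exceeded price(a) + price(b), a buyer could obtain the same
    quota more cheaply by purchasing two smaller plans.
    """
    for a in range(1, max_quota + 1):
        if price(a) < price(a - 1):  # more queries must never cost less
            return False
        for b in range(1, max_quota - a + 1):
            if price(a + b) > price(a) + price(b) + 1e-9:
                return False
    return True


class QueryQuota:
    """Allow at most `quota` queries in any sliding window of `window_s` seconds."""

    def __init__(self, quota: int, window_s: float):
        self.quota = quota
        self.window_s = window_s
        self._times = deque()  # timestamps of accepted queries, oldest first

    def allow(self, now: float) -> bool:
        # Forget queries that have fallen out of the current window.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) < self.quota:
            self._times.append(now)
            return True
        return False


if __name__ == "__main__":
    print(plan_price(1000), is_arbitrage_free(plan_price, 100))
    meter = QueryQuota(quota=2, window_s=60)
    print([meter.allow(t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
```

The subadditivity check is what rules out a schedule such as `0.01 * q * q`: under it, two one-query plans are cheaper than a single two-query plan, so buyers would simply split their purchases.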

However, existing data valuation schemes do not take into account the following important desiderata:

  • Task specificity: The value of data depends on the task it helps to fulfill. For instance, if Alice’s medical record indicates that she has disease A, then her data will be more useful to predict disease A as opposed to other diseases.
  • Fairness: The quality of data from different sources varies dramatically. In the worst-case scenario, adversarial data sources may even degrade model performance via data poisoning attacks. Hence, the data value should reflect the efficacy of data by assigning high values to data which can notably improve the model’s performance.
  • Efficiency: Practical machine learning tasks may involve thousands or even billions of data contributors; thus, data valuation techniques should be capable of scaling up.

With the desiderata above, we now discuss a principled notion of data value and computationally efficient algorithms for data valuation….(More)”.
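The principled notion itself is not spelled out in this excerpt. One well-known candidate that fits the desiderata above is the Shapley value from cooperative game theory, which credits each data point with its weighted average marginal contribution to a model’s utility across all subsets of the other points. The Python sketch below is an illustration under assumptions, not necessarily the method in the papers: the four-point dataset, the test set, and the utility (test accuracy of a 1-nearest-neighbour classifier, with zero utility for an empty training set) are all hypothetical. Because utility is measured on a particular test set, the resulting values are task-specific, and the deliberately mislabeled point `D` ends up with a negative value, in line with the fairness criterion.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy data. Each training point is (feature, label); "D" is
# deliberately mislabeled to mimic a low-quality or poisoned contribution.
TRAIN = {"A": (0.0, 0), "B": (1.0, 0), "C": (5.0, 1), "D": (5.2, 0)}
# The test set defines the task, so the resulting values are task-specific.
TEST = [(0.5, 0), (4.8, 1), (5.5, 1)]


def utility(subset) -> float:
    """Test accuracy of a 1-nearest-neighbour classifier trained on `subset`."""
    if not subset:
        return 0.0  # assumption: an empty training set has zero utility
    correct = 0
    for x, y in TEST:
        # sorted() makes tie-breaking deterministic
        nearest = min(sorted(subset), key=lambda k: abs(TRAIN[k][0] - x))
        correct += TRAIN[nearest][1] == y
    return correct / len(TEST)


def shapley_values(players) -> dict:
    """Exact Shapley values: weighted marginal contributions over all subsets."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition joined by player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (utility(s + (i,)) - utility(s))
        values[i] = total
    return values


if __name__ == "__main__":
    vals = shapley_values(list(TRAIN))
    for name, v in sorted(vals.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {v:+.3f}")
```

The values sum to the utility of the full dataset minus that of the empty set, which is one reason the Shapley value is considered a fair way to split credit. Exact computation, however, evaluates the utility on every subset, so its cost grows exponentially with the number of contributors; this is the tension with the efficiency desideratum that practical approximation algorithms have to address.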