From Idea to Reality: Why We Need an Open Data Policy Lab


Stefaan G. Verhulst at Open Data Policy Lab: “The belief that we are living in a data age — one characterized by unprecedented amounts of data, with unprecedented potential — has become mainstream. We regularly read phrases such as “data is the most valuable commodity in the global economy” or that data provides decision-makers with an “ever-swelling flood of information.”

Without a doubt, there is truth in such statements. But they also leave out a major shortcoming: the fact that much of the most useful data remains inaccessible, hidden in silos, behind digital walls, and in untapped “treasuries.”

For close to a decade, the technology and public interest communities have pushed the idea of open data. At its core, open data represents a new paradigm of data availability and access. The movement borrows from the language of open source and is rooted in notions of a “knowledge commons”, a concept developed, among others, by scholars like Nobel Prize winner Elinor Ostrom.

Milestones and Limitations in Open Data

Significant milestones have been achieved in the short history of the open data movement. Around the world, an ever-increasing number of governments at the local, state and national levels now release large datasets for the public’s benefit. For example, New York City requires that all public data be published on a single web portal. The current portal site contains thousands of datasets that fuel projects on topics as diverse as school bullying, sanitation, and police conduct. In California, the Forest Practice Watershed Mapper allows users to track the impact of timber harvesting on aquatic life through the use of the state’s open data. Similarly, Denmark’s Building and Dwelling Register releases address data to the public free of charge, improving transparent property assessment for all interested parties.

A growing number of private companies have also initiated or engaged in “Data Collaborative” projects to leverage their private data toward the public interest. For example, Valassis, a direct-mail marketing company, shared its massive address database with community groups in New Orleans to visualize and track block-by-block repopulation rates after Hurricane Katrina. A large number of data collaboratives are also currently being launched to respond to the COVID-19 pandemic. Through its COVID-19 Data Collaborative Program, the location-intelligence company Cuebiq is providing researchers with access to its data to study, for instance, the impacts of social distancing policies in Italy and New York City. The health technology company Kinsa Health’s US Health Weather initiative is likewise visualizing the rate of fever across the United States using data from its network of smart thermometers, thereby providing early indications of where COVID-19 outbreaks are likely to occur.

Yet despite such initiatives, many open data projects (and data collaboratives) remain fledgling — especially those at the state and local level.

Among other issues, the field has trouble scaling projects beyond initial pilots, and many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain skeptical of open data’s value. In addition, terabytes of potentially transformative data remain inaccessible for re-use. It is absolutely imperative that we continue to make the case to all stakeholders regarding the importance of open data, and of moving it from an interesting idea to an impactful reality. In order to do this, we need a new resource — one that can inform the public and data owners, and that would guide decision-makers on how to achieve open data in a responsible manner, without undermining privacy and other rights.

Purpose of the Open Data Policy Lab

Today, with support from Microsoft and under the counsel of a global advisory board of open data leaders, The GovLab is launching an initiative designed precisely to build such a resource.

Our Open Data Policy Lab will draw on lessons and experiences from around the world to conduct analysis, provide guidance, build community, and take action to accelerate the responsible re-use and opening of data for the benefit of society and the equitable spread of economic opportunity…(More)”.

EDPB Adopts Guidelines on the Processing of Health Data During COVID-19


Hunton Privacy Blog: “On April 21, 2020, the European Data Protection Board (“EDPB”) adopted Guidelines on the processing of health data for scientific purposes in the context of the COVID-19 pandemic. The aim of the Guidelines is to provide clarity on the most urgent matters relating to health data, such as legal basis for processing, the implementation of adequate safeguards and the exercise of data subject rights.

The Guidelines note that Article 9 of the General Data Protection Regulation (“GDPR”) provides a specific derogation, for scientific purposes, from its prohibition on processing sensitive data. With respect to the legal basis for processing, the Guidelines state that consent may be relied on under both Article 6 and the derogation to the prohibition on processing under Article 9 in the context of COVID-19, as long as the requirements for explicit consent are met, and as long as there is no power imbalance that could pressure or disadvantage a reluctant data subject. Researchers should keep in mind that study participants must be able to withdraw their consent at any time. National legislation may also provide an appropriate legal basis for the processing of health data and a derogation to the Article 9 prohibition. Furthermore, national laws may restrict data subject rights, though these restrictions should apply only insofar as strictly necessary.

In the context of transfers to countries outside the European Economic Area that have not been deemed adequate by the European Commission, the Guidelines note that the “public interest” derogation to the general prohibition on such transfers may be relied on, as well as explicit consent. The Guidelines add, however, that these derogations should only be relied on as a temporary measure and not for repetitive transfers.

The Guidelines highlight the importance of complying with the GDPR’s data protection principles, particularly with respect to transparency. Ideally, notice of processing as part of a research project should be provided to the relevant data subject before the project commences, if data has not been collected directly from the individual, in order to allow the individual to exercise their rights under the GDPR. There may be instances where, considering the number of data subjects, the age of the data and the safeguards in place, it would be impossible or require disproportionate effort to provide notice, in which case researchers may be able to rely on the exemptions set out under Article 14 of the GDPR.

The Guidelines also highlight that processing for scientific purposes is generally not considered incompatible with the purposes for which data is originally collected, assuming that the principles of data minimization, integrity, confidentiality and data protection by design and by default are complied with (See Guidelines)”.

Tear down this wall: Microsoft embraces open data


The Economist: “Two decades ago Microsoft was a byword for a technological walled garden. One of its bosses called free open-source programs a “cancer”. That was then. On April 21st the world’s most valuable tech firm joined a fledgling movement to liberate the world’s data. Among other things, the company plans to launch 20 data-sharing groups by 2022 and give away some of its digital information, including data it has aggregated on covid-19.

Microsoft is not alone in its newfound fondness for sharing in the age of the coronavirus. “The world has faced pandemics before, but this time we have a new superpower: the ability to gather and share data for good,” Mark Zuckerberg, the boss of Facebook, a social-media conglomerate, wrote in the Washington Post on April 20th. Despite the EU’s strict privacy rules, some Eurocrats now argue that data-sharing could speed up efforts to fight the coronavirus. 

But the argument for sharing data is much older than the virus. The OECD, a club mostly of rich countries, reckons that if data were more widely exchanged, many countries could enjoy gains worth between 1% and 2.5% of GDP. The estimate is based on heroic assumptions (such as putting a number on business opportunities created for startups). But economists agree that readier access to data is broadly beneficial, because data are “non-rivalrous”: unlike oil, say, they can be used and re-used without being depleted, for instance to power various artificial-intelligence algorithms at once. 

Many governments have recognised the potential. Cities from Berlin to San Francisco have “open data” initiatives. Companies have been cagier, says Stefaan Verhulst, who heads the Governance Lab at New York University, which studies such things. Firms worry about losing intellectual property, imperilling users’ privacy and hitting technical obstacles. Standard data formats (eg, JPEG images) can be shared easily, but much that a Facebook collects with its software would be meaningless to a Microsoft, even after reformatting. Fewer than half of the 113 “data collaboratives” identified by the lab involve corporations. Those that do, including initiatives by BBVA, a Spanish bank, and GlaxoSmithKline, a British drugmaker, have been small or limited in scope. 

Microsoft’s campaign is the most consequential by far. Besides encouraging more non-commercial sharing, the firm is developing software, licences and (with the Governance Lab and others) governance frameworks that permit firms to trade data or provide access to them without losing control. Optimists believe that the giant’s move could be to data what IBM’s embrace in the late 1990s of the Linux operating system was to open-source software. Linux went on to become a serious challenger to Microsoft’s own Windows and today underpins Google’s Android mobile software and much of cloud-computing…(More)”.

Mapping how data can help address COVID-19


Blog by Andrew J. Zahuranec and Stefaan G. Verhulst: “The novel coronavirus disease (COVID-19) is a global health crisis the likes of which the modern world has never seen. Amid calls to action from the United Nations Secretary-General, the World Health Organization, and many national governments, there has been a proliferation of initiatives using data to address some facet of the pandemic. In March, The GovLab at NYU put out its own call to action, which identifies key steps organizations and decision-makers can take to build the data infrastructure needed to tackle pandemics. This call has been signed by over 400 data leaders from around the world in the public and private sectors and in civil society.

But questions remain as to how many of these initiatives are useful for decision-makers. While The GovLab’s living repository contains over 160 data collaboratives, data competitions, and other innovative work, many of these examples take a data supply-side approach to the COVID-19 response. Given the urgency of the situation, some organizations create projects that align with the available data instead of trying to understand what insights those responding to the crisis actually want, including issues that may not be directly related to public health.

We need to identify and ask better questions to use data effectively in the current crisis. Part of that work means understanding what topics can be addressed through enhanced data access and analysis.

Using The GovLab’s rapid-research methodology, we’ve compiled a list of 12 topic areas related to COVID-19 where data and analysis are needed. …(More)”.

Mobile applications to support contact tracing in the EU’s fight against COVID-19


Common EU Toolbox for Member States by eHealth Network: “Mobile apps have potential to bolster contact tracing strategies to contain and reverse the spread of COVID-19. EU Member States are converging towards effective app solutions that minimise the processing of personal data, and recognise that interoperability between these apps can support public health authorities and support the reopening of the EU’s internal borders.

This first iteration of a common EU toolbox, developed urgently and collaboratively by the eHealth Network with the support of the European Commission, provides a practical guide for Member States. The common approach aims to exploit the latest privacy-enhancing technological solutions that enable at-risk individuals to be contacted and, if necessary, to be tested as quickly as possible, regardless of where they are and which app they are using. It explains the essential requirements for national apps, namely that they be:

  • voluntary;
  • approved by the national health authority;
  • privacy-preserving – personal data is securely encrypted; and
  • dismantled as soon as no longer needed.

The added value of these apps is that they can record contacts that a person may not notice or remember. These requirements on how to record contacts and notify individuals are anchored in accepted epidemiological guidance and reflect best practice on cybersecurity and accessibility. They also cover how to prevent the appearance of potentially harmful unapproved apps, success criteria and the collective monitoring of the apps’ effectiveness, and the outline of a communications strategy to engage with stakeholders and the people affected by these initiatives.

Work will continue urgently to further develop and implement the toolbox, as set out in the Commission Recommendation of 8 April, including addressing other types of apps and the use of mobility data for modelling to understand the spread of the disease and exit from the crisis….(More)”.
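The toolbox lists requirements rather than a protocol. To make “privacy-preserving” contact recording more concrete, here is a minimal Python sketch of one widely discussed decentralised design, assumed here for illustration and not prescribed by the toolbox: each phone broadcasts short-lived identifiers derived from a secret daily key, stores the identifiers it hears locally, and checks for exposure on the device once confirmed cases publish their keys. The names `ephemeral_ids`, `Phone` and `SLOTS_PER_DAY`, and the choice of HMAC-SHA256 for deriving identifiers, are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical number of broadcast slots per day (e.g. a new ID every 15 minutes).
SLOTS_PER_DAY = 96


def ephemeral_ids(daily_key: bytes, slots: int = SLOTS_PER_DAY) -> list[bytes]:
    """Derive short-lived broadcast IDs from a secret daily key.

    Without the key, an observer cannot tell that two IDs come from the same phone.
    """
    return [
        hmac.new(daily_key, f"slot-{i}".encode(), hashlib.sha256).digest()[:16]
        for i in range(slots)
    ]


class Phone:
    """Toy model of a decentralised contact-tracing app on a single device."""

    def __init__(self) -> None:
        # Secret key; in this sketch it is only shared if the user tests positive.
        self.daily_key = os.urandom(32)
        # IDs heard from nearby phones, kept only on this device.
        self.heard: set[bytes] = set()

    def broadcast(self, slot: int) -> bytes:
        """Return the ID this phone would broadcast during the given slot."""
        return ephemeral_ids(self.daily_key)[slot]

    def observe(self, eph_id: bytes) -> None:
        """Record an ID received from a nearby phone."""
        self.heard.add(eph_id)

    def check_exposure(self, published_keys: list[bytes]) -> bool:
        """Re-derive IDs from published keys of confirmed cases and match locally."""
        return any(
            eph_id in self.heard
            for key in published_keys
            for eph_id in ephemeral_ids(key)
        )
```

In use, if `bob.observe(alice.broadcast(40))` records a contact and Alice later publishes her daily key, `bob.check_exposure([alice.daily_key])` returns `True`, while a phone that never met Alice gets `False`. The matching happens on Bob’s device, so no central server learns who met whom; a real app would also need Bluetooth proximity estimation, multi-day key rotation and the interoperability the toolbox calls for.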

The Atlas of Inequality and Cuebiq’s Data for Good Initiative


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “The Atlas of Inequality is a research initiative led by scientists at the MIT Media Lab and Universidad Carlos III de Madrid. It is a project within the larger Human Dynamics research initiative at the MIT Media Lab, which investigates how computational social science can improve society, government, and companies. Using multiple big data sources, MIT Media Lab researchers seek to understand how people move in urban spaces and how that movement influences or is influenced by income. Among the datasets used in this initiative was location data provided by Cuebiq, through its Data for Good initiative. Cuebiq offers location-intelligence services to approved research and nonprofit organizations seeking to address public problems. To date, the Atlas has published maps of inequality in eleven cities in the United States. Through the Atlas, the researchers hope to raise public awareness about segregation of social mobility in United States cities resulting from economic inequality and support evidence-based policymaking to address the issue.

Data Collaborative Model: Based on the typology of data collaborative practice areas developed by The GovLab, the use of Cuebiq’s location data by MIT Media Lab researchers for the Atlas of Inequality initiative is an example of the research and analysis partnership model of data collaboration, specifically a data transfer approach. In this approach, companies provide data to partners for analysis, sometimes under the banner of “data philanthropy.” Access to data remains highly restrictive, with only specific partners able to analyze the assets provided. Approved uses are also determined in a somewhat cooperative manner, often with some agreement outlining how and why parties requesting access to data will put it to use….(More)”.

A Data Ecosystem to Defeat COVID-19


Paper by Bapon Fakhruddin: “…A wide range of approaches could be applied to understand transmission, assess outbreaks, communicate risk and assess cascading impacts on essential and other services. Approaches that could be utilized include network-based modelling of System of Systems (SOS), mobile technology, frequentist statistics and maximum-likelihood estimation, interactive data visualization, geostatistics, graph theory, Bayesian statistics, mathematical modelling, evidence synthesis approaches and complex thinking frameworks for systems interactions on COVID-19 impacts. Examples of tools and technologies that could be utilized to act decisively and early to prevent the further spread or quickly suppress the transmission of COVID-19, strengthen the resilience of health systems, save lives, and provide urgent support to developing countries with businesses and corporations are shown in Figure 2. There is also WHO guidance on ‘Health Emergency and Disaster Risk Management[8]’, the UNDRR-supported ‘Public Health Scorecard Addendum[9]’, and other guidelines (e.g. WHO practical considerations and recommendations for religious leaders and faith-based communities in the context of COVID-19[10]) that could enhance pandemic response plans. It must be ensured that any such use is proportionate, specific and protected, and does not increase risks to civil liberties. It is therefore essential to examine in detail the challenge of maximising data use in emergency situations while ensuring it is task-limited, proportionate and respectful of necessary protections and limitations. This is a complex task, and the COVID-19 pandemic will provide us with important test cases. It is also important that data is interpreted accurately; otherwise, misinterpretations could lead each sector down incorrect paths.

Figure 2: Tools to strengthen resilience for COVID-19

Many countries are still learning how to make use of data for their decision making in this critical time. The COVID-19 pandemic will provide important lessons on the need for cross-domain research and on how, in such emergencies, to balance the use of technological opportunities and data to counter pandemics against fundamental protections….(More)”.

How Facebook and Google are helping the CDC forecast coronavirus


Karen Hao at MIT Technology Review: “When it comes to predicting the spread of an infectious disease, it’s crucial to understand what Ryan Tibshirani, an associate professor at Carnegie Mellon University, calls “the pyramid of severity.” The bottom of the pyramid is asymptomatic carriers (those who have the infection but feel fine); the next level is symptomatic carriers (those who are feeling ill); then come hospitalizations, critical hospitalizations, and finally deaths.

Every level of the pyramid has a clear relationship to the next: “For example, sadly, it’s pretty predictable how many people will die once you know how many people are under critical care,” says Tibshirani, who is part of CMU’s Delphi research group, one of the best flu-forecasting teams in the US. The goal, therefore, is to have a clear measure of the lower levels of the pyramid, as the foundation for forecasting the higher ones.

But in the US, building such a model is a Herculean task. A lack of testing makes it impossible to assess the number of asymptomatic carriers. The results also don’t accurately reflect how many symptomatic carriers there are. Different counties have different testing requirements—some choosing only to test patients who require hospitalization. Test results also often take upwards of a week to return.

The remaining option is to measure symptomatic carriers through a large-scale, self-reported survey. But such an initiative won’t work unless it covers a big enough cross section of the entire population. Now the Delphi group, which has been working with the Centers for Disease Control and Prevention to help it coordinate the national pandemic response, has turned to the largest platforms in the US: Facebook and Google.


In a new partnership with Delphi, both tech giants have agreed to help gather data from those who voluntarily choose to report whether they’re experiencing covid-like symptoms. Facebook will target a fraction of its US users with a CMU-run survey, while Google has thus far been using its Opinion Rewards app, which lets users respond to questions for app store credit. The hope is this new information will allow the lab to produce county-by-county projections that will help policymakers allocate resources more effectively.

Neither company will ever actually see the survey results; they’re merely pointing users to the questions administered and processed by the lab. The lab will also never share any of the raw data back to either company. Still, the agreements represent a major deviation from typical data-sharing practices, which could raise privacy concerns. “If this wasn’t a pandemic, I don’t know that companies would want to take the risk of being associated with or asking directly for such a personal piece of information as health,” Tibshirani says.

Without such cooperation, the researchers would’ve been hard pressed to find the data anywhere else. Several other apps allow users to self-report symptoms, including a popular one in the UK known as the Covid Symptom Tracker that has been downloaded over 1.5 million times. But none of them offer the same systematic and expansive coverage as a Facebook or Google-administered survey, says Tibshirani. He hopes the project will collect millions of responses each week….(More)”.
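The “pyramid of severity” described in this piece can be sketched as a chain of conversion ratios, with each level estimated from the one beneath it. The Python below is a minimal illustration under loudly stated assumptions: the ratios in `ASSUMED_RATIOS` are invented placeholders, not Delphi’s model or real COVID-19 rates, and `project_pyramid` is a hypothetical helper. It starts from symptomatic carriers because, as the article notes, asymptomatic carriers cannot be counted without widespread testing.

```python
# Levels of the pyramid that can be estimated, from bottom to top.
PYRAMID_LEVELS = ["symptomatic", "hospitalized", "critical", "deaths"]

# HYPOTHETICAL fraction of each level that progresses to the next one.
# These are illustrative placeholders, not real epidemiological estimates.
ASSUMED_RATIOS = {
    ("symptomatic", "hospitalized"): 0.10,
    ("hospitalized", "critical"): 0.25,
    ("critical", "deaths"): 0.40,
}


def project_pyramid(symptomatic_count: float) -> dict[str, float]:
    """Propagate a symptomatic-carrier estimate up the pyramid, level by level."""
    projection = {"symptomatic": symptomatic_count}
    for lower, upper in zip(PYRAMID_LEVELS, PYRAMID_LEVELS[1:]):
        projection[upper] = projection[lower] * ASSUMED_RATIOS[(lower, upper)]
    return projection
```

Calling `project_pyramid(10_000)` with these placeholder ratios yields roughly 1,000 hospitalizations, 250 critical cases and 100 deaths. The design shows why a reliable measure at the bottom of the pyramid matters so much: any error in the symptomatic estimate is carried multiplicatively into every level above it, which is why the survey data from Facebook and Google users is valuable.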

Tracking coronavirus: big data and the challenge to privacy


Nic Fildes and Javier Espinoza at the Financial Times: “When the World Health Organization launched a 2007 initiative to eliminate malaria on Zanzibar, it turned to an unusual source to track the spread of the disease between the island and mainland Africa: mobile phones sold by Tanzania’s telecoms groups including Vodafone, the UK mobile operator.

Working together with researchers at Southampton university, Vodafone began compiling sets of location data from mobile phones in the areas where cases of the disease had been recorded. 

Mapping how populations move between locations has proved invaluable in tracking and responding to epidemics. The Zanzibar project has been replicated by academics across the continent to monitor other deadly diseases, including Ebola in west Africa….

With much of Europe at a standstill as a result of the coronavirus pandemic, politicians want the telecoms operators to provide similar data from smartphones. Thierry Breton, the former chief executive of France Telecom who is now the European commissioner for the internal market, has called on operators to hand over aggregated location data to track how the virus is spreading and to identify spots where help is most needed.

Both politicians and the industry insist that the data sets will be “anonymised”, meaning that customers’ individual identities will be scrubbed out. Mr Breton told the Financial Times: “In no way are we going to track individuals. That’s absolutely not the case. We are talking about fully anonymised, aggregated data to anticipate the development of the pandemic.”

But the use of such data to track the virus has triggered fears of growing surveillance, including questions about how the data might be used once the crisis is over and whether such data sets are ever truly anonymous….(More)”.
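Officials describe the data as “fully anonymised, aggregated”, but the article does not say how the operators aggregate it. One common approach, assumed here purely for illustration, is to count distinct devices per area and suppress areas with too few devices. In the Python sketch below, `aggregate_locations`, the `(device_id, area_id)` input format and the threshold `MIN_CELL_SIZE = 10` are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable

# HYPOTHETICAL minimum cell size: areas with fewer distinct devices are dropped.
MIN_CELL_SIZE = 10


def aggregate_locations(
    pings: Iterable[tuple[str, str]],
    min_cell_size: int = MIN_CELL_SIZE,
) -> dict[str, int]:
    """Turn (device_id, area_id) records into per-area counts of distinct devices.

    Areas whose count falls below min_cell_size are suppressed, because very
    small counts are more likely to single out individual people.
    """
    devices_per_area = defaultdict(set)
    for device_id, area_id in pings:
        devices_per_area[area_id].add(device_id)  # each device counted once per area
    return {
        area: len(devices)
        for area, devices in devices_per_area.items()
        if len(devices) >= min_cell_size
    }
```

Aggregation and suppression reduce risk but do not by themselves guarantee anonymity: counts published repeatedly over time, or combined with other data sets, can in principle still be used to re-identify people. That is the concern the article raises about whether such data sets are ever truly anonymous.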

New Tool to Establish Responsible Data Collaboratives in the Time of COVID-19


Announcement: “To address the COVID-19 pandemic and other dynamic threats, The GovLab has called for the development of a new data infrastructure and ecosystem. Establishing data collaboratives in a responsible manner often necessitates the creation of data sharing agreements and other legal documentation — a strain on time and capacity both for data holders and those who could use data in the public interest.

Today, to support the development of data collaboratives in a responsible and agile way, we are sharing a new tool that addresses the complexity in preparing a Data Sharing Agreement from Contracts for Data Collaboration (a joint initiative of SDSN-TReNDS, the World Economic Forum, The GovLab, and the University of Washington’s Information Risk Research Initiative). Providing a checklist to support organizations with reviewing, negotiating and preparing Data Sharing Arrangements, the intent is to strengthen stakeholder trust and help accelerate responsible data sharing arrangements given the urgency of the global pandemic.

(Please note that the checklist is a tool for formulating and understanding legal issues; we are not offering it as legal advice.)

…(More)”.