Removing the pump handle: Stewarding data at times of public health emergency


Reema Patel at Significance: “There is a saying, incorrectly attributed to Mark Twain, that states: “History never repeats itself, but it rhymes”. Seeking to understand the implications of the current crisis for the effective use of data, I’ve drawn on the nineteenth-century cholera outbreak in London’s Soho to identify some “rhyming patterns” that might inform our approaches to data use and governance at this time of public health crisis.

Where better to begin than with the work of Victorian pioneer John Snow? In 1854, Snow’s use of a dot map to illustrate clusters of cholera cases around public water pumps, and of statistics to establish the connection between the quality of water sources and cholera outbreaks, led to a breakthrough in public health interventions – and, famously, the removal of the handle of a water pump in Broad Street.

Data is vital

We owe a lot to Snow, especially now. His example teaches us that data has a central role to play in saving lives, and that the effective use of (and access to) data is critical for enabling timely responses to public health emergencies.

Take, for instance, transport app CityMapper’s rapid redeployment of its aggregated transport data. In the early days of the Covid-19 pandemic, this formed part of an analysis of compliance with social distancing restrictions across a range of European cities. There is also the US-based health weather map, which uses anonymised and aggregated data to visualise fever, specifically influenza-like illnesses. This data helped model early indications of where, and how quickly, Covid-19 was spreading….
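
To make the aggregation approach concrete, here is a minimal Python sketch of the general idea: individual symptom reports are collapsed into regional rates before anything is shared, and regions running above a baseline are flagged. The report format, region labels and 2% baseline are hypothetical illustrations, not Kinsa’s or CityMapper’s actual methods.

```python
# Minimal, hypothetical sketch: collapse individual symptom reports into
# regional fever rates (so only aggregates are shared) and flag regions
# running above an assumed seasonal baseline.
from collections import defaultdict

def fever_rate_by_region(reports):
    """reports: iterable of (region, has_fever) tuples from anonymised submissions."""
    counts = defaultdict(lambda: [0, 0])  # region -> [fever_count, total_count]
    for region, has_fever in reports:
        counts[region][0] += int(has_fever)
        counts[region][1] += 1
    return {region: fever / total for region, (fever, total) in counts.items()}

def flag_elevated(rates, baseline=0.02):
    """Return regions whose observed rate exceeds the assumed baseline."""
    return {region: rate for region, rate in rates.items() if rate > baseline}

reports = [("region-a", True), ("region-a", True), ("region-a", False),
           ("region-b", False), ("region-b", False), ("region-b", True)]
print(flag_elevated(fever_rate_by_region(reports)))
```

Because only regional rates leave the aggregation step, no individual-level report needs to be shared downstream, which is what makes this kind of “health weather” mapping compatible with anonymisation.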

Ethics and human rights still matter

As the current crisis evolves, many have expressed concern that the pandemic will be used to justify the rapid rollout of surveillance technologies that do not meet ethical and human rights standards, and that this will be done in the name of the “public good”. Examples of these technologies include symptom- and contact-tracing applications. Privacy experts are also increasingly concerned that governments will trade off more personal data than is necessary or proportionate to respond to the public health crisis.

Many ethical and human rights considerations (including those listed at the bottom of this piece) are at risk of being overlooked at this time of emergency, and governments would be wise not to press ahead regardless, ignoring legitimate concerns about rights and standards. Instead, policymakers should begin to address these concerns by asking how we can prepare (now and in future) to establish clear and trusted boundaries for the use of data (personal and non-personal) in such crises.

Democratic states in Europe and the US have not, in recent memory, prioritised infrastructures and systems for a crisis of this scale – and this has contributed to our current predicament. Contrast this with Singapore, which suffered outbreaks of SARS and H1N1, and channelled this experience into implementing pandemic preparedness measures.

We cannot undo the past, but we can begin planning and preparing constructively for the future, and that means strengthening global coordination and finding mechanisms to share learning internationally. Getting the right data infrastructure in place has a central role to play in addressing ethical and human rights concerns around the use of data….(More)”.

The Law and Policy of Government Access to Private Sector Data (‘B2G Data Sharing’)


Paper by Heiko Richter: “The tremendous rate of technological advancement in recent years has fostered a policy debate about improving the state’s access to privately held data (‘B2G data sharing’). Access to such ‘data of general interest’ can significantly improve social welfare and serve the common good. At the same time, expanding the state’s access to privately held data poses risks. This chapter inquires into the potential and limits of mandatory access rules, which would oblige private undertakings to grant access to data for specific purposes that lie in the public interest. The article discusses the key questions that access rules should address and develops general principles for designing and implementing such rules. It puts particular emphasis on the opportunities and limitations for the implementation of horizontal B2G access frameworks. Finally, the chapter outlines concrete recommendations for legislative reforms….(More)”.

Viruses Cross Borders. To Fight Them, Countries Must Let Medical Data Flow, Too


Nigel Cory at ITIF: “If nations could regulate viruses the way many regulate data, there would be no global pandemics. But the sad reality is that, in the midst of the worst global pandemic in living memory, many nations make it unnecessarily complicated and costly, if not illegal, for health data to cross their borders. In so doing, they are hindering critically needed medical progress.

In the COVID-19 crisis, data analytics powered by artificial intelligence (AI) is critical to identifying the exact nature of the pandemic and developing effective treatments. The technology can produce powerful insights and innovations, but only if researchers can aggregate and analyze data from populations around the globe. And that requires data to move across borders as part of international research efforts by private firms, universities, and other research institutions. Yet, some countries, most notably China, are stopping health and genomic data at their borders.

Indeed, despite the significant benefits to companies, citizens, and economies that arise from the ability to easily share data across borders, dozens of countries—across every stage of development—have erected barriers to cross-border data flows. These data-residency requirements strictly confine data within a country’s borders, a concept known as “data localization,” and many countries have especially strict requirements for health data.

China is a noteworthy offender, having created a new digital iron curtain that requires data localization for a range of data types, including health data, as part of its so-called “cyber sovereignty” strategy. A May 2019 State Council regulation required genomic data to be stored and processed locally by Chinese firms—and foreign organizations are prohibited from doing so. This is in service of China’s mercantilist strategy to advance its domestic life sciences industry. While there has been collaboration between U.S. and Chinese medical researchers on COVID-19, including on clinical trials for potential treatments, these restrictions mean that it won’t involve the transfer, aggregation, and analysis of Chinese personal data, which otherwise might help find a treatment or vaccine. If China truly wanted to make amends for blocking critical information during the early stages of the outbreak in Wuhan, then it should abolish this restriction and allow genomic and other health data to cross its borders.

But China is not alone in limiting data flows. Russia requires all personal data, health-related or not, to be stored locally. India’s draft data protection bill permits the government to classify any sensitive personal data as critical personal data and mandate that it be stored and processed only within the country. This would be consistent with recent debates and decisions to require localization for payments data and other types of data. And despite its leading role in pushing for the free flow of data as part of new digital trade agreements, Australia requires genomic and other data attached to personal electronic health records to be stored and processed only within its borders.

Countries also enact de facto barriers to health and genomic data transfers by making it harder and more expensive, if not impractical, for firms to transfer it overseas than to store it locally. For example, South Korea and Turkey require firms to get explicit consent from people to transfer sensitive data like genomic data overseas. Doing this for hundreds or thousands of people adds considerable costs and complexity.

And the European Union’s General Data Protection Regulation encourages data localization as firms feel pressured to store and process personal data within the EU given the restrictions it places on data transfers to many countries. This is in addition to the renewed push for local data storage and processing under the EU’s new data strategy.

Countries rationalize these steps on the basis that health data, particularly genomic data, is sensitive. But requiring health data to be stored locally does little to increase privacy or data security. The confidentiality of data does not depend on which country the information is stored in, only on the measures used to store it securely, such as via encryption, and the policies and procedures the firms follow in storing or analyzing the data. For example, if a nation has limits on the use of genomics data, then domestic organizations using that data face the same restrictions, whether they store the data in the country or outside of it. And if they share the data with other organizations, they must require those organizations, regardless of where they are located, to abide by the home government’s rules.

As such, policymakers need to stop treating health data differently when it comes to cross-border movement, and instead build technical, legal, and ethical protections into both domestic and international data-governance mechanisms, which together allow the responsible sharing and transfer of health and genomic data.

This is clearly possible—and needed. In February 2020, leading health researchers called for an international code of conduct for genomic data following the end of their first-of-its-kind international data-driven research project. The project used a purpose-built cloud service that stored 800 terabytes of genomic data on 2,658 cancer genomes across 13 data centers on three continents. The collaboration and use of cloud computing were transformational in enabling large-scale genomic analysis….(More)”.

A data sharing method in the open web environment: Data sharing in hydrology


Paper by Jin Wang et al: “Data sharing plays a fundamental role in providing data resources for geographic modeling and simulation. Although there are many successful cases of data sharing through the web, current practices for sharing data mostly focus on data publication using metadata at the file level, which requires identifying, restructuring and synthesizing raw data files for further usage. In hydrology, because the same hydrological information is often stored in data files with different formats, modelers should identify the required information from multisource data sets and then customize data requirements for their applications. However, these data customization tasks are difficult to repeat, which leads to repetitive labor. This paper presents a data sharing method that provides a solution for data manipulation based on a structured data description model rather than raw data files. With the structured data description model, multisource hydrological data can be accessed and processed in a unified way and published as data services using a designed data server. This study also proposes a data configuration manager to customize data requirements through an interactive programming tool, which can help in using the data services. In addition, a component-based data viewer is developed for the visualization of multisource data in a sharable visualization scheme. A case study that involves sharing and applying hydrological data is designed to examine the applicability and feasibility of the proposed data sharing method….(More)”.
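
The paper defines its own structured data description model; purely as an illustration of the pattern it describes (unified descriptions instead of raw files), a stripped-down version might look like the following Python sketch. The field names and file layouts are hypothetical, not taken from Wang et al.

```python
# Hypothetical sketch of a unified description for multisource hydrological data:
# different raw formats are mapped onto one structure, which downstream data
# services and viewers can consume without knowing the original file layout.
from dataclasses import dataclass
import csv
import json

@dataclass
class HydroSeries:
    variable: str      # e.g. "discharge"
    units: str         # e.g. "m3/s"
    station_id: str
    times: list        # ISO-8601 timestamps
    values: list       # numeric observations

def from_station_csv(path, variable, units):
    """Map a CSV export with 'station', 'time' and 'value' columns onto the model."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return HydroSeries(variable, units, rows[0]["station"],
                       [r["time"] for r in rows],
                       [float(r["value"]) for r in rows])

def from_json_export(path):
    """Map a JSON export that already carries its own metadata onto the model."""
    with open(path) as f:
        doc = json.load(f)
    return HydroSeries(doc["variable"], doc["units"], doc["station"],
                       doc["times"], [float(v) for v in doc["values"]])
```

Once every source is expressed in one structure like this, publishing data services, customizing data requirements and building a shared viewer all become operations on the description rather than on each raw file format.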

Responsible Data Toolkit


Andrew Young at The GovLab: “The GovLab and UNICEF, as part of the Responsible Data for Children initiative (RD4C), are pleased to share a set of user-friendly tools to support organizations and practitioners seeking to operationalize the RD4C Principles. These principles—Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle—are especially important in the current moment, as actors around the world are taking a data-driven approach to the fight against COVID-19.

The initial components of the RD4C Toolkit are:

The RD4C Data Ecosystem Mapping Tool intends to help users to identify the systems generating data about children and the key components of those systems. After using this tool, users will be positioned to understand the breadth of data they generate and hold about children; assess data systems’ redundancies or gaps; identify opportunities for responsible data use; and achieve other insights.

The RD4C Decision Provenance Mapping methodology provides a way for actors designing or assessing data investments for children to identify key decision points and determine which internal and external parties influence those decision points. This distillation can help users to pinpoint any gaps and develop strategies for improving decision-making processes and advancing more professionally accountable data practices.

The RD4C Opportunity and Risk Diagnostic provides organizations with a way to take stock of the RD4C principles and how they might be realized as an organization reviews a data project or system. The diagnostic’s high-level questions and prompts are intended to help users identify areas in need of attention and to strategize next steps for ensuring more responsible handling of data for and about children across their organization.

Finally, the Data for Children Collaborative with UNICEF developed an Ethical Assessment that “forms part of [their] safe data ecosystem, alongside data management and data protection policies and practices.” The tool reflects the RD4C Principles and aims to “provide an opportunity for project teams to reflect on the material consequences of their actions, and how their work will have real impacts on children’s lives.”

RD4C launched in October 2019 with the release of the RD4C Synthesis Report, Selected Readings, and the RD4C Principles. Last month we published the RD4C Case Studies, which analyze data systems deployed in diverse country environments, with a focus on their alignment with the RD4C Principles. The case studies are: Romania’s The Aurora Project, Childline Kenya, and Afghanistan’s Nutrition Online Database.

To learn more about Responsible Data for Children, visit rd4c.org or contact rd4c [at] thegovlab.org. To join the RD4C conversation and be alerted to future releases, subscribe at this link.”

From Idea to Reality: Why We Need an Open Data Policy Lab


Stefaan G. Verhulst at Open Data Policy Lab: “The belief that we are living in a data age — one characterized by unprecedented amounts of data, with unprecedented potential — has become mainstream. We regularly read phrases such as “data is the most valuable commodity in the global economy” or that data provides decision-makers with an “ever-swelling flood of information.”

Without a doubt, there is truth in such statements. But they also leave out a major shortcoming — the fact that much of the most useful data continue to remain inaccessible, hidden in silos, behind digital walls, and in untapped “treasuries.”

For close to a decade, the technology and public-interest communities have pushed the idea of open data. At its core, open data represents a new paradigm of data availability and access. The movement borrows from the language of open source and is rooted in notions of a “knowledge commons”, a concept developed by scholars such as Nobel Prize winner Elinor Ostrom, among others.

Milestones and Limitations in Open Data

Significant milestones have been achieved in the short history of the open data movement. Around the world, an ever-increasing number of governments at the local, state and national levels now release large datasets for the public’s benefit. For example, New York City requires that all public data be published on a single web portal. The current portal site contains thousands of datasets that fuel projects on topics as diverse as school bullying, sanitation, and police conduct. In California, the Forest Practice Watershed Mapper allows users to track the impact of timber harvesting on aquatic life through the use of the state’s open data. Similarly, Denmark’s Building and Dwelling Register releases address data to the public free of charge, improving transparent property assessment for all interested parties.
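
As a practical illustration of what a single portal enables, the sketch below pulls a few rows from NYC’s open data portal through its public API. The dataset identifier is a placeholder to be replaced with the ID of any dataset listed on the portal; an API app token is optional for light, exploratory use.

```python
# Illustrative sketch: reading records from NYC's open data portal via its
# public API. The dataset identifier below is a placeholder.
import requests

def fetch_nyc_dataset(dataset_id, limit=100):
    """Return up to `limit` rows of an NYC Open Data dataset as a list of dicts."""
    url = f"https://data.cityofnewyork.us/resource/{dataset_id}.json"
    response = requests.get(url, params={"$limit": limit}, timeout=30)
    response.raise_for_status()
    return response.json()

rows = fetch_nyc_dataset("xxxx-xxxx", limit=5)  # placeholder dataset ID
for row in rows:
    print(row)
```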

A growing number of private companies have also initiated or engaged in “Data Collaborative” projects to leverage their private data toward the public interest. For example, Valassis, a direct-mail marketing company, shared its massive address database with community groups in New Orleans to visualize and track block-by-block repopulation rates after Hurricane Katrina. A wide range of data collaboratives are also currently being launched to respond to the COVID-19 pandemic. Through its COVID-19 Data Collaborative Program, the location-intelligence company Cuebiq is providing researchers access to the company’s data to study, for instance, the impacts of social distancing policies in Italy and New York City. The health technology company Kinsa Health’s US Health Weather initiative is likewise visualizing the rate of fever across the United States using data from its network of Smart Thermometers, thereby providing early indications regarding the location of likely COVID-19 outbreaks.

Yet despite such initiatives, many open data projects (and data collaboratives) remain fledgling — especially those at the state and local level.

Among other issues, the field has trouble scaling projects beyond initial pilots, and many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain skeptical of open data’s value. In addition, terabytes of potentially transformative data remain inaccessible for re-use. It is absolutely imperative that we continue to make the case to all stakeholders regarding the importance of open data, and of moving it from an interesting idea to an impactful reality. In order to do this, we need a new resource — one that can inform the public and data owners, and that would guide decision-makers on how to achieve open data in a responsible manner, without undermining privacy and other rights.

Purpose of the Open Data Policy Lab

Today, with support from Microsoft and under the counsel of a global advisory board of open data leaders, The GovLab is launching an initiative designed precisely to build such a resource.

Our Open Data Policy Lab will draw on lessons and experiences from around the world to conduct analysis, provide guidance, build community, and take action to accelerate the responsible re-use and opening of data for the benefit of society and the equitable spread of economic opportunity…(More)”.

EDPB Adopts Guidelines on the Processing of Health Data During COVID-19


Hunton Privacy Blog: “On April 21, 2020, the European Data Protection Board (“EDPB”) adopted Guidelines on the processing of health data for scientific purposes in the context of the COVID-19 pandemic. The aim of the Guidelines is to provide clarity on the most urgent matters relating to health data, such as legal basis for processing, the implementation of adequate safeguards and the exercise of data subject rights.

The Guidelines note that the General Data Protection Regulation (“GDPR”) provides a specific derogation to the prohibition on processing of sensitive data under Article 9, for scientific purposes. With respect to the legal basis for processing, the Guidelines state that consent may be relied on under both Article 6 and the derogation to the prohibition on processing under Article 9 in the context of COVID-19, as long as the requirements for explicit consent are met, and as long as there is no power imbalance that could pressure or disadvantage a reluctant data subject. Researchers should keep in mind that study participants must be able to withdraw their consent at any time. National legislation may also provide an appropriate legal basis for the processing of health data and a derogation to the Article 9 prohibition. Furthermore, national laws may restrict data subject rights, though these restrictions should apply only insofar as strictly necessary.

In the context of transfers to countries outside the European Economic Area that have not been deemed adequate by the European Commission, the Guidelines note that the “public interest” derogation to the general prohibition on such transfers may be relied on, as well as explicit consent. The Guidelines add, however, that these derogations should only be relied on as a temporary measure and not for repetitive transfers.

The Guidelines highlight the importance of complying with the GDPR’s data protection principles, particularly with respect to transparency. Ideally, notice of processing as part of a research project should be provided to the relevant data subject before the project commences, if data has not been collected directly from the individual, in order to allow the individual to exercise their rights under the GDPR. There may be instances where, considering the number of data subjects, the age of the data and the safeguards in place, it would be impossible or require disproportionate effort to provide notice, in which case researchers may be able to rely on the exemptions set out under Article 14 of the GDPR.

The Guidelines also highlight that processing for scientific purposes is generally not considered incompatible with the purposes for which data is originally collected, assuming that the principles of data minimization, integrity, confidentiality and data protection by design and by default are complied with (See Guidelines)”.

Tear down this wall: Microsoft embraces open data


The Economist: “Two decades ago Microsoft was a byword for a technological walled garden. One of its bosses called free open-source programs a “cancer”. That was then. On April 21st the world’s most valuable tech firm joined a fledgling movement to liberate the world’s data. Among other things, the company plans to launch 20 data-sharing groups by 2022 and give away some of its digital information, including data it has aggregated on covid-19.

Microsoft is not alone in its newfound fondness for sharing in the age of the coronavirus. “The world has faced pandemics before, but this time we have a new superpower: the ability to gather and share data for good,” Mark Zuckerberg, the boss of Facebook, a social-media conglomerate, wrote in the Washington Post on April 20th. Despite the EU’s strict privacy rules, some Eurocrats now argue that data-sharing could speed up efforts to fight the coronavirus. 

But the argument for sharing data is much older than the virus. The OECD, a club mostly of rich countries, reckons that if data were more widely exchanged, many countries could enjoy gains worth between 1% and 2.5% of GDP. The estimate is based on heroic assumptions (such as putting a number on business opportunities created for startups). But economists agree that readier access to data is broadly beneficial, because data are “non-rivalrous”: unlike oil, say, they can be used and re-used without being depleted, for instance to power various artificial-intelligence algorithms at once. 

Many governments have recognised the potential. Cities from Berlin to San Francisco have “open data” initiatives. Companies have been cagier, says Stefaan Verhulst, who heads the Governance Lab at New York University, which studies such things. Firms worry about losing intellectual property, imperilling users’ privacy and hitting technical obstacles. Standard data formats (eg, JPEG images) can be shared easily, but much that a Facebook collects with its software would be meaningless to a Microsoft, even after reformatting. Less than half of the 113 “data collaboratives” identified by the lab involve corporations. Those that do, including initiatives by BBVA, a Spanish bank, and GlaxoSmithKline, a British drugmaker, have been small or limited in scope. 

Microsoft’s campaign is the most consequential by far. Besides encouraging more non-commercial sharing, the firm is developing software, licences and (with the Governance Lab and others) governance frameworks that permit firms to trade data or provide access to them without losing control. Optimists believe that the giant’s move could be to data what IBM’s embrace in the late 1990s of the Linux operating system was to open-source software. Linux went on to become a serious challenger to Microsoft’s own Windows and today underpins Google’s Android mobile software and much of cloud-computing…(More)”.

Mapping how data can help address COVID-19


Blog by Andrew J. Zahuranec and Stefaan G. Verhulst: “The novel coronavirus disease (COVID-19) is a global health crisis the likes of which the modern world has never seen. Amid calls to action from the United Nations Secretary-General, the World Health Organization, and many national governments, there has been a proliferation of initiatives using data to address some facet of the pandemic. In March, The GovLab at NYU put out its own call to action, which identifies key steps organizations and decision-makers can take to build the data infrastructure needed to tackle pandemics. This call has been signed by over 400 data leaders from around the world in the public and private sector and in civil society.

But questions remain as to how many of these initiatives are useful for decision-makers. While The GovLab’s living repository contains over 160 data collaboratives, data competitions, and other innovative work, many of these examples take a data supply-side approach to the COVID-19 response. Given the urgency of the situation, some organizations create projects that align with the available data instead of trying to understand what insights those responding to the crisis actually want, including issues that may not be directly related to public health.

We need to identify and ask better questions to use data effectively in the current crisis. Part of that work means understanding what topics can be addressed through enhanced data access and analysis.

Using The GovLab’s rapid-research methodology, we’ve compiled a list of 12 topic areas related to COVID-19 where data and analysis are needed. …(More)”.

Mobile applications to support contact tracing in the EU’s fight against COVID-19


Common EU Toolbox for Member States by eHealth Network: “Mobile apps have the potential to bolster contact tracing strategies to contain and reverse the spread of COVID-19. EU Member States are converging towards effective app solutions that minimise the processing of personal data, and recognise that interoperability between these apps can support public health authorities and support the reopening of the EU’s internal borders.

This first iteration of a common EU toolbox, developed urgently and collaboratively by the eHealth Network with the support of the European Commission, provides a practical guide for Member States. The common approach aims to exploit the latest privacy-enhancing technological solutions that enable at-risk individuals to be contacted and, if necessary, to be tested as quickly as possible, regardless of where they are and which app they are using. It explains the essential requirements for national apps, namely that they be:

  • voluntary;
  • approved by the national health authority;
  • privacy-preserving – personal data is securely encrypted; and
  • dismantled as soon as no longer needed.

The added value of these apps is that they can record contacts that a person may not notice or remember. These requirements on how to record contacts and notify individuals are anchored in accepted epidemiological guidance and reflect best practice on cybersecurity and accessibility. They also cover how to prevent the appearance of potentially harmful unapproved apps, success criteria and the collective monitoring of the apps’ effectiveness, and the outline of a communications strategy to engage with stakeholders and the people affected by these initiatives.
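
To illustrate (not to specify) how a privacy-preserving contact log can work, the sketch below follows the general decentralised pattern that several national apps adopted: phones broadcast short-lived pseudonymous identifiers derived from a secret that stays on the device, record only the identifiers they hear, and check exposure locally if an infected user consents to publish their daily keys. The key length, the 15-minute rotation interval and the hash-based derivation are illustrative assumptions, not requirements of the EU toolbox.

```python
# Hypothetical sketch of decentralised, privacy-preserving contact recording.
# No names or locations are stored; only rotating pseudonymous identifiers.
import hashlib
import os
import time

INTERVAL_SECONDS = 15 * 60                             # rotate the broadcast ID every 15 minutes
INTERVALS_PER_DAY = 24 * 60 * 60 // INTERVAL_SECONDS   # 96 intervals per day

def new_day_key():
    """A random per-day secret that stays on the phone unless the user consents to share it."""
    return os.urandom(16)

def ephemeral_id(day_key, interval):
    """Identifier broadcast over Bluetooth during one interval of the day."""
    return hashlib.sha256(day_key + interval.to_bytes(2, "big")).digest()[:16]

def current_interval(now=None):
    t = time.time() if now is None else now
    return int(t % (24 * 60 * 60)) // INTERVAL_SECONDS

# Each phone keeps only the identifiers it hears nearby, pruned after ~14 days.
heard_ids = set()

def record_contact(observed_id):
    heard_ids.add(observed_id)

def check_exposure(published_day_keys):
    """Re-derive an infected user's identifiers locally and compare with what was heard."""
    for key in published_day_keys:
        if any(ephemeral_id(key, i) in heard_ids for i in range(INTERVALS_PER_DAY)):
            return True
    return False
```

In this pattern no central authority ever holds a contact graph; matching happens on each device, which is what makes such designs compatible with the voluntary, privacy-preserving and time-limited requirements listed above.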

Work will continue urgently to develop further and implement the toolbox, as set out in the Commission Recommendation of 8 April, including addressing other types of apps and the use of mobility data for modelling to understand the spread of the disease and exit from the crisis….(More)”.