Mapping how data can help address COVID-19


Blog by Andrew J. Zahuranec and Stefaan G. Verhulst: “The novel coronavirus disease (COVID-19) is a global health crisis the likes of which the modern world has never seen. Amid calls to action from the United Nations Secretary-General, the World Health Organization, and many national governments, there has been a proliferation of initiatives using data to address some facet of the pandemic. In March, The GovLab at NYU put out its own call to action, which identifies key steps organizations and decision-makers can take to build the data infrastructure needed to tackle pandemics. This call has been signed by over 400 data leaders from around the world in the public and private sector and in civil society.

But questions remain as to how many of these initiatives are useful for decision-makers. While The GovLab’s living repository contains over 160 data collaboratives, data competitions, and other innovative work, many of these examples take a data supply-side approach to the COVID-19 response. Given the urgency of the situation, some organizations create projects that align with the available data instead of trying to understand what insights those responding to the crisis actually want, including issues that may not be directly related to public health.

We need to identify and ask better questions to use data effectively in the current crisis. Part of that work means understanding what topics can be addressed through enhanced data access and analysis.

Using The GovLab’s rapid-research methodology, we’ve compiled a list of 12 topic areas related to COVID-19 where data and analysis are needed. …(More)”.

Mobile applications to support contact tracing in the EU’s fight against COVID-19


Common EU Toolbox for Member States by eHealth Network: “Mobile apps have potential to bolster contact tracing strategies to contain and reverse the spread of COVID-19. EU Member States are converging towards effective app solutions that minimise the processing of personal data, and recognise that interoperability between these apps can support public health authorities and support the reopening of the EU’s internal borders.

This first iteration of a common EU toolbox, developed urgently and collaboratively by the e-Health Network with the support of the European Commission, provides a practical guide for Member States. The common approach aims to exploit the latest privacy-enhancing technological solutions that enable at-risk individuals to be contacted and, if necessary, to be tested as quickly as possible, regardless of where they are and the app they are using. It explains the essential requirements for national apps, namely that they be:

  • voluntary;
  • approved by the national health authority;
  • privacy-preserving – personal data is securely encrypted; and
  • dismantled as soon as no longer needed.

The added value of these apps is that they can record contacts that a person may not notice or remember. These requirements on how to record contacts and notify individuals are anchored in accepted epidemiological guidance, and reflect best practice on cybersecurity and accessibility. They cover how to prevent the appearance of potentially harmful unapproved apps, success criteria, collective monitoring of the effectiveness of the apps, and the outline of a communications strategy to engage with stakeholders and the people affected by these initiatives.

Work will continue urgently to develop further and implement the toolbox, as set out in the Commission Recommendation of 8 April, including addressing other types of apps and the use of mobility data for modelling to understand the spread of the disease and exit from the crisis….(More)”.

The Atlas of Inequality and Cuebiq’s Data for Good Initiative


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “The Atlas of Inequality is a research initiative led by scientists at the MIT Media Lab and Universidad Carlos III de Madrid. It is a project within the larger Human Dynamics research initiative at the MIT Media Lab, which investigates how computational social science can improve society, government, and companies. Using multiple big data sources, MIT Media Lab researchers seek to understand how people move in urban spaces and how that movement influences or is influenced by income. Among the datasets used in this initiative was location data provided by Cuebiq, through its Data for Good initiative. Cuebiq offers location-intelligence services to approved research and nonprofit organizations seeking to address public problems. To date, the Atlas has published maps of inequality in eleven cities in the United States. Through the Atlas, the researchers hope to raise public awareness about segregation of social mobility in United States cities resulting from economic inequality and support evidence-based policymaking to address the issue.

Data Collaborative Model: Based on the typology of data collaborative practice areas developed by The GovLab, the use of Cuebiq’s location data by MIT Media Lab researchers for the Atlas of Inequality initiative is an example of the research and analysis partnership model of data collaboration, specifically a data transfer approach. In this approach, companies provide data to partners for analysis, sometimes under the banner of “data philanthropy.” Access to data remains highly restrictive, with only specific partners able to analyze the assets provided. Approved uses are also determined in a somewhat cooperative manner, often with some agreement outlining how and why parties requesting access to data will put it to use….(More)”.

A Data Ecosystem to Defeat COVID-19


Paper by Bapon Fakhruddin: “…A wide range of approaches could be applied to understand transmission, outbreak assessment, risk communication, and cascading impacts assessment on essential and other services. The network-based modelling of System of Systems (SOS), mobile technology, frequentist statistics and maximum-likelihood estimation, interactive data visualization, geostatistics, graph theory, Bayesian statistics, mathematical modelling, evidence synthesis approaches and complex thinking frameworks for systems interactions on COVID-19 impacts could be utilized. Examples of tools and technologies that could be utilized to act decisively and early to prevent the further spread or quickly suppress the transmission of COVID-19, strengthen the resilience of health systems, save lives and urgently support developing countries with businesses and corporations are shown in Figure 2. There is also WHO guidance on ‘Health Emergency and Disaster Risk Management[8]’, the UNDRR-supported ‘Public Health Scorecard Addendum[9]’, and other guidelines (e.g. WHO practical considerations and recommendations for religious leaders and faith-based communities in the context of COVID-19[10]) that could enhance pandemic response plans. It needs to be ensured that any such use is proportionate, specific and protected and does not increase the risk to civil liberties. It is essential therefore to examine in detail the challenge of maximising data use in emergency situations, while ensuring it is task-limited, proportionate and respectful of necessary protections and limitations. This is a complex task and the COVID-19 pandemic will provide us with important test cases. It is also important that data is interpreted accurately. Otherwise, misinterpretations could lead each sector down incorrect paths.

Figure 2: Tools to strengthen resilience for COVID-19

Many countries are still learning how to make use of data for their decision making in this critical time. The COVID-19 pandemic will provide important lessons on the need for cross-domain research and on how, in such emergencies, to balance the use of technological opportunities and data to counter pandemics against fundamental protections….(More)”.

How Facebook and Google are helping the CDC forecast coronavirus


Karen Hao at MIT Technology Review: “When it comes to predicting the spread of an infectious disease, it’s crucial to understand what Ryan Tibshirani, an associate professor at Carnegie Mellon University, calls “the pyramid of severity.” The bottom of the pyramid is asymptomatic carriers (those who have the infection but feel fine); the next level is symptomatic carriers (those who are feeling ill); then come hospitalizations, critical hospitalizations, and finally deaths.

Every level of the pyramid has a clear relationship to the next: “For example, sadly, it’s pretty predictable how many people will die once you know how many people are under critical care,” says Tibshirani, who is part of CMU’s Delphi research group, one of the best flu-forecasting teams in the US. The goal, therefore, is to have a clear measure of the lower levels of the pyramid, as the foundation for forecasting the higher ones.

But in the US, building such a model is a Herculean task. A lack of testing makes it impossible to assess the number of asymptomatic carriers. The results also don’t accurately reflect how many symptomatic carriers there are. Different counties have different testing requirements—some choosing only to test patients who require hospitalization. Test results also often take upwards of a week to return.

The remaining option is to measure symptomatic carriers through a large-scale, self-reported survey. But such an initiative won’t work unless it covers a big enough cross section of the entire population. Now the Delphi group, which has been working with the Centers for Disease Control and Prevention to help it coordinate the national pandemic response, has turned to the largest platforms in the US: Facebook and Google.

Facebook will help CMU Delphi research group gather data about Covid symptoms

In a new partnership with Delphi, both tech giants have agreed to help gather data from those who voluntarily choose to report whether they’re experiencing covid-like symptoms. Facebook will target a fraction of its US users with a CMU-run survey, while Google has thus far been using its Opinion Rewards app, which lets users respond to questions for app store credit. The hope is this new information will allow the lab to produce county-by-county projections that will help policymakers allocate resources more effectively.

Neither company will ever actually see the survey results; they’re merely pointing users to the questions administered and processed by the lab. The lab will also never share any of the raw data back to either company. Still, the agreements represent a major deviation from typical data-sharing practices, which could raise privacy concerns. “If this wasn’t a pandemic, I don’t know that companies would want to take the risk of being associated with or asking directly for such a personal piece of information as health,” Tibshirani says.

Without such cooperation, the researchers would’ve been hard pressed to find the data anywhere else. Several other apps allow users to self-report symptoms, including a popular one in the UK known as the Covid Symptom Tracker that has been downloaded over 1.5 million times. But none of them offer the same systematic and expansive coverage as a Facebook- or Google-administered survey, says Tibshirani. He hopes the project will collect millions of responses each week….(More)”.

Tracking coronavirus: big data and the challenge to privacy


Nic Fildes and Javier Espinoza at the Financial Times: “When the World Health Organization launched a 2007 initiative to eliminate malaria on Zanzibar, it turned to an unusual source to track the spread of the disease between the island and mainland Africa: mobile phones sold by Tanzania’s telecoms groups including Vodafone, the UK mobile operator.

Working together with researchers at Southampton university, Vodafone began compiling sets of location data from mobile phones in the areas where cases of the disease had been recorded. 

Mapping how populations move between locations has proved invaluable in tracking and responding to epidemics. The Zanzibar project has been replicated by academics across the continent to monitor other deadly diseases, including Ebola in west Africa….

With much of Europe at a standstill as a result of the coronavirus pandemic, politicians want the telecoms operators to provide similar data from smartphones. Thierry Breton, the former chief executive of France Telecom who is now the European commissioner for the internal market, has called on operators to hand over aggregated location data to track how the virus is spreading and to identify spots where help is most needed.

Both politicians and the industry insist that the data sets will be “anonymised”, meaning that customers’ individual identities will be scrubbed out. Mr Breton told the Financial Times: “In no way are we going to track individuals. That’s absolutely not the case. We are talking about fully anonymised, aggregated data to anticipate the development of the pandemic.”

But the use of such data to track the virus has triggered fears of growing surveillance, including questions about how the data might be used once the crisis is over and whether such data sets are ever truly anonymous….(More)”.

New Tool to Establish Responsible Data Collaboratives in the Time of COVID-19


Announcement: “To address the COVID-19 pandemic and other dynamic threats, The GovLab has called for the development of a new data infrastructure and ecosystem. Establishing data collaboratives in a responsible manner often necessitates the creation of data sharing agreements and other legal documentation — a strain on time and capacity both for data holders and those who could use data in the public interest.

Today, to support the development of data collaboratives in a responsible and agile way, we are sharing a new tool that addresses the complexity in preparing a Data Sharing Agreement from Contracts for Data Collaboration (a joint initiative of SDSN-TReNDS, the World Economic Forum, The GovLab, and the University of Washington’s Information Risk Research Initiative). Providing a checklist to support organizations with reviewing, negotiating and preparing Data Sharing Arrangements, the intent is to strengthen stakeholder trust and help accelerate responsible data sharing arrangements given the urgency of the global pandemic.

(Please note that the checklist is a tool for formulating and understanding legal issues, but we are not offering it as legal advice.)

(More)”.

The Responsible Data for Children (RD4C) Case Studies


Andrew Young at Datastewards.net: “This week, as part of the Responsible Data for Children initiative (RD4C), the GovLab and UNICEF launched a new case study series to provide insights on promising practice as well as barriers to realizing responsible data for children.

Drawing upon field-based research and established good practice, RD4C aims to highlight and support responsible handling of data for and about children; identify challenges and develop practical tools to assist practitioners in evaluating and addressing them; and encourage a broader discussion on actionable principles, insights, and approaches for responsible data management.

RD4C launched in October 2019 with the release of the RD4C Synthesis Report, Selected Readings, and the RD4C Principles: Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle.

The RD4C Case Studies analyze data systems deployed in diverse country environments, with a focus on their alignment with the RD4C Principles. This week’s release includes case studies arising from field missions to Romania, Kenya, and Afghanistan in 2019. The data systems examined are: …(More)”.

Coronavirus: country comparisons are pointless unless we account for these biases in testing


Norman Fenton, Magda Osman, Martin Neil, and Scott McLachlan at The Conversation: “Suppose we wanted to estimate how many car owners there are in the UK and how many of those own a Ford Fiesta, but we only have data on those people who visited Ford car showrooms in the last year. If 10% of the showroom visitors owned a Fiesta, then, because of the bias in the sample, this would certainly overestimate the proportion of Ford Fiesta owners in the country.

Estimating death rates for people with COVID-19 is currently undertaken largely along the same lines. In the UK, for example, almost all testing of COVID-19 is performed on people already hospitalised with COVID-19 symptoms. At the time of writing, there are 29,474 confirmed COVID-19 cases (analogous to car owners visiting a showroom) of whom 2,352 have died (Ford Fiesta owners who visited a showroom). But this count misses out all the people with mild or no symptoms.

Concluding that the death rate from COVID-19 is on average 8% (2,352 out of 29,474) ignores the many people with COVID-19 who are not hospitalised and have not died (analogous to car owners who did not visit a Ford showroom and who do not own a Ford Fiesta). It is therefore equivalent to making the mistake of concluding that 10% of all car owners own a Fiesta.
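
The arithmetic behind this point can be sketched in a few lines of Python. This is an illustration only, not code from the article: it reproduces the naive 8% figure from the two counts quoted above, then shows how the estimate shrinks once untested infections are added to the denominator. The `untested_per_confirmed` multipliers are purely hypothetical, chosen to show the direction of the bias rather than its real size.

```python
# Counts quoted in the text (UK, at the time the article was written).
confirmed_cases = 29_474
deaths = 2_352


def naive_death_rate(deaths: int, confirmed: int) -> float:
    """Deaths divided by confirmed cases only (the biased estimate)."""
    return deaths / confirmed


def adjusted_death_rate(deaths: int, confirmed: int,
                        untested_per_confirmed: float) -> float:
    """Deaths divided by confirmed cases plus assumed untested infections.

    `untested_per_confirmed` is a hypothetical multiplier, not a measured
    value. The sketch also assumes that no deaths occur among untested
    infections, which is a simplification.
    """
    total_infections = confirmed * (1 + untested_per_confirmed)
    return deaths / total_infections


naive = naive_death_rate(deaths, confirmed_cases)
print(f"naive rate among confirmed cases: {naive:.1%}")  # about 8%

# Hypothetical scenarios: 1, 5 or 10 untested infections per confirmed case.
for multiplier in (1, 5, 10):
    rate = adjusted_death_rate(deaths, confirmed_cases, multiplier)
    print(f"{multiplier:>2} untested per confirmed case -> {rate:.2%}")
```

The larger the share of infections that never reach a test, the further the naive figure sits above the rate among all infected people.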

There are many prominent examples of this sort of conclusion. The Oxford COVID-19 Evidence Service have undertaken a thorough statistical analysis. They acknowledge potential selection bias, and add confidence intervals showing how big the error may be for the (potentially highly misleading) proportion of deaths among confirmed COVID-19 patients.

They note various factors that can result in wide national differences – for example the UK’s 8% (mean) “death rate” is very high compared to Germany’s 0.74%. These factors include different demographics, for example the number of elderly in a population, as well as how deaths are reported. For example, in some countries everybody who dies after having been diagnosed with COVID-19 is recorded as a COVID-19 death, even if the disease was not the actual cause, while other people may die from the virus without actually having been diagnosed with COVID-19.

However, the models fail to incorporate explicit causal explanations in their modelling that might enable us to make more meaningful inferences from the available data, including data on virus testing.

What a causal model would look like. Author provided

We have developed an initial prototype “causal model” whose structure is shown in the figure above. The links between the named variables in a model like this show how they are dependent on each other. These links, along with other unknown variables, are captured as probabilities. As data are entered for specific, known variables, all of the unknown variable probabilities are updated using a method called Bayesian inference. The model shows that the COVID-19 death rate is as much a function of sampling methods, testing and reporting, as it is determined by the underlying rate of infection in a vulnerable population….(More)”
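
As a rough illustration of the kind of Bayesian updating described above, the following Python sketch infers population prevalence from test results under two different assumptions about who gets tested. It is not the authors' causal model: the two-variable structure, the uniform prior, the perfect-test assumption, the testing probabilities, and the counts (30 positives out of 100 tests) are all hypothetical choices made for this example.

```python
from math import comb


def positive_share_among_tested(prevalence, test_prob_infected,
                                test_prob_uninfected):
    """P(infected | tested), by Bayes' rule, given who gets tested."""
    tested_infected = prevalence * test_prob_infected
    tested_uninfected = (1 - prevalence) * test_prob_uninfected
    return tested_infected / (tested_infected + tested_uninfected)


def posterior_over_prevalence(grid, positives, tests,
                              test_prob_infected, test_prob_uninfected):
    """Grid-approximate posterior over prevalence from a uniform prior."""
    weights = []
    for prevalence in grid:
        q = positive_share_among_tested(
            prevalence, test_prob_infected, test_prob_uninfected)
        # Binomial likelihood of the observed positives (perfect test assumed).
        likelihood = comb(tests, positives) * q**positives * (1 - q)**(tests - positives)
        # Uniform prior, so the posterior is proportional to the likelihood.
        weights.append(likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(grid, posterior):
    """Expected prevalence under the posterior."""
    return sum(p * w for p, w in zip(grid, posterior))


# Candidate prevalence values from 0.005 to 0.995.
grid = [i / 200 for i in range(1, 200)]
positives, tests = 30, 100  # hypothetical observations

# Scenario A: everyone is equally likely to be tested (random sampling).
random_sampling = posterior_over_prevalence(grid, positives, tests, 0.5, 0.5)
# Scenario B: infected people are ten times more likely to be tested.
biased_sampling = posterior_over_prevalence(grid, positives, tests, 0.5, 0.05)

print("random sampling:", round(posterior_mean(grid, random_sampling), 3))
print("biased sampling:", round(posterior_mean(grid, biased_sampling), 3))
```

The same 30 positives out of 100 tests support very different conclusions about prevalence depending on the assumed testing process, which is the sense in which an estimate is a function of sampling and testing as well as of the underlying infection rate.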

The potential of Data Collaboratives for COVID19


Blog post by Stefaan Verhulst: “We live in almost unimaginable times. The spread of COVID-19 is a human tragedy and global crisis that will impact our communities for many years to come. The social and economic costs are huge and mounting, and they are already contributing to a global slowdown. Every day, the emerging pandemic reveals new vulnerabilities in various aspects of our economic, political and social lives. These include our vastly overstretched public health services, our dysfunctional political climate, and our fragile global supply chains and financial markets.

The unfolding crisis is also making shortcomings clear in another area: the way we re-use data responsibly. Although this aspect of the crisis has been less remarked upon than other, more obvious failures, those who work with data—and who have seen its potential to impact the public good—understand that we have failed to create the necessary governance and institutional structures that would allow us to harness data responsibly to halt or at least limit this pandemic. A recent article in Stat, an online journal dedicated to health news, characterized the COVID-19 outbreak as “a once-in-a-century evidence fiasco.” The article continues: 

“At a time when everyone needs better information, […] we lack reliable evidence on how many people have been infected with SARS-CoV-2 or who continue to become infected. Better information is needed to guide decisions and actions of monumental significance and to monitor their impact.” 

It doesn’t have to be this way, and these data challenges are not an excuse for inaction. As we explain in what follows, there is ample evidence that the re-use of data can help mitigate health pandemics. A robust (if somewhat unsystematized) body of knowledge could direct policymakers and others in their efforts. In the second part of this article, we outline eight steps that key stakeholders can and should take to better re-use data in the fight against COVID-19. In particular, we argue that more responsible data stewardship and increased use of data collaboratives are critical….(More)”.