Announcing the New Data4COVID19 Repository


Blog by Andrew Zahuranec: “It’s been a long year. Back in March, The GovLab released a Call for Action to build the data infrastructure and ecosystem we need to tackle pandemics and other dynamic societal and environmental threats. As part of that work, we launched a Data4COVID19 repository to monitor progress and curate projects that reused data to address the pandemic. At the time, it was hard to say how long it would remain relevant. We did not know how long the pandemic would last nor how many organizations would publish dashboards, visualizations, mobile apps, user tools, and other resources directed at the crisis’s worst consequences.

Seven months later, the COVID-19 pandemic is still with us. Over one million people around the world are dead and many countries face ever-worsening social and economic costs. Though the frequency with which data reuse projects are announced has slowed since the crisis’s early days, they have not stopped. For months, The GovLab has posted dozens of additions to an increasingly unwieldy Google Doc.

Today, we are making a change. Given the pandemic’s continued urgency and relevance into 2021 and beyond, The GovLab is pleased to release the new Data4COVID19 Living Repository. The upgraded platform allows people to more easily find and understand projects related to the COVID-19 pandemic and data reuse.

[Image: The Data4COVID19 Repository]

On the platform, visitors will notice a few improvements that distinguish the repository from its earlier iteration. In addition to a main page with short descriptions of each example, we’ve added improved search and filtering functionality. Visitors can sort through any of the projects by:

  • Scope: the size of the target community;
  • Region: the geographic area in which the project takes place;
  • Topic: the aspect of the crisis the project seeks to address; and
  • Pandemic Phase: the stage of pandemic response the project aims to address….(More)”.

The ambitious effort to piece together America’s fragmented health data


Nicole Wetsman at The Verge: “From the early days of the COVID-19 pandemic, epidemiologist Melissa Haendel knew that the United States was going to have a data problem. There didn’t seem to be a national strategy to control the virus, and cases were springing up in sporadic hotspots around the country. With such a patchwork response, nationwide information about the people who got sick would probably be hard to come by.

Other researchers around the country were pinpointing similar problems. In Seattle, Adam Wilcox, the chief analytics officer at UW Medicine, was reaching out to colleagues. The city was the first US COVID-19 hotspot. “We had 10 times the data, in terms of just raw testing, than other areas,” he says. He wanted to share that data with other hospitals, so they would have that information on hand before COVID-19 cases started to climb in their area. Everyone wanted to get as much data as possible in the hands of as many people as possible, so they could start to understand the virus.

Haendel was in a good position to help make that happen. She’s the chair of the National Center for Data to Health (CD2H), a National Institutes of Health program that works to improve collaboration and data sharing within the medical research community. So one week in March, just after she’d started working from home and pulled her 10th grader out of school, she started trying to figure out how to use existing data-sharing projects to help fight this new disease.

The solution Haendel and CD2H landed on sounds simple: a centralized, anonymous database of health records from people who tested positive for COVID-19. Researchers could use the data to figure out why some people get very sick and others don’t, how conditions like cancer and asthma interact with the disease, and which treatments end up being effective.

But in the United States, building that type of resource isn’t easy. “The US healthcare system is very fragmented,” Haendel says. “And because we have no centralized healthcare, that makes it also the case that we have no centralized healthcare data.” Hospitals, citing privacy concerns, don’t like to give out their patients’ health data. Even if hospitals agree to share, they all use different ways of storing information. At one institution, the classification “female” could go into a record as one, and “male” could go in as two — and at the next, they’d be reversed….(More)”.
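
A common way around such coding mismatches is to translate each site’s local codes into one shared vocabulary before records are pooled. The snippet below is a minimal illustrative sketch in Python; the site names, field, and codes are hypothetical and are not drawn from CD2H’s actual tooling:

```python
# Toy example: translate each site's local sex codes into one shared vocabulary
# before records are pooled. Site names and codes are hypothetical.
SITE_CODEBOOKS = {
    "hospital_a": {1: "female", 2: "male"},
    "hospital_b": {1: "male", 2: "female"},  # same field, reversed coding
}

def harmonize_sex(site, raw_code):
    """Map a site-specific code to the shared label, or flag it as unknown."""
    return SITE_CODEBOOKS.get(site, {}).get(raw_code, "unknown")

print(harmonize_sex("hospital_a", 1))  # -> female
print(harmonize_sex("hospital_b", 1))  # -> male
```

Real harmonization efforts map entire records onto a common data model rather than a single field, but the underlying step is the same.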

Common Pitfalls in the Interpretation of COVID-19 Data and Statistics


Paper by Andreas Backhaus: “…In the public debate, one can encounter at least three concepts that measure the deadliness of SARS-CoV-2: the case fatality rate (CFR), the infection fatality rate (IFR) and the mortality rate (MR). Unfortunately, these three concepts are sometimes used interchangeably, which creates confusion as they differ from each other by definition.

In its simplest form, the case fatality rate divides the total number of confirmed COVID-19 deaths by the total number of confirmed SARS-CoV-2 infections, neglecting adjustments for future deaths among current cases. However, the number of confirmed cases is believed to severely underestimate the true number of infections, because many infections run an asymptomatic course and testing capacity is limited. Hence, the CFR presumably represents an upper bound on the true lethality of SARS-CoV-2, as its denominator does not take undetected infections into account.

The infection fatality rate seeks to represent the lethality more accurately by incorporating the number of undetected infections or at least an estimate thereof into its calculation. Consequently, the IFR divides the total number of confirmed deaths by COVID-19 by the total number of infections with SARS-CoV-2. Due to its larger denominator but identical numerator, the IFR is lower than the CFR. The IFR represents a crucial parameter in epidemiological simulation models, such as that presented by Ferguson et al. (2020), as it determines the number of expected fatalities given the simulated spread of the disease among the population.
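
Written out, the two measures described above differ only in their denominators (a sketch of the definitions as given in the text; the notation is ours, not the paper’s):

```latex
\mathrm{CFR} = \frac{\text{confirmed COVID-19 deaths}}{\text{confirmed SARS-CoV-2 infections}}
\qquad
\mathrm{IFR} = \frac{\text{confirmed COVID-19 deaths}}{\text{confirmed infections} + \text{estimated undetected infections}}
```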

The methodological challenge regarding the IFR is, of course, to find a credible estimate of the undetected cases of infection. An early estimate of the IFR was provided on the basis of data collected during the SARS-CoV-2 outbreak on the Diamond Princess cruise ship in February 2020. Mizumoto et al. (2020) estimate that 17.9% (95% confidence interval: 15.5–20.2) of the cases were asymptomatic. Russell et al. (2020), after adjusting for age, estimate that the IFR among the Diamond Princess cases is 1.3% (95% confidence interval: 0.38–3.6) when considering all cases, but 6.4% (95% confidence interval: 2.6–13) when considering only patients who are 70 years and older. The serological studies currently being conducted in several countries and localities will provide further estimates of the true number of SARS-CoV-2 infections that have occurred over the past few months….(More)”.

Inclusive Policymaking Tools: A COVID-19 Pandemic Case Study


Paper by Ans Irfan, Ankita Arora, Christopher Jackson and Celina Valencia: “World Health Organization (WHO) estimates indicate the United States of America has the highest novel Coronavirus disease (COVID-19) burden in the world, with over 5 million confirmed cases and nearly 165,000 associated deaths as of August 14th, 2020 (WHO 2020). As COVID-19 mortality and morbidity have disproportionately impacted populations who experience vulnerabilities due to structural issues such as racism (Laurencin and McClinton 2020; Lin II and Money 2020; Martin 2020; Kim et al. 2020), it has become increasingly necessary to take this opportunity and intentionally codify diversity, equity, and inclusion (DEI) practices in the policymaking process. To encourage and facilitate this, we synthesize existing literature to identify best practices that can not only be used to inform COVID-19-related public policy activities but will also continue to inform inclusive policymaking processes in the future. We identify specific tools for policymakers at all levels of government to better operationalize the DEI framework and enact inclusive, equitable public policies as a result….(More)”.

Digital Government Initiative in response to the COVID-19 Pandemic


Compendium, prepared by the Division for Public Institutions and Digital Government (DPIDG) of the United Nations Department of Economic and Social Affairs (UN DESA): “…aims to capture emerging trends in digital responses of the United Nations Member States against the COVID-19 pandemic, and provide a preliminary analysis of their main features….


The initiatives listed in this compendium were submitted by Member States in response to a call for inputs launched by UN DESA/DPIDG in April/May 2020. The compendium lists selected initiatives according to major categories of action areas. While this publication does not list all initiatives submitted by Member States, the complete list can be accessed here: https://bit.ly/EGOV_COVID19_APPS .

Major groupings of action areas are:

  1. Information sharing
  2. E-participation
  3. E-health
  4. E-business
  5. Contact tracing
  6. Social distancing and virus tracking
  7. Working and learning from home
  8. Digital policy
  9. Partnerships…(More)”.

How Not to Kill People With Spreadsheets


David Gerard at Foreign Policy: “The U.K.’s response to COVID-19 is widely regarded as scattershot and haphazard. So how did they get here?

Excel is a top-of-the-line spreadsheet tool. A spreadsheet is good for quickly modeling a problem—but too often, organizations cut corners and press the cardboard-and-string mock-up into production, instead of building a robust and unique system based on the Excel proof of concept.

Excel is almost universally misused for complex data processing, as in this case—because it’s already present on your work computer and you don’t have to spend months procuring new software. So almost every business has at least one critical process that relies on a years-old spreadsheet set up by past staff members that nobody left at the company understands.

That’s how the U.K. went wrong. An automated process at Public Health England (PHE) transformed the incoming private laboratory test data (which was in text-based CSV files) into Excel-format files, to pass to the Serco Test and Trace teams’ dashboards.

Unfortunately, the process produced XLS files—an outdated Excel format superseded by XLSX in 2007—which have a limit of 65,536 rows, rather than the roughly one-million-row limit of the more recent XLSX format. With several lines of data per patient, this meant a single sheet could hold only about 1,400 cases. Further cases simply fell off the end.
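
A simple guard in the conversion step would have surfaced the problem instead of silently dropping rows. The following is a minimal illustrative sketch in Python (the file name is hypothetical; this is not PHE’s actual pipeline):

```python
import csv

XLS_MAX_ROWS = 65_536       # hard row limit of a legacy .xls worksheet
XLSX_MAX_ROWS = 1_048_576   # hard row limit of a modern .xlsx worksheet

def count_csv_rows(path):
    """Count data rows in a CSV file, excluding the header row."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)

def check_fits_sheet(path, row_limit=XLS_MAX_ROWS):
    """Raise instead of silently truncating when the target format cannot hold the data."""
    n = count_csv_rows(path)
    if n > row_limit:
        raise ValueError(f"{path}: {n} rows exceed the {row_limit}-row sheet limit")

# Hypothetical usage:
# check_fits_sheet("pillar2_lab_results.csv")
```

Checking the row count against the target format’s hard limit, and failing loudly when it is exceeded, turns silent data loss into an error someone has to investigate.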

Technicians at PHE monitoring the dashboards noticed on Oct. 2 that not all data that had been sent in was making it out the other end. The data was corrected the next day, and PHE announced the issue the day after.

It’s not clear if the software at PHE was an Excel spreadsheet or an in-house program using the XLS format for data interchange—the latter would explain why PHE stated that replacing it might take months—but the XLS format would have been used on the assumption that Excel was universal.

And even then, a system based on Excel-format files would have been an improvement over earlier systems—the system for keeping a count of COVID-19 cases in the U.K. was, as of May, still based on data handwritten on cards….

The process that went wrong was a workaround for a contract issue: The government’s contract with Deloitte to run the testing explicitly stipulated that the company did not have to report “Pillar 2” (general public testing) positive cases to PHE at all.

Since a test-and-trace system is not possible without this data, PHE set up feeds for the data anyway, as CSV text files directly from the testing labs. The data was then put into this system—the single system that serves as the bridge between testing and tracing, for all of England. PHE had to put in place technological duct tape to make a system of life-or-death importance work at all….

The Brookings Institution report “Doomed: Challenges and solutions to government IT projects” lists factors to consider when outsourcing government information technology. The outsourcing of tracking and tracing is an example where the government assumes all of the risk and the contractor takes all of the profit. PHE did one thing that you should never do: It outsourced a core function. Running a call center or the office canteen? You can outsource it. Tracing a pandemic? You must run it in-house.

If you need outside expertise for a core function, use contractors working within a department. Competing with the private sector on pay can be an issue, but a meaningful project can be a powerful incentive….(More)”.

Dispatches from the Behavioral Scientists Fighting Coronavirus in the Global South


Introduction by Neela Saldanha & Sakshi Ghai: “We are in the middle of a global pandemic, one that has infected more than 35 million people worldwide and killed over 1 million. Almost nine months after the World Health Organization declared the novel coronavirus a “public health emergency of international concern,” the primary strategies we have to prevent the spread of an invisible and often deadly virus are behavioral—keeping a distance, wearing masks, washing hands. No wonder behavioral science has been thrust into the spotlight. Behavioral scientists have been advising national and local governments, as well as health institutions around the world about the best ways to help people collectively adhere to new behaviors.

Although the pandemic rages globally, 7 of the 10 worst outbreaks in the world are in countries in the Global South. These countries have very different social, cultural, and economic contexts from those in the Global North. Mitigating the pandemic in these countries is not simply a matter of importing recommendations from the north. As Saugato Dutta pointed out, “advice that can seem grounded in universal human tendencies must be careful not to ignore the context in which it is applied.”

What are the elements of context that we need to attend to? What issues are behavioral scientists in Nairobi or New Delhi grappling with as they tackle the virus? What can we learn from the interventions deployed in Brazil or in the Philippines? And how can these lessons inspire the rest of the world?

We thought the best way to understand these questions was simply to ask behavioral scientists in those countries. And so, in this special collection, we have curated dispatches from behavioral scientists in Africa, Asia, the Middle East, and South America to learn what’s different about tackling coronavirus.

Our goal is to learn from the work they have done, understand the unique challenges they face, and get their view on what behavioral science needs to focus on to benefit the 80 percent of the world population that lives in these countries. We also hope that this collection will spark ideas and seed collaborations among behavioral scientists in the Global South and North alike. The current situation demands it….(More)”.

Social license for the use of big data in the COVID-19 era


Commentary by James A. Shaw, Nayha Sethi & Christine K. Cassel: “… Social license refers to the informal permissions granted to institutions such as governments or corporations by members of the public to carry out a particular set of activities. Much of the literature on the topic of social license has arisen in the field of natural resources management, emphasizing issues that include but go beyond environmental stewardship [4]. In their seminal work on social license in the pulp and paper industry, Gunningham et al. defined social license as the “demands and expectations” placed on organizations by members of civil society which “may be tougher than those imposed by regulation”; these expectations thereby demand actions that go beyond existing legal rules to demonstrate concern for the interests of publics. We use the plural term “publics” as opposed to the singular “public” to illustrate that stakeholder groups to which organizations must appeal are often diverse and varied in their assessments of whether a given organizational activity is acceptable [6]. Despite the potentially fragmented views of various publics, the concept of social license is considered in a holistic way (either an organization has it or does not). Social license is closely related to public trust, and where publics view a particular institution as trustworthy it is more likely to have social license to engage in activities such as the collection and use of personal data [7].

The question of how the leaders of an organization might better understand whether they have social license for a particular set of activities has also been addressed in the literature. In a review of literature on social license, Moffat et al. highlighted disagreement in the research community about whether social license can be accurately measured [4]. Certain groups of researchers emphasize that because of the intangible nature of social license, accurate measurement will never truly be possible. Others propose conceptual models of the determinants of social license, and establish surveys that assess those determinants to indicate the presence or absence of social license in a given context. However, accurate measurement of social license remains a point of debate….(More)”.

How to fix the GDPR’s frustration of global biomedical research


Jasper Bovenberg, David Peloquin, Barbara Bierer, Mark Barnes, and Bartha Maria Knoppers at Science: “Since the advent of the European Union (EU) General Data Protection Regulation (GDPR) in 2018, the biomedical research community has struggled to share data with colleagues and consortia outside the EU, as the GDPR limits international transfers of personal data. A July 2020 ruling of the Court of Justice of the European Union (CJEU) reinforced obstacles to sharing, and even data transfer to enable essential research into coronavirus disease 2019 (COVID-19) has been restricted in a recent Guidance of the European Data Protection Board (EDPB). We acknowledge the valid concerns that gave rise to the GDPR, but we are concerned that the GDPR’s limitations on data transfers will hamper science globally in general and biomedical science in particular (see the text box) (1)—even though one stated objective of the GDPR is that processing of personal data should serve humankind, and even though the GDPR explicitly acknowledges that the right to the protection of personal data is not absolute and must be considered in relation to its function in society and be balanced against other fundamental rights. We examine whether there is room under the GDPR for EU biomedical researchers to share data from the EU with the rest of the world to facilitate biomedical research. We then propose solutions for consideration by either the EU legislature, the EU Commission, or the EDPB in its planned Guidance on the processing of health data for scientific research. Finally, we urge the EDPB to revisit its recent Guidance on COVID-19 research….(More)“.

Why Modeling the Spread of COVID-19 Is So Damn Hard



Matthew Hutson at IEEE Spectrum: “…Researchers say they’ve learned a lot of lessons modeling this pandemic, lessons that will carry over to the next.

The first set of lessons is all about data. Garbage in, garbage out, they say. Jarad Niemi, an associate professor of statistics at Iowa State University who helps run the forecast hub used by the CDC, says it’s not clear what we should be predicting. Infections, deaths, and hospitalization numbers each have problems, which affect their usefulness not only as inputs for the model but also as outputs. It’s hard to know the true number of infections when not everyone is tested. Deaths are easier to count, but they lag weeks behind infections. Hospitalization numbers have immense practical importance for planning, but not all hospitals release those figures. How useful is it to predict those numbers if you never have the true numbers for comparison? What we need, he said, is systematized random testing of the population, to provide clear statistics of both the number of people currently infected and the number of people who have antibodies against the virus, indicating recovery. Prakash, of Georgia Tech, says governments should collect and release data quickly in centralized locations. He also advocates for central repositories of policy decisions, so modelers can quickly see which areas are implementing which distancing measures.

Researchers also talked about the need for a diversity of models. At the most basic level, averaging an ensemble of forecasts improves reliability. More important, each type of model has its own uses—and pitfalls. An SEIR model is a relatively simple tool for making long-term forecasts, but the devil is in the details of its parameters: How do you set those to match real-world conditions now and into the future? Get them wrong and the model can head off into fantasyland. Data-driven models can make accurate short-term forecasts, and machine learning may be good for predicting complicated factors. But will the inscrutable computations of, for instance, a neural network remain reliable when conditions change? Agent-based models look ideal for simulating possible interventions to guide policy, but they’re a lot of work to build and tricky to calibrate.
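
To make the SEIR point concrete, the following is a minimal discrete-time SEIR sketch in Python. It is an illustration only: the parameter values are arbitrary placeholders rather than values fitted to COVID-19, and the point is simply that a handful of rate parameters drives the entire trajectory.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance a normalized SEIR model by one time step (forward Euler)."""
    new_exposed = beta * s * i * dt      # S -> E: new infections
    new_infectious = sigma * e * dt      # E -> I: end of incubation
    new_recovered = gamma * i * dt       # I -> R: recovery or death
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)

# Placeholder parameters: beta (transmission rate), sigma (1 / incubation period),
# gamma (1 / infectious period). None of these are fitted to real data.
s, e, i, r = 0.999, 0.0, 0.001, 0.0
for day in range(180):
    s, e, i, r = seir_step(s, e, i, r, beta=0.3, sigma=1 / 5.0, gamma=1 / 10.0)
print(f"share of the population ever infected after 180 days: {1 - s:.1%}")
```

Small changes to the transmission rate or to the assumed infectious period shift the epidemic’s final size and the timing of its peak dramatically, which is exactly the calibration problem the researchers describe.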

Finally, researchers emphasize the need for agility. Niemi of Iowa State says software packages have made it easier to build models quickly, and the code-sharing site GitHub lets people share and compare their models. COVID-19 is giving modelers a chance to try out all their newest tools, says Meyers, of the University of Texas. “The pace of innovation, the pace of development, is unlike ever before,” she says. “There are new statistical methods, new kinds of data, new model structures.”…(More)”.