How data science can ease the COVID-19 pandemic

Nigam Shah and Jacob Steinhardt at Brookings: “Social distancing and stay-at-home orders in the United States have slowed the infection rate of SARS-CoV-2, the pathogen that causes COVID-19. This has halted the immediate threat to the U.S. healthcare system, but consensus on a long-term plan or solution to the crisis remains unclear. As the reality settles in that there are no quick fixes and that therapies and vaccines will take several months if not years to invent, validate, and mass produce, this is a good time to consider another question: How can data science and technology help us endure the pandemic while we develop therapies and vaccines?

Before policymakers reopen their economies, they must be sure that the resulting new COVID-19 cases will not force local healthcare systems to resort to crisis standards of care. Doing so requires not just prevention and suppression of the virus, but ongoing measurement of virus activity, assessment of the efficacy of suppression measures, and forecasting of near-term demand on local health systems. This demand is highly variable given community demographics, the prevalence of pre-existing conditions, and population density and socioeconomics.

Data science can already provide ongoing, accurate estimates of health system demand, which is a requirement in almost all reopening plans. We need to go beyond that to a dynamic approach of data collection, analysis, and forecasting to inform policy decisions in real time and iteratively optimize public health recommendations for re-opening. While most reopening plans propose extensive testing, contact tracing, and monitoring of population mobility, almost none consider setting up such a dynamic feedback loop. Having such feedback could determine what level of virus activity can be tolerated in an area, given regional health system capacity, and adjust population distancing accordingly.
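To make the idea of a dynamic feedback loop concrete, here is a minimal sketch of one possible decision rule. It is not the authors' method: the function name `adjust_distancing`, the five-step distancing scale, the thresholds of 90% and 60% of capacity, and all the numbers in the demo are hypothetical and chosen only for illustration.

```python
from typing import List


def adjust_distancing(
    level: int,
    forecast_demand: float,
    capacity: float,
    tighten_at: float = 0.9,    # hypothetical: tighten when forecast >= 90% of capacity
    relax_below: float = 0.6,   # hypothetical: relax when forecast < 60% of capacity
    min_level: int = 0,
    max_level: int = 4,
) -> int:
    """Return the next distancing level (0 = fully open, 4 = strictest)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = forecast_demand / capacity
    if utilization >= tighten_at:
        return min(level + 1, max_level)
    if utilization < relax_below:
        return max(level - 1, min_level)
    return level


# Illustrative numbers only: weekly forecasts of beds needed in one region.
capacity_beds = 500.0
weekly_forecasts = [320.0, 460.0, 560.0, 480.0, 250.0]

level = 2
history: List[int] = []
for forecast in weekly_forecasts:
    level = adjust_distancing(level, forecast, capacity_beds)
    history.append(level)

print(history)
```

The gap between the two thresholds is a deliberate design choice in this sketch: it keeps the recommendation from flipping back and forth when forecast demand hovers near a single cutoff.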

We propose that by using existing technology and some nifty data science, it is possible to set up that feedback loop, which would maintain healthcare demand under the threshold of what is available in a region. Just as the maker community stepped up to cover for the failures of the government to provide adequate protective gear to health workers, this is an opportunity for the data and tech community to partner with healthcare experts and provide a measure of public health planning that governments are unable to do. Therefore, the question we invite the data science community to focus on is: How can data science help forecast regional health system resource needs given measurements of virus activity and suppression measures such as population distancing?…

Concretely, then, the crucial “data science” task is to learn the counterfactual function linking last week’s population mobility and today’s transmission rates to project hospital demand two weeks later. Imagine taking past measurements of mobility around April 10 in a region (such as the Santa Clara County’s report from COVID-19 Community Mobility Reports), the April 20 virus transmission rate estimate for the region (such as from …), and the April 25 burden on the health system (such as from the Santa Clara County Hospitalization dashboard), to learn a function that uses today’s mobility and transmission rates to anticipate needed hospital resources two weeks later. It is unclear how many days of data of each proxy measurement we need to reliably learn such a function, what mathematical form this function might take, and how we do this correctly with the observational data on hand and avoid the trap of mere function-fitting. However, this is the data science problem that needs to be tackled as a priority.
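As a rough illustration of the learning task described above, the sketch below lines up lagged mobility and transmission-rate measurements with later hospital burden and fits a linear function by ordinary least squares. Everything in it is an assumption made for illustration: the linear form, the 15-day and 5-day lags (taken from the April 10, April 20, and April 25 example), the helper names, and the synthetic series. A plain fit like this is exactly the kind of function-fitting the authors caution against; it shows only the shape of the problem, not a solution to it.

```python
import math
from typing import List, Sequence, Tuple


def build_lagged_rows(
    mobility: Sequence[float],
    rt: Sequence[float],
    hospital: Sequence[float],
    mobility_lag: int = 15,  # assumed lag: April 10 mobility -> April 25 burden
    rt_lag: int = 5,         # assumed lag: April 20 transmission -> April 25 burden
) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's hospital burden with earlier mobility and Rt values."""
    start = max(mobility_lag, rt_lag)
    rows, targets = [], []
    for t in range(start, len(hospital)):
        rows.append([1.0, mobility[t - mobility_lag], rt[t - rt_lag]])
        targets.append(hospital[t])
    return rows, targets


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(rows: List[List[float]], targets: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], mobility_value: float, rt_value: float) -> float:
    """Evaluate the fitted linear function for one pair of inputs."""
    return coeffs[0] + coeffs[1] * mobility_value + coeffs[2] * rt_value


# Synthetic, noise-free data (NOT real measurements), generated from a known
# linear rule so the fit can be checked against it.
days = 60
mobility = [-40.0 + 10.0 * math.sin(t / 3.0) for t in range(days)]  # % change vs baseline
rt = [1.0 + 0.2 * math.cos(t / 5.0) for t in range(days)]           # transmission rate
hospital = [0.0] * days                                              # first 15 days unused
for t in range(15, days):
    hospital[t] = 100.0 + 0.5 * mobility[t - 15] + 50.0 * rt[t - 5]

rows, targets = build_lagged_rows(mobility, rt, hospital)
coeffs = fit_least_squares(rows, targets)
forecast = predict(coeffs, -35.0, 1.1)
print([round(c, 3) for c in coeffs], round(forecast, 1))
```

On real observational data the open questions in the text remain: how many days of each proxy are needed, what functional form is appropriate, and how to separate the effect of distancing from confounders. None of those is addressed by this sketch.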

Adopting such technology and data science to keep anticipated healthcare needs under the threshold of availability in a region requires multiple privacy trade-offs, which will require thoughtful legislation so that the solutions invented for enduring the current pandemic do not lead to loss of privacy in perpetuity. However, given the immense economic as well as hidden medical toll of the shutdown, we urgently need to construct an early warning system that tells us to enhance suppression measures if the next COVID-19 outbreak peak might overwhelm our regional healthcare system. It is imperative that we focus our attention on using data science to anticipate, and manage, regional health system resource needs based on local measurements of virus activity and effects of population distancing….(More)”.