Viruses Cross Borders. To Fight Them, Countries Must Let Medical Data Flow, Too


Nigel Cory at ITIF: “If nations could regulate viruses the way many regulate data, there would be no global pandemics. But the sad reality is that, in the midst of the worst global pandemic in living memory, many nations make it unnecessarily complicated and costly, if not illegal, for health data to cross their borders. In so doing, they are hindering critically needed medical progress.

In the COVID-19 crisis, data analytics powered by artificial intelligence (AI) is critical to identifying the exact nature of the pandemic and developing effective treatments. The technology can produce powerful insights and innovations, but only if researchers can aggregate and analyze data from populations around the globe. And that requires data to move across borders as part of international research efforts by private firms, universities, and other research institutions. Yet, some countries, most notably China, are stopping health and genomic data at their borders.

Indeed, despite the significant benefits to companies, citizens, and economies that arise from the ability to easily share data across borders, dozens of countries—across every stage of development—have erected barriers to cross-border data flows. These data-residency requirements strictly confine data within a country’s borders, a concept known as “data localization,” and many countries have especially strict requirements for health data.

China is a noteworthy offender, having created a new digital iron curtain that requires data localization for a range of data types, including health data, as part of its so-called “cyber sovereignty” strategy. A May 2019 State Council regulation requires genomic data to be stored and processed locally by Chinese firms and bars foreign organizations from handling it. This is in service of China’s mercantilist strategy to advance its domestic life sciences industry. While there has been collaboration between U.S. and Chinese medical researchers on COVID-19, including on clinical trials for potential treatments, these restrictions mean that it won’t involve the transfer, aggregation, and analysis of Chinese personal data, which otherwise might help find a treatment or vaccine. If China truly wanted to make amends for blocking critical information during the early stages of the outbreak in Wuhan, then it should abolish this restriction and allow genomic and other health data to cross its borders.

But China is not alone in limiting data flows. Russia requires all personal data, health-related or not, to be stored locally. India’s draft data protection bill permits the government to classify any sensitive personal data as critical personal data and mandate that it be stored and processed only within the country. This would be consistent with recent debates and decisions to require localization for payments data and other types of data. And despite its leading role in pushing for the free flow of data as part of new digital trade agreements, Australia requires genomic and other data attached to personal electronic health records to be stored and processed only within its borders.

Countries also enact de facto barriers to health and genomic data transfers by making it harder and more expensive, if not impractical, for firms to transfer it overseas than to store it locally. For example, South Korea and Turkey require firms to get explicit consent from people to transfer sensitive data like genomic data overseas. Doing this for hundreds or thousands of people adds considerable costs and complexity.

And the European Union’s General Data Protection Regulation encourages data localization as firms feel pressured to store and process personal data within the EU given the restrictions it places on data transfers to many countries. This is in addition to the renewed push for local data storage and processing under the EU’s new data strategy.

Countries rationalize these steps on the basis that health data, particularly genomic data, is sensitive. But requiring health data to be stored locally does little to increase privacy or data security. The confidentiality of data does not depend on which country the information is stored in, only on the measures used to store it securely, such as via encryption, and the policies and procedures the firms follow in storing or analyzing the data. For example, if a nation has limits on the use of genomic data, then domestic organizations using that data face the same restrictions, whether they store the data in the country or outside of it. And if they share the data with other organizations, they must require those organizations, regardless of where they are located, to abide by the home government’s rules.

As such, policymakers need to stop treating health data differently when it comes to cross-border movement, and instead build technical, legal, and ethical protections into both domestic and international data-governance mechanisms, which together allow the responsible sharing and transfer of health and genomic data.

This is clearly possible—and needed. In February 2020, leading health researchers called for an international code of conduct for genomic data following the end of their first-of-its-kind international data-driven research project. The project used a purpose-built cloud service that stored 800 terabytes of genomic data on 2,658 cancer genomes across 13 data centers on three continents. The collaboration and use of cloud computing were transformational in enabling large-scale genomic analysis….(More)”.

Models v. Evidence


Jonathan Fuller at the Boston Review: “COVID-19 has revealed a contest between two competing philosophies of scientific knowledge. To manage the crisis, we must draw on both….The lasting icon of the COVID-19 pandemic will likely be the graphic associated with “flattening the curve.” The image is now familiar: a skewed bell curve measuring coronavirus cases that towers above a horizontal line—the health system’s capacity—only to be flattened by an invisible force representing “non-pharmaceutical interventions” such as school closures, social distancing, and full-on lockdowns.

How do the coronavirus models generating these hypothetical curves square with the evidence? What roles do models and evidence play in a pandemic? Answering these questions requires reconciling two competing philosophies in the science of COVID-19.

To some extent, public health epidemiology and clinical epidemiology are distinct traditions in health care, competing philosophies of scientific knowledge.

In one camp are infectious disease epidemiologists, who work very closely with institutions of public health. They have used a multitude of models to create virtual worlds in which sim viruses wash over sim populations—sometimes unabated, sometimes held back by a virtual dam of social interventions. This deluge of simulated outcomes played a significant role in leading government actors to shut borders as well as doors to schools and businesses. But the hypothetical curves are smooth, while real-world data are rough. Some detractors have questioned whether we have good evidence for the assumptions the models rely on, and even the necessity of the dramatic steps taken to curb the pandemic. Among these detractors are several clinical epidemiologists, who typically provide guidance for clinical practice—regarding, for example, the effectiveness of medical interventions—rather than for public health.

The latter camp has won significant media attention in recent weeks. Bill Gates—whose foundation funds the research behind the most visible outbreak model in the United States, developed by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington—worries that COVID-19 might be a “once-in-a-century pandemic.” A notable detractor from this view is Stanford’s John Ioannidis, a clinical epidemiologist, meta-researcher, and reliable skeptic who has openly wondered whether the coronavirus pandemic might rather be a “once-in-a-century evidence fiasco.” He argues that better data are needed to justify the drastic measures undertaken to contain the pandemic in the United States and elsewhere.

Ioannidis claims, in particular, that our data about the pandemic are unreliable, leading to exaggerated estimates of risk. He also points to a systematic review published in 2011 of the evidence regarding physical interventions that aim to reduce the spread of respiratory viruses, worrying that the available evidence is nonrandomized and prone to bias. (A systematic review specific to COVID-19 has now been published; it concurs that the quality of evidence is “low” to “very low” but nonetheless supports the use of quarantine and other public health measures.) According to Ioannidis, the current steps we are taking are “non-evidence-based.”…(More)”.

Which Covid-19 Data Can You Trust?


Article by Satchit Balsari, Caroline Buckee and Tarun Khanna: “The Covid-19 pandemic has created a tidal wave of data. As countries and cities struggle to grab hold of the scope and scale of the problem, tech corporations and data aggregators have stepped up, filling the gap with dashboards scoring social distancing based on location data from mobile phone apps and cell towers, contact-tracing apps using geolocation services and Bluetooth, and modeling efforts to predict epidemic burden and hospital needs. In the face of uncertainty, these data can provide comfort — tangible facts in the face of many unknowns.

In a crisis situation like the one we are in, data can be an essential tool for crafting responses, allocating resources, measuring the effectiveness of interventions, such as social distancing, and telling us when we might reopen economies. However, incomplete or incorrect data can also muddy the waters, obscuring important nuances within communities, ignoring important factors such as socioeconomic realities, and creating false senses of panic or safety, not to mention other harms such as needlessly exposing private information. Right now, bad data could produce serious missteps with consequences for millions.

Unfortunately, many of these technological solutions — however well intended — do not provide the clear picture they purport to. In many cases, there is insufficient engagement with subject-matter experts, such as epidemiologists who specialize in modeling the spread of infectious diseases or front-line clinicians who can help prioritize needs. But because technology and telecom companies have greater access to mobile device data, enormous financial resources, and larger teams of data scientists than academic researchers do, their data products are being rolled out at a higher volume than high-quality studies.

Whether you’re a CEO, a consultant, a policymaker, or just someone who is trying to make sense of what’s going on, it’s essential to be able to sort the good data from the misleading — or even misguided.

Common Pitfalls

While you may not be qualified to evaluate the particulars of every dashboard, chart, and study you see, there are common red flags to let you know data might not be reliable. Here’s what to look out for:

Data products that are too broad, too specific, or lacking in context. Over-aggregated data — such as the national metrics of physical distancing that some of the world’s largest data aggregators are putting out — obscure important local and regional variation, are not actionable, and mean little if used for inter-nation comparisons, given the massive social, demographic, and economic disparities in the world….(More)”.
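The over-aggregation pitfall described above can be made concrete with a toy calculation. All numbers here are fabricated for illustration; the point is only that a single national figure can look moderate while hiding regional extremes:

```python
# Toy illustration (fabricated numbers): a national average of a
# mobility-reduction metric hides large regional variation.

regions = {
    "Region A": 80,  # 80% reduction in mobility
    "Region B": 10,  # only 10% reduction
}

# The national average suggests moderate distancing everywhere.
national_avg = sum(regions.values()) // len(regions)
print(national_avg)  # → 45

# But the spread shows Region B barely changed behavior — the
# aggregate number is not actionable at the local level.
spread = max(regions.values()) - min(regions.values())
print(spread)  # → 70
```

The same arithmetic applies at any scale: the wider the underlying disparities, the less a single aggregate tells a local decision-maker.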

Can we escape from information overload?


Tom Lamont at 1843 (Economist): “…Information overload was a term coined in the mid-1960s by Bertram Gross, an American social scientist. In 1970 a writer called Alvin Toffler, who was known at the time as a dependable futurist – someone who prognosticated for a living – popularised the idea of information overload as part of a set of bleak predictions about eventual human dependence on technology. (Good call, Alvin.) Information overload can occur in man or machine, wrote another set of academics in a 1977 study, “when the amount of input to a system exceeds its processing capacity”. Then came VHS, home computers, the internet, mobile phones, mobile-phones-with-the-internet – and waves of anxiety that we might be reaching the limits of our capacity.

A study in 2011 found that on a typical day Americans were taking in five times as much information as they had done 25 years earlier – and this was before most people had bought smartphones. In 2019 a study by academics in Germany, Ireland and Denmark found that humans’ attention span is shrinking, probably because of digital intrusion, and that the effect manifests itself both “online and offline”.

By that time an organisation called the Information Overload Research Group had done a study which estimated that hundreds of billions of dollars were being sucked out of the American economy every year, in miscellaneous productivity costs, by an overload of data. The group had been co-founded in 2007 by a computer engineer-turned-consultant, Nathan Zeldes, who had once been asked by Intel, a computer-chip maker, to reduce the burden of email imposed on its workers. By the end of 2019 Zeldes was ready to sound a note of defeat. “I’d love to give you a magic potion that would restore your attention span to that of your grandparents,” he wrote in a blog post, “but I can’t. After over a decade of smartphone use and social media, the harm is probably irreversible.” He advised people to take up a hobby.

In an age of overload it can feel as though technology has rather chanced its luck. Pushed too much, too far, bone-deep. Even before coronavirus spread across the world, parts of the culture had started to tack towards isolation and deprivation as desirable lifestyle signifiers, hot-this-year, as if some time spent alone and without a device was the new season’s outfit, the next Cronut, another twerk.

Before a pandemic limited the appeal of wallowing in someone else’s tepid water, flotation-tank centres were opening all over London. In the Czech Republic there are spas that sell clients a week in the dark in shuttered, serviced suites. “Social distancing is underrated,” Edward Snowden tweeted, deadpan, in March 2020: a corona-joke, but one that will have spoken to the tech bros of Silicon Valley, for whom retreats were the treat of choice.

Recently, I saw that a person called Celine in San Francisco had tweeted to her 2,500-odd followers about the difficulty of “trying to date SF guys in between their week-long meditation retreats, Tahoe weekends, month-long remote work sessions…” About 4,000 people tapped to endorse the sentiment, launching Celine onto an exponential number of strangers’ screens, including my own. The default sound for any new tweet is a whistle, somewhere between a neighbourly “yoo-hoo” and a dog-walker’s call to heel.

Hilda Burke, a British psychotherapist who has written about smartphone addiction, told me that part of the problem in this age of overload is the yoo-hooing insistence with which each new parcel of information seeks our attention. Speakers chime. Pixelated columns shuffle urgently or icons bounce, as if to signal that here is the fire. Our twitch response to urgency is triggered, in bad faith.

When Celine’s tweet whistled onto my phone one idle Friday I couldn’t understand why I found it mildly stressful to read. Was it that it made me feel old? That I already had enough to think about? Eventually I realised that, for me, every tweet is a bit stressful. Every trifling, whistling update that comes at us, Burke said, “is like a sheep dressed in wolf’s clothing. The body springs to attention, ready to run or fight, and for nothing that’s worth it. This is confusing.”…(More)”

Assessing the feasibility of real-world data


Blog Post by Manuela Di Fusco: “Real-world data (RWD) and real-world evidence (RWE) are playing an increasing role in healthcare decision making.

The conduct of RWD studies involves many interconnected stages, ranging from the definition of research questions of high scientific interest, to the design of a study protocol and statistical plan, and the conduct of the analyses, quality reviews, publication and presentation to the scientific community. Every stage requires extensive knowledge, expertise and efforts from the multidisciplinary research team.

There are a number of well-accepted guidelines for good procedural practices in RWD. Despite their stress on the importance of data reliability, relevance and studies being fit for purpose, their recommendations generally focus on methods/analyses and transparent reporting of results. There is often little focus on feasibility concerns at the early stages of a study; ongoing RWD initiatives, too, focus on improving standards and practices for data collection and analyses.

RWD and RWE are playing an increasing role in healthcare decision making.”

The availability and use of new data sources, which have the ability to store health-related data, have been growing globally, and include mobile technologies, electronic patient-reported outcome tools and wearables [1]. 

As data sources exist in various formats, and are often created for non-research purposes, they have inherent associated limitations – such as missing data. Determining the best approach for collecting complete and quality data is of critical importance. At study conception, it is not always clear if it is reasonable to expect that the research question of interest could be fully answered and all analyses carried out. Numerous methodological and data collection challenges can emerge during study execution. However, some of these downstream study challenges could be proactively addressed through an early feasibility study, concurrent to protocol development. For example, during this exploratory study, datasets may be explored carefully to ensure data points deemed relevant for the study are routinely ascertained and captured sufficiently, despite potential missing data and/or other data source limitations.

Determining the best approach for collecting complete and quality data is of critical importance.”

This feasibility assessment serves primarily as a first step to gain knowledge of the data and ensure realistic assumptions are included in the protocol; relevant sensitivity analyses can test those assumptions, thus laying the groundwork for successful study development.

Below is a list of key feasibility questions which may guide the technical exploration and conceptualization of a retrospective RWD study. The list is based on experience supporting observational studies on a global scale and is not intended to be exhaustive or representative of all preparatory activities. This technical feasibility analysis should be carried out while considering other relevant aspects, including the novelty and strategic value of the study versus the existing evidence – in the form of randomized controlled trial data and other RWE – as well as the intended audience, data access/protection, reporting requirements and external validity aspects.

This feasibility assessment serves primarily as a first step to gain knowledge of the data and ensure realistic assumptions are included in the protocol…”

The list may support early discussions among study team members during the preparation and determination of an RWD study.

  • Can the population be accurately identified in the data source?

Diagnoses and procedures can be identified through International Classification of Diseases (ICD) codes; published code-validation studies on the population of interest can be a useful guide.

  • How generalizable is the population of the data source?

Generalizability issues should be recognized upfront. For example, the patient population for which data is available in the data source might be restricted to a specific geographic region, health insurance plan (e.g. Medicare or commercial), system (hospital/inpatient and ambulatory) or group (e.g. age, gender)…(More)”.
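The first feasibility question — whether the population can be accurately identified in the data source — often comes down to filtering records by diagnosis codes. A minimal sketch of that step follows; the record layout and the code list are hypothetical examples, not drawn from any actual validation study:

```python
# Illustrative cohort identification by ICD-10 diagnosis codes.
# The record structure and target code prefixes are hypothetical.

DIABETES_CODES = {"E10", "E11"}  # ICD-10 prefixes: type 1 / type 2 diabetes

records = [
    {"patient_id": 1, "dx_code": "E11.9"},  # type 2 diabetes
    {"patient_id": 2, "dx_code": "I10"},    # hypertension (excluded)
    {"patient_id": 3, "dx_code": "E10.1"},  # type 1 diabetes
]

def in_cohort(record, code_prefixes):
    """True if the record's diagnosis code starts with any target prefix."""
    return any(record["dx_code"].startswith(p) for p in code_prefixes)

cohort = {r["patient_id"] for r in records if in_cohort(r, DIABETES_CODES)}
print(sorted(cohort))  # → [1, 3]
```

In a real feasibility study, the chosen code list would be checked against published validation work, and the proportion of records with missing or uncodable diagnoses would be reported alongside the cohort counts.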

10 Tips for Making Sense of COVID-19 Models for Decision-Making


Elizabeth Stuart et al at Johns Hopkins School of Public Health: “Models can be enormously useful in the context of an epidemic if they synthesize evidence and scientific knowledge. The COVID-19 pandemic is a complex phenomenon, and in such a dynamic setting it is nearly impossible to make informed decisions without the assistance models can provide. However, models don’t perfectly capture reality: They simplify reality to help answer specific questions.

Below are 10 tips for making sense of COVID-19 models for decision-making such as directing health care resources to certain areas or identifying how long social distancing policies may need to be in effect.

Flattening the Curve for COVID-19
  1. Make sure the model fits the question you are trying to answer.
    There are many different types of models and a wide variety of questions they can be used to address. Three types can be helpful for COVID-19:
    1. Models that simplify how complex systems work, such as disease transmission. This is often done by putting people into compartments related to how a disease spreads, like “susceptible,” “infected,” and “recovered.” While these can be overly simplistic with few data inputs and don’t allow for the uncertainty that exists in a pandemic, they can be useful in the short term to understand basic structures. But these models generally cannot be implemented in ways that account for complex systems or when there is ongoing system or individual behavioral change.
    2. Forecasting models try to predict what will actually happen. They work by using existing data to project out conclusions over a relatively short time horizon. But these models are challenging to use for mid-term assessment—like a few months out—because of the evolving nature of pandemics.
    3. Strategic models show multiple scenarios to consider the potential implications of different interventions and contexts. These models try to capture some of the uncertainty about the underlying disease processes and behaviors. They might take a few values of key inputs, such as the case fatality ratio or the effectiveness of social distancing measures, and play out different scenarios for disease spread over time. These kinds of models can be particularly useful for decision-making.
  2. Be mindful that forecast models are often built with the goal of change, which affects their shelf life.
    The irony of many COVID-19 models is that in some cases, especially for forecasting, a key purpose in building and disseminating the model is to induce behavior change at individual or system levels—e.g., to reinforce the need for physical distancing.

    This makes it difficult to assess the performance of forecasting models since the results of the model itself (and reactions to it) become part of the system. In these cases, a forecasting model may look like it was inaccurate, but it may have been accurate for an unmitigated scenario with no behavior change. In fact, a public health success may be when the forecasts do not come to be!
  3. Look for models (and underlying collaborations) that include diverse aspects and expertise.
    One of the challenges in modeling COVID-19 is the multitude of factors involved: infectious disease dynamics, social and behavioral factors such as how frequently individuals interact, economic factors such as employment and safety net policies, and more.

    One benefit is that we do know that COVID-19 is an infectious disease and we have a good understanding about how related diseases spread. Likewise, health economists and public health experts have years of experience understanding complex social systems. Look for models, and their underlying collaborations, that take advantage of that breadth of existing knowledge….(More)”.
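The compartment models described in the first tip can be sketched in a few lines. The following toy discrete-time SIR simulation uses entirely hypothetical parameter values — it is not any group's actual model — but it illustrates the basic mechanics: people flow from "susceptible" to "infected" to "recovered," and lowering the contact rate flattens the epidemic peak:

```python
# Toy discrete-time SIR sketch of the "susceptible / infected /
# recovered" compartments. All parameter values are hypothetical.

def sir_peak(beta, gamma=0.1, days=300, n=1_000_000, i0=10):
    """Run a simple SIR simulation; return the peak number infected at once."""
    s, i, r = n - i0, i0, 0
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n  # new infections this day
        new_rec = gamma * i         # recoveries this day
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        peak = max(peak, i)
    return peak

# Halving the contact rate (e.g. via distancing) lowers the peak —
# the "flattening the curve" effect in the graphic above.
unmitigated = sir_peak(beta=0.3)
mitigated = sir_peak(beta=0.15)
print(unmitigated > mitigated)  # → True
```

As the tip notes, a bare compartment model like this has few data inputs and no representation of uncertainty or behavioral change; it is useful for understanding basic structure, not for forecasting.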

A call for a new generation of COVID-19 models


Blog post by Alex Engler: “Existing models have been valuable, but they were not designed to support these types of critical decisions. A new generation of models that estimate the risk of COVID-19 spread for precise geographies—at the county or even more localized level—would be much more informative for these questions. Rather than produce long-term predictions of deaths or hospital utilization, these models could estimate near-term relative risk to inform local policymaking. Going forward, governors and mayors need local, current, and actionable numbers.

Broadly speaking, better models would substantially aid in the “adaptive response” approach to re-opening the economy. In this strategy, policymakers cyclically loosen and re-tighten restrictions, attempting to work back towards a healthy economy without moving so fast as to allow infections to take off again. In an ideal process, restrictions would be eased at a pace that balances a swift return to normalcy with minimizing total COVID-19 infections. Of course, this is impossible in practice, and thus some continued adjustments—the flipping of various controls off and on again—will be necessary. More precise models can help improve this process, providing another lens into when it will be safe to relax restrictions, thus making it easier to do so without a disruptive back-and-forth. A more-or-less continuous easing of restrictions is especially valuable, since it is unlikely that second or third rounds of interventions (such as social distancing) would achieve the same high rates of compliance as the first round.

The proliferation of COVID-19 data

These models can incorporate cases, test-positivity rates, hospitalization information, deaths, excess deaths, and other known COVID-19 data. While all these data sources are incomplete, an expanding body of research on COVID-19 is making the data more interpretable. This research will become progressively more valuable with more data on the spread of COVID-19 in the U.S. rather than data from other countries or past pandemics.

Further, a broad range of non-COVID-19 data can also inform risk estimates: population density, age distributions, poverty and uninsured rates, the number of essential frontline workers, and co-morbidity factors can all be included. Community mobility reports from Google and Unacast’s social distancing scorecard can identify how easing restrictions is changing behavior. Small-area estimates also allow the models to account for the risk of spread from other nearby geographies. Geospatial statistics cannot account for infectious spread between two large neighboring states, but they would add value for adjacent zip codes. Lastly, many more data sources are in the works, like open patient data registries, the National Institutes of Health’s (NIH) study of asymptomatic persons, self-reported symptoms data from Facebook, and (potentially) new randomized surveys. In fact, there are so many diverse and relevant data streams that models can add value simply by consolidating daily information into just a few top-line numbers that are comparable across the nation.

FiveThirtyEight has effectively explained that making these models is tremendously difficult due to incomplete data, especially since the U.S. is not testing enough or in statistically valuable ways. These challenges are real, but decision-makers are currently using this same highly flawed data to make inferences and policy choices. Despite the many known problems, elected officials and public health services have no choice. Frequently, they are evaluating the data without the time and expertise to make reasoned statistical interpretations based on epidemiological research, leaving significant opportunity for modeling to help….(More)”.

The institutionalization of digital public health: lessons learned from the COVID-19 app


Paper by Ciro Cattuto and Alessandro Spina: “Amid the outbreak of the SARS-CoV-2 pandemic, there has been a call to use innovative digital tools for the purpose of protecting public health. There are a number of proposals to embed digital solutions into the regulatory strategies adopted by public authorities to control the spread of the coronavirus more effectively. They range from algorithms to detect population movements by using telecommunication data to the use of artificial intelligence and high-performance computing power to detect patterns in the spread of the virus. However, the use of a mobile phone application for contact tracing is certainly the most popular.

These proposals, which have a very powerful persuasive force and have apparently contributed to the success of the public health response in a few Asian countries, also raise questions and criticisms, in particular with regard to the risks that these novel digital surveillance systems pose for privacy and, in the long term, for our democracies.

With this short paper, we would like to describe the pattern that has led to the institutionalization of digital tools for public health purposes. By tracing their origins to “digital epidemiology”, an approach that originated in the early 2010s, we will show that, while there exists limited experimental knowledge on the use of digital tools for tracking disease, this is the first time they are being introduced by policy-makers into the set of non-clinical emergency strategies for a major public health crisis….(More)”

Data Sharing in the Context of Health-Related Citizen Science


Paper by Mary A. Majumder and Amy L. McGuire: “As citizen science expands, questions arise regarding the applicability of norms and policies created in the context of conventional science. This article focuses on data sharing in the conduct of health-related citizen science, asking whether citizen scientists have obligations to share data and publish findings on par with the obligations of professional scientists. We conclude that there are good reasons for supporting citizen scientists in sharing data and publishing findings, and we applaud recent efforts to facilitate data sharing. At the same time, we believe it is problematic to treat data sharing and publication as ethical requirements for citizen scientists, especially where there is the potential for burden and harm without compensating benefit…(More)”.

How ‘Social Distancing’ Can Get Lost in Translation


Ruth Michaelson at the Smithsonian Magazine: “…Even as tongue-in-cheek phrases like “avoiding the Rona” abound on American social media, to say nothing of the rapper Cardi B’s enunciation of “coronavirus,” other terms like “social distancing,” or “lockdown,” have quickly entered our daily vocabulary.

But what these terms mean in different countries (or regions or cities within regions, in Wuhan’s case) is a question of translation as well as interpretation. Communities around the world remain under government-enforced lockdown to prevent the spread of COVID-19, but few have understood “stay at home,” or liu-zai-jia-li in Mandarin, to mean precisely the same thing. The concept of social distancing, normally indicating a need to avoid contact with others, can mean anything from avoiding public transport to the World Health Organization’s recommendation to “maintain at least one metre distance,” from those who are coughing or sneezing. In one Florida county, officials explained the guideline by suggesting to residents they stay “one alligator” away from each other.

The way that terms like “social distancing” are adopted across languages provides a way to understand how countries across the globe are coping with the COVID-19 threat. For instance, the Mandarin Chinese translation of “social distancing”, or ju-li-yuan-dian, is interpreted differently in Wuhan dialect, explains Jin. “Instead of ‘keep a distance,’ Wuhan dialect literally translates this as ‘send far away.’”

Through these small shifts in language, says Jin, “people in Wuhan expose their feelings about their own suffering.”

Sweden, meanwhile, has currently registered more than 16,000 cases of COVID-19, the highest incidence rate in Scandinavia. The government has taken an unusually lax approach to enforcing its pandemic mitigation policies, placing the emphasis on citizens to self-police, perhaps to ill effect. While Swedes do use terms like social distancing, or rather the noun socialt avstånd, these are accompanied by other ideas that are more popular in Sweden. “Herd immunity or flockimmunitet is a very big word around here,” says Jan Pedersen, director of the Institute for Interpreting and Translation Studies at Stockholm University.

“Sweden is famous for being a very consensus driven society, and this applies here as well,” he says. “There’s a great deal of talk about trust.” In this case, he explained, citizens have trust – tillit – in the authorities to make good choices and so choose to take personligt ansvar, or personal responsibility.

Pedersen has also noticed some new language developing as a result. “The word recommendation, rekommendationer, in Sweden has taken on much stronger force,” he said. “Recommendation used to be a recommendation, what you could do or not. Now it’s slightly stronger … We would use words like obey with laws, but now here you obey a recommendation, lyda rekommendationer.”…(More)”.