The potential of Data Collaboratives for COVID-19


Blog post by Stefaan Verhulst: “We live in almost unimaginable times. The spread of COVID-19 is a human tragedy and global crisis that will impact our communities for many years to come. The social and economic costs are huge and mounting, and they are already contributing to a global slowdown. Every day, the emerging pandemic reveals new vulnerabilities in various aspects of our economic, political and social lives. These include our vastly overstretched public health services, our dysfunctional political climate, and our fragile global supply chains and financial markets.

The unfolding crisis is also making shortcomings clear in another area: the way we re-use data responsibly. Although this aspect of the crisis has been less remarked upon than other, more obvious failures, those who work with data—and who have seen its potential to impact the public good—understand that we have failed to create the necessary governance and institutional structures that would allow us to harness data responsibly to halt or at least limit this pandemic. A recent article in Stat, an online journal dedicated to health news, characterized the COVID-19 outbreak as “a once-in-a-century evidence fiasco.” The article continues: 

“At a time when everyone needs better information, […] we lack reliable evidence on how many people have been infected with SARS-CoV-2 or who continue to become infected. Better information is needed to guide decisions and actions of monumental significance and to monitor their impact.” 

It doesn’t have to be this way, and these data challenges are not an excuse for inaction. As we explain in what follows, there is ample evidence that the re-use of data can help mitigate health pandemics. A robust (if somewhat unsystematized) body of knowledge could direct policymakers and others in their efforts. In the second part of this article, we outline eight steps that key stakeholders can and should take to better re-use data in the fight against COVID-19. In particular, we argue that more responsible data stewardship and increased use of data collaboratives are critical….(More)”. 

Mobile phone data and COVID-19: Missing an opportunity?


Paper by Nuria Oliver, et al: “This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although its value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups at the national and regional levels, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic….(More)”.
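One simple way aggregated mobile-network data can inform an assessment of physical distancing is to compare current movement with a pre-pandemic baseline. The sketch below does this per region. The region names and trip counts are hypothetical, and using the mean of baseline days is an illustrative assumption rather than the paper's method.

```python
from statistics import mean


def mobility_change(baseline_trips, current_trips):
    """Percentage change in mean daily trips per region versus a baseline period.

    Both arguments map a region name to a list of daily aggregate trip counts
    (hypothetical data, e.g. devices moving between antenna areas per day).
    """
    changes = {}
    for region, days in current_trips.items():
        base = mean(baseline_trips[region])
        changes[region] = 100.0 * (mean(days) - base) / base
    return changes


baseline = {"north": [1000, 1100, 900], "south": [500, 500, 500]}
current = {"north": [300, 400, 200], "south": [450, 400, 350]}
print(mobility_change(baseline, current))  # {'north': -70.0, 'south': -20.0}
```

A real analysis would also need to correct for day-of-week effects and for changes in how many subscribers the operator covers.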

Data Protection under SARS-CoV-2


GDPR Hub: “The sudden outbreak of COVID-19 (“Corona-Virus”), which the WHO has declared a pandemic, affects data protection in various ways. Different data protection authorities have published guidelines for employers and other parties involved in the processing of data related to the Corona-Virus (read more below).

The Corona-Virus has also prompted EU/EEA member states and private companies to use various technologies based on data collection and other data processing activities. These processing activities mostly focus on preventing and slowing the further spread of the Corona-Virus and on monitoring citizens’ compliance with governmental measures such as quarantine. Some of them are based on anonymous or anonymised data (for example, statistics or movement patterns), but some proposals have also involved personalised tracking.

At the moment, it is not easy to determine which processing activities are actually going to be conducted and which are only rumours. This page will therefore be updated once certain processing activities have been confirmed. For now, this article does not assess the lawfulness of particular processing activities, but rather outlines the general conditions for data processing in connection with the Corona-Virus.

It must be noted that several activities, such as monitoring whether citizens comply with quarantine and stay indoors by observing mobile phone locations, can be carried out without using personal data within the meaning of Article 4(1) GDPR, if all necessary information can be derived from anonymised data. The GDPR does not apply to activities that rely only on anonymised data….(More)”.
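Below is a minimal sketch of the kind of aggregation such monitoring might rely on. Device-level location records are reduced to counts of distinct devices per area and hour, and cells below a minimum size are suppressed. The record schema, the function name `aggregate_presence`, and the threshold of 10 are illustrative assumptions. Aggregation with a threshold does not by itself guarantee anonymisation, and whether a given output is anonymised in the GDPR sense is a legal question this sketch does not settle.

```python
from collections import Counter


def aggregate_presence(records, min_count=10):
    """Count distinct devices per (area, hour) and drop small cells.

    records: iterable of (device_id, area, hour) tuples (hypothetical schema).
    Cells with fewer than min_count distinct devices are suppressed, because
    small counts are easier to link back to individuals.
    """
    # De-duplicate so a device seen several times in one cell counts once.
    distinct = {(device, area, hour) for device, area, hour in records}
    counts = Counter((area, hour) for _, area, hour in distinct)
    return {cell: n for cell, n in counts.items() if n >= min_count}


records = [(f"dev{i}", "cell_A", 9) for i in range(12)]
records += [(f"dev{i}", "cell_B", 9) for i in range(3)]
print(aggregate_presence(records))  # {('cell_A', 9): 12}
```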

Why isn’t the government publishing more data about coronavirus deaths?


Article by Jeni Tennison: “Studying the past is futile in an unprecedented crisis. Science is the answer – and open-source information is paramount…Data is a necessary ingredient in day-to-day decision-making – but in this rapidly evolving situation, it’s especially vital. Everything has changed, almost overnight. Demands for food, transport, and energy have been overhauled as more people stop travelling and work from home. Jobs have been lost in some sectors, and workers are desperately needed in others. Historic experience can no longer tell us how our society or economy is working. Past models hold little predictive power in an unprecedented situation. To know what is happening right now, we need up-to-date information….

This data is also crucial for scientists, who can use it to replicate and build upon each other’s work. Yet no open data has been published alongside the evidence for the UK government’s coronavirus response. While a model that informed the US government’s response is freely available as a Google spreadsheet, the Imperial College London model that prompted the current lockdown has still not been published as open-source code. Making data open – publishing it on the web, in spreadsheets, without restrictions on access – is the best way to ensure it can be used by the people who need it most.

There is currently no open data available on UK hospitalisation rates; no regional, age or gender breakdown of daily deaths. The more granular breakdown of registered deaths provided by the Office for National Statistics is only published on a weekly basis, and with a delay. It is hard to tell whether this data does not exist or the NHS has prioritised creating dashboards for government decision makers rather than informing the rest of the country. But the UK is making progress with regard to data: potential Covid-19 cases identified through online and call-centre triage are now being published daily by NHS Digital.

Of course, not all data should be open. Singapore has been publishing detailed data about every infected person, including their age, gender, workplace, where they have visited and whether they had contact with other infected people. This can both harm the people who are documented and incentivise others to lie to authorities, undermining the quality of data.

When people are concerned about how data about them is handled, they demand transparency. To retain our trust, governments need to be open about how data is collected and used, how it’s being shared, with whom, and for what purpose. Openness about the use of personal data to help tackle the Covid-19 crisis will become more pressing as governments seek to develop contact tracing apps and immunity passports….(More)”.

Urgently Needed for Policy Guidance: An Operational Tool for Monitoring the COVID-19 Pandemic


Paper by Stephane Luchini et al: “The radical uncertainty around the current COVID-19 pandemic requires that governments around the world be able to track in real time not only how the virus spreads but, most importantly, which policies are effective in keeping the spread of the disease in check. To improve the quality of health decision-making, we argue that it is necessary to monitor and compare the acceleration/deceleration of confirmed cases across health policy responses and across countries. To do so, we provide a simple mathematical tool to estimate the convexity/concavity of trends in epidemiological surveillance data. Had it been applied at the onset of the crisis, it would have offered more opportunities to measure the impact of the policies undertaken in different Asian countries, and would have allowed European and North American governments to draw quicker lessons from these Asian experiences when making policy decisions. Our tool can be especially useful as the epidemic is currently extending to lower-income African and South American countries, some of which have weaker health systems….(More)”.
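The excerpt does not spell out the authors' estimator. A simple stand-in that captures the same idea is the discrete second difference of cumulative confirmed cases. A positive second difference means new cases are growing (a convex, accelerating trend), and a negative one means they are shrinking (concave, decelerating). The function names, the three-day window, and the sample series below are hypothetical. Real surveillance data is noisy and would normally be smoothed first.

```python
def second_differences(cumulative):
    """Discrete second difference c[t+1] - 2*c[t] + c[t-1] of a cumulative series."""
    return [cumulative[t + 1] - 2 * cumulative[t] + cumulative[t - 1]
            for t in range(1, len(cumulative) - 1)]


def trend_shape(cumulative, window=3):
    """Classify the most recent `window` second differences of a cumulative series."""
    recent = second_differences(cumulative)[-window:]
    if len(recent) < window:
        return "insufficient data"
    if all(d > 0 for d in recent):
        return "convex (accelerating)"
    if all(d < 0 for d in recent):
        return "concave (decelerating)"
    return "mixed"


print(trend_shape([10, 20, 35, 60, 100, 160]))       # convex (accelerating)
print(trend_shape([100, 200, 280, 340, 380, 400]))   # concave (decelerating)
```

Comparing when each country's series switches from convex to concave, relative to the dates of its policy measures, is the kind of cross-country comparison the paper argues for.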

Privacy Protection Key for Using Patient Data to Develop AI Tools


Article by Jessica Kent: “Clinical data should be treated as a public good when used for research or artificial intelligence algorithm development, so long as patients’ privacy is protected, according to a report from the Radiological Society of North America (RSNA).

As artificial intelligence and machine learning are increasingly applied to medical imaging, bringing the potential for streamlined analysis and faster diagnoses, the industry still lacks a broad consensus on an ethical framework for sharing this data.

“Now that we have electronic access to clinical data and the data processing tools, we can dramatically accelerate our ability to gain understanding and develop new applications that can benefit patients and populations,” said study lead author David B. Larson, MD, MBA, from the Stanford University School of Medicine. “But unsettled questions regarding the ethical use of the data often preclude the sharing of that information.”

To offer solutions around data sharing for AI development, RSNA developed a framework that highlights how to ethically use patient data for secondary purposes.

“Medical data, which are simply recorded observations, are acquired for the purposes of providing patient care,” Larson said….(More)”.

Unpredictable Residency during the COVID-19 Pandemic Spells Trouble for the 2020 Census Count


Blog by Diana Elliott and Robert Santos: “Social distancing measures to curtail the community spread of COVID-19 have upended daily life. Just before lockdowns were implemented across the country, there was tremendous movement and migration of people relocating to different residences to shelter in place. This makes sense for the people involved but could be disastrous for the communities they fled and the final 2020 Census counts.

Pandemic-based migration undermines an accurate count

The 2020 Census, like most data collected by the US Census Bureau, is residence based. In the years leading up to 2020, the US Census Bureau worked diligently on the quality of the Master Address File, or the catalog of all residential addresses in the country. Staff account for newly built housing developments and buildings, apartment units or accessory dwelling units that are used as permanent residences, and the demolition of homes and apartments in the past decade. Census materials are sent to an address, rather than a person.

Most residences across America have already received their 2020 Census invitation. Whether completed online, by paper, by phone, or in person, the first official question on the 2020 Census questionnaire is “How many people were living or staying in this house, apartment, or mobile home on April 1, 2020?” Households are expected to answer this based on the concept of “usual residence,” or the place where a person lives and sleeps most of the time.
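To see why temporary relocation is a problem, consider a deliberately simplified reading of “usual residence”: the place where a person slept on the most nights. The sketch below applies that rule to a hypothetical sleeping history. The Census Bureau's actual residence criteria include many special cases that are not modelled here, and the function name is illustrative.

```python
from collections import Counter


def usual_residence(nights):
    """Return the address where a person slept on the most nights.

    nights: list of addresses, one entry per night (hypothetical data).
    Returns None for an empty history or a tie, which this rule cannot resolve.
    """
    ranked = Counter(nights).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# 200 nights at home, then 40 nights sheltering with family up to April 1.
history = ["home"] * 200 + ["parents"] * 40
print(usual_residence(history))  # home
```

Under this rule the person belongs at "home", yet a respondent answering based on where they are staying on April 1 would report them at "parents". The gap is exactly the miscount the authors describe.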

Despite the written guidance provided on the 2020 Census on how to answer this question, doing so may be fraught with complexity and nuance because of the pandemic.

First, research reveals that respondents do not often read questionnaire instructions; they dive in and start answering. With many people scrambling to other counties, cities, and states to hunker down for the long haul with loved ones, this will lead to incorrect counts when people are counted at temporary addresses.

Second, for many, the concept of “usual residence” has little relevance in the uncertainty unfolding during the COVID-19 pandemic. What if your temporary address becomes your permanent address? What does “usual residence” mean during a global epidemic that could stretch for 18 months or more? And perhaps more importantly, what should it mean?

Finally, there is the added complication of census operational delays. Self-response to the 2020 Census has been extended into August, as have the nonresponse follow-up efforts, when enumerators knock on the doors of those who haven’t yet answered the census. Additional delays seem unavoidable. The longer the delay, the more time there is for people who have not yet completed a census form to realize their temporary plan has evolved into a state of permanence….(More)”.

Responding to COVID-19 with AI and machine learning


Paper by Mihaela van der Schaar et al: “…AI and machine learning can use data to make objective and informed recommendations, and can help ensure that scarce resources are allocated as efficiently as possible. Doing so will save lives and can help reduce the burden on healthcare systems and professionals….

1. Managing limited resources

AI and machine learning can help us identify people who are at highest risk of being infected by the novel coronavirus. This can be done by integrating electronic health record data with a multitude of “big data” pertaining to human-to-human interactions (from cellular operators, traffic, airlines, social media, etc.). This will make allocation of resources like testing kits more efficient, as well as informing how we, as a society, respond to this crisis over time….

2. Developing a personalized treatment course for each patient 

As mentioned above, COVID-19 symptoms and disease evolution vary widely from patient to patient in terms of severity and characteristics. A one-size-fits-all approach for treatment doesn’t work. We also are a long way off from mass-producing a vaccine. 

Machine learning techniques can help determine the most efficient course of treatment for each individual patient on the basis of observational data about previous patients, including their characteristics and the treatments administered. We can use machine learning to answer key “what-if” questions about each patient, such as “What if we wait a couple of hours before putting them on a ventilator?” or “Would the outcome for this patient be better if we switched them from supportive care to an experimental treatment earlier?”

3. Informing policies and improving collaboration

…It’s hard to get a clear sense of which decisions result in the best outcomes. In such a stressful situation, it’s also hard for decision-makers to be aware of the outcomes of decisions being made by their counterparts elsewhere. 

Once again, data-driven AI and machine learning can provide objective and usable insights that far exceed the capabilities of existing methods. We can gain valuable insight into what the differences between policies are, why policies are different, which policies work better, and how to design and adopt improved policies….

4. Managing uncertainty

….We can use an area of machine learning called transfer learning to account for differences between populations, substantially eliminating bias while still extracting usable data that can be applied from one population to another. 

We can also use methods to make us aware of the degree of uncertainty of any given conclusion or recommendation generated from machine learning. This means that decision-makers can be provided with confidence estimates that tell them how confident they can be about a recommended course of action.
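The paper does not say which uncertainty method it has in mind. One generic, widely used option is the percentile bootstrap: resample the observed data with replacement many times, recompute the estimate on each resample, and read off an interval from the spread of the results. The sketch below applies it to a hypothetical list of 0/1 patient outcomes. The function name, resample count, and seed are illustrative choices.

```python
import random
from statistics import mean


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the mean of `outcomes`."""
    rng = random.Random(seed)
    estimates = sorted(
        mean(rng.choices(outcomes, k=len(outcomes))) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(outcomes), (lower, upper)


# Hypothetical outcomes: 1 = recovered, 0 = did not recover.
outcomes = [1] * 30 + [0] * 10
estimate, (low, high) = bootstrap_ci(outcomes)
print(f"recovery rate {estimate:.2f}, 95% interval {low:.2f}-{high:.2f}")
```

A wide interval signals to decision-makers that a recommendation rests on thin evidence, even when the point estimate looks favourable.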

5. Expediting clinical trials

Randomized clinical trials (RCTs) are generally used to judge the relative effectiveness of a new treatment. However, these trials can be slow and costly, and may fail to uncover specific subgroups for which a treatment may be most effective. A specific problem posed by COVID-19 is that subjects selected for RCTs tend not to be elderly, or to have other conditions; as we know, COVID-19 has a particularly severe impact on both those patient groups….

The AI and machine learning techniques I’ve mentioned above do not require further peer review or further testing. Many have already been implemented on a smaller scale in real-world settings. They are essentially ready to go, with only slight adaptations required….(More) (Full Paper)”.

A Closer Look at Location Data: Privacy and Pandemics


Assessment by Stacey Gray: “In light of COVID-19, there is heightened global interest in harnessing location data held by major tech companies to track individuals affected by the virus, better understand the effectiveness of social distancing, or send alerts to individuals who might be affected based on their previous proximity to known cases. Governments around the world are considering whether and how to use mobile location data to help contain the virus: Israel’s government passed emergency regulations to address the crisis using cell phone location data; the European Commission requested that mobile carriers provide anonymized and aggregate mobile location data; and South Korea has created a publicly available map of location data from individuals who have tested positive. 

Public health agencies and epidemiologists have long been interested in analyzing device location data to track diseases. In general, the movement of devices effectively mirrors movement of people (with some exceptions discussed below). However, its use comes with a range of ethical and privacy concerns. 

In order to help policymakers address these concerns, we provide below a brief explainer guide to the basics: (1) what is location data, (2) who holds it, and (3) how is it collected? Finally, we discuss some preliminary ethical and privacy considerations for processing location data. Researchers and agencies should consider: how and in what context location data was collected; the fact that location data is classified as legally “sensitive” in most jurisdictions, and the reasons why; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over….(More)”.
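The toy sketch below makes the “challenges to effective anonymization” concrete. It measures how many pseudonymous users are singled out by just a couple of their own (place, hour) observations. Replacing names with IDs does not help if a few known points match only one trace. The data layout and the function `fraction_unique` are hypothetical, and real re-identification studies use far larger datasets and more careful sampling.

```python
import random


def fraction_unique(traces, points=2, seed=0):
    """Share of users whose trace is the only one containing `points` of its own
    randomly chosen observations.

    traces: dict mapping a pseudonymous user id to a set of (place, hour) tuples.
    """
    rng = random.Random(seed)
    unique = 0
    for user, trace in traces.items():
        # Pick a few observations an adversary might already know about this user.
        known = set(rng.sample(sorted(trace), min(points, len(trace))))
        matches = [u for u, t in traces.items() if known <= t]
        if matches == [user]:
            unique += 1
    return unique / len(traces)


traces = {
    "u1": {("A", 8), ("B", 18)},
    "u2": {("A", 8), ("C", 18)},
    "u3": {("A", 8), ("B", 18)},
}
print(fraction_unique(traces))  # 0.3333333333333333
```

Here "u2" is re-identifiable from two points while "u1" and "u3" hide behind each other. With realistic, fine-grained location histories the unique share is typically much higher, which is why aggregation and purpose limitation matter.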

Human migration: the big data perspective


Alina Sîrbu et al in the International Journal of Data Science and Analytics: “How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants….(More)”.