UK response to pandemic hampered by poor data practices


Report for the Royal Society: “The UK is well behind other countries in making use of data to gain a real-time understanding of the spread and economic impact of the pandemic, according to Data Evaluation and Learning for Viral Epidemics (DELVE), a multi-disciplinary group convened by the Royal Society.

The report, Data Readiness: Lessons from an Emergency, highlights how data such as aggregated and anonymised mobility and payment transaction data, already gathered by companies, could be used to give a more accurate picture of the pandemic at national and local levels.  That could in turn lead to improvements in evaluation and better targeting of interventions.

Maximising the value of big data at a time of crisis requires careful cooperation across the private sector, which is already gathering these data; the public sector, which can provide a base for aggregating and overseeing the correct use of the data; and researchers, who have the skills to analyse it for the public good. This work needs to be carried out in accordance with data protection legislation and must respect people’s concerns about data security and privacy.

The report calls on the Government to extend the powers of the Office for National Statistics to enable it to support trustworthy access to ‘happenstance’ data – data that are already gathered but not for a specific public health purpose – and for the Government to fund pathfinder projects that focus on specific policy questions, such as how we nowcast economic metrics and how we better understand population movements.
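As a rough editorial illustration of the kind of ‘nowcasting’ the report has in mind (turning happenstance payment data into a timely economic indicator), the following minimal Python sketch uses invented figures and column names; the baseline and values are assumptions, not the report’s:

import pandas as pd

# Hypothetical daily card-spend aggregates from an interbank network.
spend = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "card_spend_eur_m": [310, 305, 298, 240, 190, 150, 148, 152, 149, 151],
})

BASELINE = 300  # assumed pre-pandemic daily average, in EUR millions

# A simple nowcast of the consumption shock: rolling deviation from baseline.
spend["pct_vs_baseline"] = 100 * (spend["card_spend_eur_m"] / BASELINE - 1)
spend["nowcast_7d"] = spend["pct_vs_baseline"].rolling(7).mean()
print(spend[["date", "pct_vs_baseline", "nowcast_7d"]].tail(4))

A real pathfinder project would calibrate such a series against official consumption statistics; the point of the sketch is only that the raw data already exist and the computation is simple.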

Neil Lawrence, DeepMind Professor of Machine Learning at the University of Cambridge, Senior AI Fellow at The Alan Turing Institute and an author of the report, said: “The UK has talked about making better use of data for the public good, but we have had statements of good intent, rather than action.  We need to plan better for national emergencies. We need to look at the National Risk Register through the lens of what data would help us to respond more effectively. We have to learn our lessons from experiences in this pandemic and be better prepared for future crises.  That means doing the work now to ensure that companies, the public sector and researchers have pathfinder projects up and running to share and analyse data and help the government to make better informed decisions.”  

During the pandemic, counts of the daily flow of people between more than 3,000 districts in Spain have been available at the click of a button, allowing policymakers to understand more effectively how the movement of people contributes to the spread of the virus. This was based on a collaboration between the country’s three main mobile phone operators. In France, measuring the impact of the pandemic on consumer spending on a daily and weekly scale was possible as a result of coordinated cooperation with the country’s national interbank network.
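The Spanish mobility work is described only at a high level, but the core data product (a daily origin-destination table pooled across operators, with small cells suppressed) can be sketched as follows; the district codes, operator labels and suppression threshold are our own illustrative assumptions:

import pandas as pd

# Hypothetical per-operator counts: each mobile operator supplies aggregated
# daily trip counts between districts; no individual-level records involved.
per_operator = pd.DataFrame({
    "date":        ["2020-04-01"] * 6,
    "origin":      ["Madrid-01"] * 3 + ["Sevilla-03"] * 3,
    "destination": ["Toledo-02"] * 3 + ["Madrid-01"] * 3,
    "operator":    ["A", "B", "C", "A", "B", "C"],
    "trips":       [820, 610, 410, 6, 4, 2],
})

SUPPRESSION_THRESHOLD = 15  # assumed cut-off for cells too small to publish

# Pool the operators into one national origin-destination table ...
flows = (per_operator
         .groupby(["date", "origin", "destination"], as_index=False)["trips"]
         .sum())

# ... and suppress small cells before release.
flows = flows[flows["trips"] >= SUPPRESSION_THRESHOLD]
print(flows)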

Professor Lawrence added: “Mobile phone companies might provide a huge amount of anonymised and aggregated data that would allow us a much greater understanding of how people move around, potentially spreading the virus as they go.  And there is a wealth of other data, such as from transport systems. The more we understand about this pandemic, the better we can tackle it. We should be able to work together, the private and the public sectors, to harness big data for massive positive social good and do that safely and responsibly.”…(More)”

Open government data, uncertainty and coronavirus: An infodemiological case study


Paper by Nikolaos Yiannakoulias, Catherine E. Slavik, Shelby L. Sturrock, J. Connor Darlington: “Governments around the world have made data on COVID-19 testing, case numbers, hospitalizations and deaths openly available, and a breadth of researchers, media sources and data scientists have curated and used these data to inform the public about the state of the coronavirus pandemic. However, it is unclear if all data being released convey anything useful beyond the reputational benefits of governments wishing to appear open and transparent. In this analysis we use Ontario, Canada as a case study to assess the value of publicly available SARS-CoV-2 positive case numbers. Using a combination of real data and simulations, we find that daily publicly available test results probably contain considerable error about individual risk (measured as proportion of tests that are positive, population-based incidence and prevalence of active cases) and that short-term variations are very unlikely to provide useful information for any plausible decision making on the part of individual citizens. Open government data can increase the transparency and accountability of government; however, it is essential that all publication, use and re-use of these data highlight their weaknesses to ensure that the public is properly informed about the uncertainty associated with SARS-CoV-2 information….(More)”
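The paper’s own simulations are more elaborate, but the basic point (that short-term swings in reported positivity can be pure sampling noise) can be reproduced in a few lines; the prevalence and testing volume below are assumed for illustration, not taken from the paper:

import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters, for illustration only; not the paper's values.
TRUE_POSITIVITY = 0.03   # constant underlying test positivity
DAILY_TESTS = 20_000     # tests run each day
DAYS = 14

positives = rng.binomial(DAILY_TESTS, TRUE_POSITIVITY, size=DAYS)
observed = positives / DAILY_TESTS

# Day-to-day swings arise from sampling alone; nothing changes underneath.
print("observed daily positivity:", np.round(observed, 4))
print("relative day-to-day change:",
      np.round(np.abs(np.diff(observed)) / TRUE_POSITIVITY, 3))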

Data Disappeared


Essay by Samanth Subramanian: “Whenever President Donald Trump is questioned about why the United States has nearly three times more coronavirus cases than the entire European Union, or why hundreds of Americans are still dying every day, he whips out one standard comment. We find so many cases, he contends, because we test so many people. The remark typifies Trump’s deep distrust of data: his wariness of what it will reveal, and his eagerness to distort it. In March, when he refused to allow coronavirus-stricken passengers off the Grand Princess cruise liner and onto American soil for medical treatment, he explained: “I like the numbers where they are. I don’t need to have the numbers double because of one ship.” Unable—or unwilling—to fix the problem, Trump tries instead to fix the numbers.

The administration has failed on so many different fronts in its handling of the coronavirus, creating the overall impression of sheer mayhem. But there is a common thread that runs through these government malfunctions. Precise, transparent data is crucial in the fight against a pandemic—yet through a combination of ineptness and active manipulation, the government has depleted and corrupted the key statistics that public health officials rely on to protect us.

In mid-July, just when the U.S. was breaking and rebreaking its own records for daily counts of new coronavirus cases, the Centers for Disease Control and Prevention found itself abruptly relieved of its customary duty of collating national numbers on COVID-19 patients. Instead, the Department of Health and Human Services instructed hospitals to funnel their information to the government via TeleTracking, a small Pittsburgh firm started by a real estate entrepreneur who has frequently donated to the Republican Party. For a while, past data disappeared from the CDC’s website entirely, and although it reappeared after an outcry, it was never updated thereafter. The TeleTracking system was riddled with errors, and the newest statistics sometimes appeared after delays. This has severely limited the ability of public health officials to determine where new clusters of COVID-19 are blooming, to notice demographic patterns in the spread of the disease, or to allocate ICU beds to those who need them most.

To make matters more confusing still, Jared Kushner moved to start a separate coronavirus surveillance system run out of the White House and built by health technology giants—burdening already-overwhelmed officials and health care experts with a needless stream of queries. Kushner’s assessments often contradicted those of agencies working on the ground. When Andrew Cuomo, New York’s governor, asked for 30,000 ventilators, Kushner claimed the state didn’t need them: “I’m doing my own projections, and I’ve gotten a lot smarter about this.”…(More)”.

Covid-19 Data Is a Mess. We Need a Way to Make Sense of It.


Beth Blauer and Jennifer Nuzzo in the New York Times: “The United States is more than eight months into the pandemic and people are back waiting in long lines to be tested as coronavirus infections surge again. And yet there is still no federal standard to ensure testing results are being uniformly reported. Without uniform results, it is impossible to track cases accurately or respond effectively.

We test to identify coronavirus infections in communities. We can tell if we are casting a wide enough net by looking at test positivity — the percentage of people whose results are positive for the virus. The metric tells us whether we are testing enough or if the transmission of the virus is outpacing our efforts to slow it.

If the percentage of tests coming back positive is low, it gives us more confidence that we are not missing a lot of infections. It can also tell us whether a recent surge in cases may be a result of increased testing, as President Trump has asserted, or that cases are rising faster than the rate at which communities are able to test.

But to interpret these results properly, we need a national standard for how these results are reported publicly by each state. And although the Centers for Disease Control and Prevention issues protocols for how to report new cases and deaths, there is no uniform guideline for states to report testing results, which would tell us about the universe of people tested so we know we are doing enough testing to track the disease. (Even the C.D.C. was found in May to be reporting states’ results in a way that presented a misleading picture of the pandemic.)

Without a standard, states are deciding how to calculate positivity rates on their own — and their approaches are very different.

Some states include results from positive antigen-based tests; some states don’t. Some report the number of people tested, while others report only the number of tests administered, which can skew the overall results when people are tested repeatedly (as, say, at colleges and nursing homes)….(More)”
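The difference the authors describe is easy to make concrete. In this small invented example, one person screened repeatedly (and always negative) pulls the per-test positivity well below the per-person figure:

import pandas as pd

# Hypothetical test log: person "a" is screened repeatedly (e.g., at a
# college), the others are tested once in the community.
tests = pd.DataFrame({
    "person_id": ["a", "a", "a", "a", "b", "c", "d", "e"],
    "positive":  [0,   0,   0,   0,   1,   1,   0,   0],
})

# Per-test positivity: every administered test counts.
per_test = tests["positive"].mean()

# Per-person positivity: each person counts once, positive if ever positive.
per_person = tests.groupby("person_id")["positive"].max().mean()

print(f"positivity per test:   {per_test:.1%}")    # 25.0%
print(f"positivity per person: {per_person:.1%}")  # 40.0%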

Crowdfunding during COVID-19: An international comparison of online fundraising


Paper by Greg Elmer, Sabrina Ward-Kimola and Anthony Glyn Burton: “This article performs a digital methods analysis on a sample of online crowdfunding campaigns seeking financial support for COVID-related challenges. Building upon the crowdfunding literature, the paper offers an international comparison of the goals of COVID-related campaigns during the early spread of the pandemic. It seeks to determine the extent to which crowdfunding campaigns reflect current failures of governments to suppress the COVID pandemic and to address the financial challenges of families, communities and small businesses….(More)”.

A nudge helps doctors bring up end-of-life issues with their dying cancer patients


Article by Ravi Parikh et al: “When conversations about goals and end-of-life wishes happen early, they can improve patients’ quality of life and decrease their chances of dying on a ventilator or in an intensive care unit. Yet doctors treating cancer focus so much of their attention on treating the disease that these conversations tend to get put off until it’s too late. This leads to costly and often unwanted care for the patient.

This can be fixed, but it requires addressing two key challenges. The first is that it is often difficult for doctors to know how long patients have left to live. Even among patients in hospice care, doctors get it wrong nearly 70% of the time. Hospitals and private companies have invested millions of dollars to try to identify these outcomes, often using artificial intelligence and machine learning, although most of these algorithms have not been vetted in real-world settings.

In a recent set of studies, our team used data from real-time electronic medical records to develop a machine learning algorithm that identified which cancer patients had a high risk of dying in the next six months. We then tested the algorithm on 25,000 patients who were seen at our health system’s cancer practices and found it performed better than relying only on doctors to identify high-risk patients.
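The published model’s features and architecture are in the original papers; what follows is only a loose, self-contained sketch of the general approach (a supervised classifier over EHR-derived features predicting 180-day mortality), trained here on synthetic data rather than real records:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for EHR-derived features (age, labs, comorbidities, ...);
# the real model draws its inputs and labels from the medical record.
n = 5_000
X = rng.normal(size=(n, 10))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))  # 1 = died within 180 days

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]
print(f"AUC on held-out patients: {roc_auc_score(y_test, risk):.2f}")

# Patients above a chosen risk cut-off get flagged for a pre-visit nudge.
CUTOFF = 0.5  # illustrative; a real deployment would calibrate this
print(f"{(risk >= CUTOFF).mean():.1%} of test patients flagged as high risk")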

But just because such a tool exists doesn’t mean doctors will use it to prompt more conversations. The second challenge — which is even harder to overcome — is using machine learning to motivate clinicians to have difficult conversations with patients about the end of life.

We wondered if implementing a timely “nudge” that doctors received before seeing their high-risk patients could help them start the conversation.

To test this idea, we used our prediction tool in a clinical trial involving nine cancer practices. Doctors in the nudge group received a weekly report on how many end-of-life conversations they had compared to their peers, along with a list of patients they were scheduled to see the following week whom the algorithm deemed at high risk of dying in the next six months. They could review the list and uncheck any patients they thought were not appropriate for end-of-life conversations. For the patients who remained checked, doctors received a text message on the day of the appointment reminding them to discuss the patient’s goals at the end of life. Doctors in the control group did not receive the email or text message intervention.
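Stripped of the EHR plumbing, the intervention is a small filter-and-remind loop. This reconstruction is ours; the class and function names are hypothetical, and the real system runs inside the clinic’s infrastructure:

from dataclasses import dataclass

@dataclass
class Appointment:
    patient: str
    doctor: str
    high_risk: bool         # set by the prediction algorithm
    doctor_unchecked: bool  # doctor removed the patient from the list

def patients_to_nudge(appointments):
    """Patients still checked on the high-risk list for the coming week."""
    return [a for a in appointments if a.high_risk and not a.doctor_unchecked]

def send_day_of_visit_reminder(appt):
    # Stand-in for the text-message gateway used in the trial.
    print(f"Reminder to {appt.doctor}: discuss goals of care "
          f"with {appt.patient} at today's visit.")

week = [
    Appointment("patient-17", "Dr. A", high_risk=True, doctor_unchecked=False),
    Appointment("patient-23", "Dr. A", high_risk=True, doctor_unchecked=True),
    Appointment("patient-31", "Dr. B", high_risk=False, doctor_unchecked=False),
]

for appt in patients_to_nudge(week):
    send_day_of_visit_reminder(appt)  # only patient-17 triggers a reminder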

As we reported in JAMA Oncology, doctors who received the nudge text had end-of-life conversations with 15% of their patients, compared to just 4% for doctors in the control group….(More)”.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation


Paper by Khaled El Emam et al: “There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: If the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them.

Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data.

Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this “meaningful identity disclosure risk.” The model is applied on samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data.
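The full risk model in the paper also weighs adversary behaviour and whether a match teaches the adversary anything new; the following is a deliberately simplified sketch of the matching step alone, on toy data with our own choice of quasi-identifiers:

import pandas as pd

# Toy real and synthetic datasets sharing three quasi-identifiers.
real = pd.DataFrame({
    "age_band": ["30-39", "30-39", "60-69", "70-79"],
    "sex":      ["F",     "M",     "F",     "M"],
    "region":   ["east",  "east",  "west",  "west"],
})
synthetic = pd.DataFrame({
    "age_band": ["30-39", "60-69", "50-59"],
    "sex":      ["M",     "F",     "F"],
    "region":   ["east",  "west",  "east"],
})

qi = ["age_band", "sex", "region"]

# Equivalence-class sizes in the real data: matching a class of size f
# singles out a specific person with probability 1/f.
class_size = real.groupby(qi).size().rename("f").reset_index()
matches = synthetic.merge(class_size, on=qi, how="left")

# Simplified per-record risk: 1/f where a match exists, 0 otherwise.
matches["risk"] = (1 / matches["f"]).fillna(0)
print(f"average identity disclosure risk: {matches['risk'].mean():.3f}")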

Results: The meaningful identity disclosure risk for both of these synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower than the risk values for the original datasets, respectively.

Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data….(More)”.

Federal Regulators Increase Focus on Patient Risks From Electronic Health Records


Ben Moscovitch at Pew: “…The Office of the National Coordinator for Health Information Technology (ONC) will collect clinicians’ feedback through a survey developed by the Urban Institute under a contract with the agency. ONC will release aggregated results as part of its EHR reporting program. Congress required the program’s creation in the 21st Century Cures Act, the wide-ranging federal health legislation enacted in 2016. The act directs ONC to determine which data to gather from health information technology vendors. That information can then be used to illuminate the strengths and weaknesses of EHR products, as well as industry trends.

The Pew Charitable Trusts, major medical organizations and hospital groups, and health information technology experts have urged that the reporting program examine usability-related patient risks. Confusing, cumbersome, and poorly customized EHR systems can cause health care providers to order the wrong drug or miss test results and other information critical to safe, effective treatment. Usability challenges also can increase providers’ frustration and, in turn, their likelihood of making mistakes.

The data collected from clinicians will shed light on these problems, encourage developers to improve the safety of their products, and help hospitals and doctors’ offices make better-informed decisions about the purchase, implementation, and use of these tools. Research shows that aggregated data about EHRs can generate product-specific insights about safety deficiencies, even when health care facilities implement the same system in distinct ways….(More)”.

Data could hold the key to stopping Alzheimer’s


Blog post by Bill Gates: “My family loves to do jigsaw puzzles. It’s one of our favorite activities to do together, especially when we’re on vacation. There is something so satisfying about everyone working as a team to put down piece after piece until finally the whole thing is done.

In a lot of ways, the fight against Alzheimer’s disease reminds me of doing a puzzle. Your goal is to see the whole picture, so that you can understand the disease well enough to better diagnose and treat it. But in order to see the complete picture, you need to figure out how all of the pieces fit together.

Right now, all over the world, researchers are collecting data about Alzheimer’s disease. Some of these scientists are working on drug trials aimed at finding a way to stop the disease’s progression. Others are studying how our brain works, or how it changes as we age. In each case, they’re learning new things about the disease.

But until recently, Alzheimer’s researchers often had to jump through a lot of hoops to share their data—to see if and how the puzzle pieces fit together. There are a few reasons for this. For one thing, there is a lot of confusion about what information you can and can’t share because of patient privacy. Often there weren’t easily available tools and technologies to facilitate broad data-sharing and access. In addition, pharmaceutical companies invest a lot of money into clinical trials, and often they aren’t eager for their competitors to benefit from that investment, especially when the programs are still ongoing.

Unfortunately, this siloed approach to research data hasn’t yielded great results. We have only made incremental progress in therapeutics since the late 1990s. There’s a lot that we still don’t know about Alzheimer’s, including what part of the brain breaks down first and how or when you should intervene. But I’m hopeful that will change soon thanks in part to the Alzheimer’s Disease Data Initiative, or ADDI….(More)“.

European Health Data Space


European Commission Press Release: “The set-up of the European Health Data Space will be an integral part of building a European Health Union, a process launched by the Commission today with a first set of proposals to reinforce preparedness and response during health crises. This is also a direct follow-up to the Data strategy adopted by the Commission in February this year, in which the Commission had already stressed the importance of creating European data spaces, including on health….

Against this backdrop, as part of the implementation of the Data strategy, a data governance act is set to be presented later this year, which will support the reuse of sensitive public data such as health data. A dedicated legislative proposal on a European health data space is planned for next year, as set out in the 2021 Commission work programme.

As first steps, the following activities starting in 2021 will pave the way for better data-driven health care in Europe:

  • The Commission proposes a European Health Data Space in 2021;
  • A Joint Action with 22 Member States to propose options on governance, infrastructure, data quality and data solidarity, and on empowering citizens with regard to secondary health data use in the EU;
  • Investments to support the European Health Data Space under the EU4Health programme, as well as common data spaces and digital health related innovation under Horizon Europe and the Digital Europe programmes;
  • Engagement with relevant actors to develop targeted Codes of Conduct for secondary health data use;
  • A pilot project, to demonstrate the feasibility of cross border analysis for healthcare improvement, regulation and innovation;
  • Other EU funding opportunities for the digital transformation of health and care will be available for Member States as of 2021 under the Recovery and Resilience Facility, the European Regional Development Fund, European Social Fund+, and InvestEU.

The set of proposals adopted by the Commission today to strengthen the EU’s crisis preparedness and response, taking the first steps towards a European Health Union, also pave the way for the participation of the European Medicines Agency (EMA) and the European Centre for Disease Prevention and Control (ECDC) in the future European Health Data Space infrastructure, along with research institutes, public health bodies, and data permit authorities in the Member States….(More)”.