Data Disappeared


Essay by Samanth Subramanian: “Whenever President Donald Trump is questioned about why the United States has nearly three times more coronavirus cases than the entire European Union, or why hundreds of Americans are still dying every day, he whips out one standard comment. We find so many cases, he contends, because we test so many people. The remark typifies Trump’s deep distrust of data: his wariness of what it will reveal, and his eagerness to distort it. In April, when he refused to allow coronavirus-stricken passengers off the Grand Princess cruise liner and onto American soil for medical treatment, he explained: “I like the numbers where they are. I don’t need to have the numbers double because of one ship.” Unable—or unwilling—to fix the problem, Trump’s instinct is to fix the numbers instead.

The administration has failed on so many different fronts in its handling of the coronavirus, creating the overall impression of sheer mayhem. But there is a common thread that runs through these government malfunctions. Precise, transparent data is crucial in the fight against a pandemic—yet through a combination of ineptness and active manipulation, the government has depleted and corrupted the key statistics that public health officials rely on to protect us.

In mid-July, just when the U.S. was breaking and rebreaking its own records for daily counts of new coronavirus cases, the Centers for Disease Control and Prevention found itself abruptly relieved of its customary duty of collating national numbers on COVID-19 patients. Instead, the Department of Health and Human Services instructed hospitals to funnel their information to the government via TeleTracking, a small Tennessee firm started by a real estate entrepreneur who has frequently donated to the Republican Party. For a while, past data disappeared from the CDC’s website entirely, and although it reappeared after an outcry, it was never updated thereafter. The TeleTracking system was riddled with errors, and the newest statistics sometimes appeared after delays. This has severely limited the ability of public health officials to determine where new clusters of COVID-19 are blooming, to notice demographic patterns in the spread of the disease, or to allocate ICU beds to those who need them most.

To make matters more confusing still, Jared Kushner moved to start a separate coronavirus surveillance system run out of the White House and built by health technology giants—burdening already-overwhelmed officials and health care experts with a needless stream of queries. Kushner’s assessments often contradicted those of agencies working on the ground. When Andrew Cuomo, New York’s governor, asked for 30,000 ventilators, Kushner claimed the state didn’t need them: “I’m doing my own projections, and I’ve gotten a lot smarter about this.”…(More)”.

Covid-19 Data Is a Mess. We Need a Way to Make Sense of It.


Beth Blauer and Jennifer Nuzzo in the New York Times: “The United States is more than eight months into the pandemic and people are back waiting in long lines to be tested as coronavirus infections surge again. And yet there is still no federal standard to ensure testing results are being uniformly reported. Without uniform results, it is impossible to track cases accurately or respond effectively.

We test to identify coronavirus infections in communities. We can tell if we are casting a wide enough net by looking at test positivity — the percentage of people whose results are positive for the virus. The metric tells us whether we are testing enough or if the transmission of the virus is outpacing our efforts to slow it.

If the percentage of tests coming back positive is low, it gives us more confidence that we are not missing a lot of infections. It can also tell us whether a recent surge in cases may be a result of increased testing, as President Trump has asserted, or that cases are rising faster than the rate at which communities are able to test.

But to interpret these results properly, we need a national standard for how these results are reported publicly by each state. And although the Centers for Disease Control and Prevention issue protocols for how to report new cases and deaths, there is no uniform guideline for states to report testing results, which would tell us about the universe of people tested so we know we are doing enough testing to track the disease. (Even the C.D.C. was found in May to be reporting states’ results in a way that presented a misleading picture of the pandemic.)

Without a standard, states are deciding how to calculate positivity rates on their own — and their approaches are very different.

Some states include results from positive antigen-based tests, some states don’t. Some report the number of people tested, while others report only the number of tests administered, which can skew the overall results when people are tested repeatedly (as, say, at colleges and nursing homes)….(More)”
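To make the difference between the two denominators concrete, here is a minimal sketch with made-up numbers. The `LabResult` record, the two function names, and the sample data are illustrative assumptions, not any state's actual reporting method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabResult:
    person_id: str  # hypothetical identifier; real state data feeds differ
    positive: bool


def positivity_by_tests(results):
    """Positive tests divided by all tests administered."""
    if not results:
        raise ValueError("no results")
    return sum(r.positive for r in results) / len(results)


def positivity_by_people(results):
    """People with at least one positive test divided by people tested."""
    if not results:
        raise ValueError("no results")
    ever_positive = {}
    for r in results:
        # A person counts as positive if any of their tests was positive.
        ever_positive[r.person_id] = ever_positive.get(r.person_id, False) or r.positive
    return sum(ever_positive.values()) / len(ever_positive)


# Made-up data: 2 people test positive once, 7 people test negative once,
# and 1 person (say, a nursing home worker) tests negative 11 times.
sample = (
    [LabResult(f"pos{i}", True) for i in range(2)]
    + [LabResult(f"neg{i}", False) for i in range(7)]
    + [LabResult("retested", False) for _ in range(11)]
)

print(f"per test:   {positivity_by_tests(sample):.0%}")
print(f"per person: {positivity_by_people(sample):.0%}")
```

With this sample, the same 10 people yield 10% positivity when counted per test and 20% when counted per person, which is why figures from states using different methods cannot be compared directly.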

Crowdfunding during COVID-19: An international comparison of online fundraising


Paper by Greg Elmer, Sabrina Ward-Kimola and Anthony Glyn Burton: “This article performs a digital methods analysis on a sample of online crowdfunding campaigns seeking financial support for COVID related financial challenges. Building upon the crowdfunding literature this paper performs an international comparison of the goals of COVID related campaigns during the early spread of the pandemic. The paper seeks to determine the extent to which crowdfunding campaigns reflect current failures of governments to suppress the COVID pandemic and support the financial challenges of families, communities and small businesses….(More)”.

A nudge helps doctors bring up end-of-life issues with their dying cancer patients


Article by Ravi Parikh et al: “When conversations about goals and end-of-life wishes happen early, they can improve patients’ quality of life and decrease their chances of dying on a ventilator or in an intensive care unit. Yet doctors treating cancer focus so much of their attention on treating the disease that these conversations tend to get put off until it’s too late. This leads to costly and often unwanted care for the patient.

This can be fixed, but it requires addressing two key challenges. The first is that it is often difficult for doctors to know how long patients have left to live. Even among patients in hospice care, doctors get it wrong nearly 70% of the time. Hospitals and private companies have invested millions of dollars to try and identify these outcomes, often using artificial intelligence and machine learning, although most of these algorithms have not been vetted in real-world settings.

In a recent set of studies, our team used data from real-time electronic medical records to develop a machine learning algorithm that identified which cancer patients had a high risk of dying in the next six months. We then tested the algorithm on 25,000 patients who were seen at our health system’s cancer practices and found it performed better than relying only on doctors to identify high-risk patients.

But just because such a tool exists doesn’t mean doctors will use it to prompt more conversations. The second challenge — which is even harder to overcome — is using machine learning to motivate clinicians to have difficult conversations with patients about the end of life.

We wondered if implementing a timely “nudge” that doctors received before seeing their high-risk patients could help them start the conversation.

To test this idea, we used our prediction tool in a clinical trial involving nine cancer practices. Doctors in the nudge group received a weekly report on how many end-of-life conversations they had compared to their peers, along with a list of patients they were scheduled to see the following week who the algorithm deemed at high risk of dying in the next six months. They could review the list and uncheck any patients they thought were not appropriate for end-of-life conversations. For the patients who remained checked, doctors received a text message on the day of the appointment reminding them to discuss the patient’s goals at the end of life. Doctors in the control group did not receive the email or text message intervention.

As we reported in JAMA Oncology, 15% of doctors who received the nudge text had end-of-life conversations with their patients, compared to just 4% of the control doctors….(More)”.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation


Paper by Khaled El Emam et al: “There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: If the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them.

Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data.

Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this “meaningful identity disclosure risk.” The model is applied on samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data.

Results: The meaningful identity disclosure risk for both of these synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower than the risk values for the original datasets, respectively.

Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data….(More)”.

Federal Regulators Increase Focus on Patient Risks From Electronic Health Records


Ben Moscovitch at Pew: “…The Office of the National Coordinator for Health Information Technology (ONC) will collect clinicians’ feedback through a survey developed by the Urban Institute under a contract with the agency. ONC will release aggregated results as part of its EHR reporting program. Congress required the program’s creation in the 21st Century Cures Act, the wide-ranging federal health legislation enacted in 2016. The act directs ONC to determine which data to gather from health information technology vendors. That information can then be used to illuminate the strengths and weaknesses of EHR products, as well as industry trends.

The Pew Charitable Trusts, major medical organizations and hospital groups, and health information technology experts have urged that the reporting program examine usability-related patient risks. Confusing, cumbersome, and poorly customized EHR systems can cause health care providers to order the wrong drug or miss test results and other information critical to safe, effective treatment. Usability challenges also can increase providers’ frustration and, in turn, their likelihood of making mistakes.

The data collected from clinicians will shed light on these problems, encourage developers to improve the safety of their products, and help hospitals and doctor’s offices make better-informed decisions about the purchase, implementation, and use of these tools. Research shows that aggregated data about EHRs can generate product-specific insights about safety deficiencies, even when health care facilities implement the same system in distinct ways….(More)”.

Data could hold the key to stopping Alzheimer’s


Blog post by Bill Gates: “My family loves to do jigsaw puzzles. It’s one of our favorite activities to do together, especially when we’re on vacation. There is something so satisfying about everyone working as a team to put down piece after piece until finally the whole thing is done.

In a lot of ways, the fight against Alzheimer’s disease reminds me of doing a puzzle. Your goal is to see the whole picture, so that you can understand the disease well enough to better diagnose and treat it. But in order to see the complete picture, you need to figure out how all of the pieces fit together.

Right now, all over the world, researchers are collecting data about Alzheimer’s disease. Some of these scientists are working on drug trials aimed at finding a way to stop the disease’s progression. Others are studying how our brain works, or how it changes as we age. In each case, they’re learning new things about the disease.

But until recently, Alzheimer’s researchers often had to jump through a lot of hoops to share their data—to see if and how the puzzle pieces fit together. There are a few reasons for this. For one thing, there is a lot of confusion about what information you can and can’t share because of patient privacy. Often there weren’t easily available tools and technologies to facilitate broad data-sharing and access. In addition, pharmaceutical companies invest a lot of money into clinical trials, and often they aren’t eager for their competitors to benefit from that investment, especially when the programs are still ongoing.

Unfortunately, this siloed approach to research data hasn’t yielded great results. We have only made incremental progress in therapeutics since the late 1990s. There’s a lot that we still don’t know about Alzheimer’s, including what part of the brain breaks down first and how or when you should intervene. But I’m hopeful that will change soon thanks in part to the Alzheimer’s Disease Data Initiative, or ADDI….(More)”.

European Health Data Space


European Commission Press Release: “The set-up of the European Health Data Space will be an integral part of building a European Health Union, a process launched by the Commission today with a first set of proposals to reinforce preparedness and response during health crises. This is also a direct follow up of the Data strategy adopted by the Commission in February this year, where the Commission had already stressed the importance of creating European data spaces, including on health….

In this perspective, as part of the implementation of the Data strategy, a data governance act is set to be presented before the end of this year, which will support the reuse of public sensitive data such as health data. A dedicated legislative proposal on a European health data space is planned for next year, as set out in the 2021 Commission work programme.

As first steps, the following activities starting in 2021 will pave the way for better data-driven health care in Europe:

  • The Commission proposes a European Health Data Space in 2021;
  • A Joint Action with 22 Member States to propose options on governance, infrastructure, data quality and data solidarity and empowering citizens with regard to secondary health data use in the EU;
  • Investments to support the European Health Data Space under the EU4Health programme, as well as common data spaces and digital health related innovation under Horizon Europe and the Digital Europe programmes;
  • Engagement with relevant actors to develop targeted Codes of Conduct for secondary health data use;
  • A pilot project, to demonstrate the feasibility of cross border analysis for healthcare improvement, regulation and innovation;
  • Other EU funding opportunities for digital transformation of health and care will be available for Member States as of 2021 under Recovery and Resilience Facility, European Regional Development Fund, European Social Fund+, InvestEU.

The set of proposals adopted by the Commission today to strengthen the EU’s crisis preparedness and response, taking the first steps towards a European Health Union, also pave the way for the participation of the European Medicines Agency (EMA) and the European Centre for Disease Prevention and Control (ECDC) in the future European Health Data Space infrastructure, along with research institutes, public health bodies, and data permit authorities in the Member States….(More)”.

The web is full of junk health info. This startup wants to change that


Daphne Leprince-Ringuet at ZDNet: “A crowdsourcing platform aims to provide better insight into health issues than is currently available….In the age of social media, blogs, and online forums, the most common practice when feeling slightly under the weather has undeniably become to resort to a quick Google search. Unfortunately, when they are not unnecessarily worrying, the answers found on the web are typically inconclusive. That observation is what prompted Israeli entrepreneur Yael Elish to launch StuffThatWorks, an AI-based online platform that collects crowdsourced data about a host of chronic conditions.

The idea being that, unlike Facebook groups or Reddit threads, the information shared by patients is centralized and assessed for quality to readily provide informed data to other users who are enquiring about their own symptoms. Elish is a former member of the founding team for crowdsourced navigation app Waze, but this time instead of tapping user-generated content to come up with traffic predictions and accident warnings, StuffThatWorks is intended to give users better insights into illness…(More)”.

Tackling misinformation during crisis


Paper by Elizabeth Seger and Mark Briers: “The current COVID-19 pandemic and the accompanying ‘infodemic’ clearly illustrate that access to reliable information is crucial to coordinating a timely crisis response in democratic societies. Inaccurate information and the muzzling of important information sources have degraded trust in health authorities and slowed public response to the crisis. Misinformation about ineffective cures, the origins and malicious spread of COVID-19, unverified treatment discoveries, and the efficacy of face coverings has increased the difficulty of coordinating a unified public response during the crisis.

In a recent report, researchers at the Cambridge Centre for the Study of Existential Risk (CSER) in collaboration with The Alan Turing Institute and the Defence Science and Technology Laboratory (Dstl) workshopped an array of hypothetical crisis scenarios to investigate social and technological factors that interfere with well-informed decision-making and timely collective action in democratic societies.

Crisis scenarios

Crisis scenarios are useful tools for appraising threats and vulnerabilities to systems of information production, dissemination, and evaluation. Factors influencing how robust a society is to such threats and vulnerabilities are not always obvious when life is relatively tranquil but are often highlighted under the stress of a crisis. 

CSER and Dstl workshop organisers, together with workshop participants (a diverse group of professionals interested in topics related to [mis/dis]information, information technology, and crisis response), co-developed and explored six hypothetical crisis scenarios and complex challenges:

  • Global health crisis
  • Character assassination
  • State fake news campaign
  • Economic collapse
  • Xenophobic ethnic cleansing
  • Epistemic babble, where the ability for the general population to tell the difference between truth and fiction (presented as truth) is lost

We analysed each scenario to identify various interest groups and actors, to pinpoint vulnerabilities in systems of information production and exchange, and to visualise how the system might be interfered with. We also considered interventions that could help bolster the society against threats to informed decision-making.

The systems map below is an example from workshop scenario 1: Global health crisis. The map shows how adversarial actors (red) and groups working to mitigate the crisis (blue) interact, impact each other’s actions, and influence the general public and other interest groups (green) such as those affected by the health crisis. 

Systems maps help visualise vulnerabilities in both red and blue actor systems, which, in turn, helps identify areas where intervention (yellow) is possible to help mitigate the crisis….(More)”.