The lapses in India’s Covid-19 data are a result of decades of callousness towards statistics


Prathamesh Mulye at Quartz: “India is paying a huge price for its decades-long callous attitude towards data and statistics. For several weeks now, experts have been calling out the Indian government and state heads for suppressing Covid-19 infection and death figures. None of the political leaders have addressed these concerns even as official data reflects a small fraction of what’s playing out at hospitals and cremation grounds.

A major reason administrations are getting away without answering is that data lapses are nothing new in India.

Successive regimes in the country have tinkered with and twisted figures to suit their convenience, without much consequence. For years, the country has been criticised for insufficient and poor-quality data on a range of topics, including GDP, farmer suicides, and even unemployment…

Before the pandemic started, the most prominent data controversy in India was around the GDP numbers, which the Modi government continuously changed and chopped to cover up the slowdown in economic growth. In 2019, the Modi government also chose not to publish an unemployment data report that showed that joblessness in the country was at a nine-year high in 2017-18. And last year, in the middle of the pandemic, the government said it had no data on the number of frontline workers who had lost their lives to Covid-19 or a list of police personnel fatalities due to the disease.

Experts say that India’s statistical machinery has been deliberately weakened over the past few years to protect various governments’ false claims and image.

“The weakened statistical machinery manifests itself in different ways such as delays and questions about data quality. Also, when the results of a survey don’t suit the government in power, it tries to suppress data. This happened, for instance, with nutrition data in previous governments too,” said Reetika Khera, associate professor at the Indian Institute of Technology (IIT), Delhi.

“Think of the economy as a patient: data captures its pulse rate. If you don’t listen to the pulse, you won’t be able to diagnose correctly, let alone cure it,” she added….(More)”

Digital Inclusion is a Social Determinant of Health


Paper by Jill Castek et al: “Efforts to improve digital literacies and internet access are valuable tools to reduce health disparities. The costs of equipping a person to use the internet are substantially lower than treating health conditions, and the benefits are multiple….

Those who do not have access to affordable broadband internet services, digital devices, digital literacies training, and technical support face numerous challenges when video-conferencing with their doctor, checking test results, filling prescriptions, and much more. Many individuals require significant support in developing the digital literacies needed to engage in telehealth, with the greatest need among older individuals, racial/ethnic minorities, and low-income communities. Taken in context, the costs of equipping a person to use the internet are substantially lower than treating health conditions, and the benefits are both persistent and significant.

“Super” Social Determinants of Health

Digital literacies and internet connectivity have been called the “super social determinants of health” because they encompass all other social determinants of health (SDOH). Access to information, supports, and services is increasingly, and sometimes exclusively, available only online.

The social determinants of health shown in Figure 1 (Digital Literacies & Access) include the neighborhood and physical environment, economic sustainability, the healthcare system, community and social context, food, and education. Together these factors affect an individual’s ability to access healthcare services, education, housing, transportation, and online banking, and to sustain relationships with family members and friends. Digital literacies and access impact all facets of a person’s life and affect behavioral and environmental outcomes such as shopping choices, housing, support systems, and health coverage….(More)”

Figure 1. Digital Literacies & Access. 

How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals


Paper by Eric Wu et al: “Medical artificial-intelligence (AI) algorithms are being increasingly proposed for the assessment and care of patients. Although the academic community has started to develop reporting guidelines for AI clinical trials, there are no established best practices for evaluating commercially available algorithms to ensure their reliability and safety. The path to safe and robust clinical AI requires that important regulatory questions be addressed. Are medical devices able to demonstrate performance that can be generalized to the entire intended population? Are commonly faced shortcomings of AI (overfitting to training data, vulnerability to data shifts, and bias against underrepresented patient subgroups) adequately quantified and addressed?

In the USA, the US Food and Drug Administration (FDA) is responsible for approving commercially marketed medical AI devices. The FDA releases publicly available information on approved devices in the form of a summary document that generally contains information about the device description, indications for use, and performance data of the device’s evaluation study. The FDA has recently called for improvement of test-data quality, improvement of trust and transparency with users, monitoring of algorithmic performance and bias on the intended population, and testing with clinicians in the loop. To understand the extent to which these concerns are addressed in practice, we have created an annotated database of FDA-approved medical AI devices and systematically analyzed how these devices were evaluated before approval. Additionally, we have conducted a case study of pneumothorax-triage devices and found that evaluating deep-learning models at a single site alone, which is often done, can mask weaknesses in the models and lead to worse performance across sites.

Fig. 1: Breakdown of 130 FDA-approved medical AI devices by body area.


Devices are categorized by risk level (square, high risk; circle, low risk). Blue indicates that a multi-site evaluation was reported; otherwise, symbols are gray. Red outline indicates a prospective study (key, right margin). Numbers in key indicate the number of devices with each characteristic….(More)”.
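The case study’s central point – that a pooled or single-site evaluation can hide a site where the model performs poorly – can be sketched in a few lines. The data and function name below are hypothetical, not the authors’ code:

```python
from collections import defaultdict

def per_site_accuracy(records):
    """records: iterable of (site, correct) pairs from a device evaluation.
    Returns (pooled_accuracy, {site: accuracy}): a weak site that is
    invisible in the pooled number shows up in the per-site breakdown."""
    totals = defaultdict(lambda: [0, 0])  # site -> [correct, total]
    for site, correct in records:
        totals[site][0] += int(correct)
        totals[site][1] += 1
    pooled = sum(c for c, _ in totals.values()) / sum(n for _, n in totals.values())
    return pooled, {site: c / n for site, (c, n) in totals.items()}
```

A device evaluated mostly at a strong site can report a high pooled accuracy even when a second site performs far worse – which is why the authors argue for multi-site, prospective evaluation.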

The Use of Mobility Data for Responding to the COVID-19 Pandemic


New Report, Repository and set of Case Studies commissioned by the Open Data Institute: “…The GovLab and Cuebiq first assembled a repository of mobility data collaboratives related to Covid-19. They then selected five of these to analyse further, and produced case studies on each of the collaboratives (which you can find below in the ‘Key outputs’ section).

After analysing these initiatives, Cuebiq and The GovLab then developed a synthesis report, which contains sections focused on:

  • Mobility data – what it is and how it can be used
  • Current practice – insights from five case studies
  • Prescriptive analysis – recommendations for the future

Findings and recommendations

Based on this analysis, the authors of the report recommend nine actions which have the potential to enable more effective, sustainable and responsible re-use of mobility data through data collaboration to support decision making regarding pandemic prevention, monitoring, and response:

  1. Developing and clarifying governance frameworks to enable the trusted, transparent, and accountable reuse of privately held data in the public interest under a clear regulatory framework
  2. Building capacity of organisations in the public and private sector to reuse and act on data through investments in training, education, and reskilling of relevant authorities; especially driving support for institutions in the Global South
  3. Establishing data stewards in organisations who can coordinate and collaborate with counterparts on using data in the public’s interest and acting on it.
  4. Establishing dedicated and sustainable CSR (Corporate Social Responsibility) programs on data in organisations to coordinate and collaborate with counterparts on using and acting upon data in the public’s interest.
  5. Building a network of data stewards to coordinate and streamline efforts while promoting greater transparency; as well as exchange best practices and lessons learned.
  6. Engaging citizens about how their data is being used, so they can clearly articulate how they want their data to be responsibly used, shared, and protected.
  7. Promoting technological innovation through collaboration between funders (eg governments and foundations) and researchers (eg data scientists) to develop and deploy useful, privacy-preserving technologies.
  8. Unlocking funds from a variety of sources to ensure projects are sustainable and can operate long term.
  9. Increasing research and spurring evidence gathering by publishing easily accessible research and creating dedicated centres to develop best practices.

This research begins to demonstrate the value that a handful of new data-sharing initiatives have had in the ongoing response to Covid-19. The pandemic isn’t yet over, and we will need to continue to assess and evaluate how data has been shared – both successfully and unsuccessfully – and who has benefited or been harmed in the process. More research is needed to highlight the lessons from this emergency that can be applied to future crises….(More)”.

A Victory for Scientific Pragmatism


Essay by Arturo Casadevall, Michael J. Joyner, and Nigel Paneth: “…The convalescent plasma controversy highlights the need to better educate physicians on the knowledge problem in medicine: How do we know what we know, and how do we acquire new knowledge? The usual practice guidelines doctors rely on for the treatment of disease were not available for the treatment of Covid-19 early in the pandemic, since these are usually issued by professional societies only after definitive information is available from RCTs, a luxury we did not have. The convalescent plasma experience supports Devorah Goldman’s plea to consider all available information when making therapeutic decisions.

Fortunately, the availability of rapid communication through pre-print studies, social media, and online conferences have allowed physicians to learn quickly. The experience suggests the value of providing more instruction in medical schools, postgraduate education, and continuing medical education on how best to evaluate evidence — especially preliminary and seemingly contradictory evidence. Just as physicians learn to use clinical judgment in treating individual patients, they must learn how to weigh evidence in treating populations of patients. We also need greater nimbleness and more flexibility from regulators and practice-guideline groups in emergency situations such as pandemics. They should issue interim recommendations that synthesize the best available evidence, as the American Association of Blood Bankers has done for plasma, recognizing that these recommendations may change as new evidence accumulates. Similarly, we all need to make greater efforts to educate the public to understand that all knowledge in medicine and science is provisional, subject to change as new and better studies emerge. Updating and revising recommendations as knowledge advances is not a weakness but a foundational strength of good medicine….(More)”.

Open data in action: initiatives during the initial stage of the COVID-19 pandemic


Report by OECD and The GovLab: “The COVID-19 pandemic has increased the demand for access to timely, relevant, and quality data. This demand has been driven by several needs: taking informed policy actions quickly, improving communication on the current state of play, carrying out scientific analysis of a dynamic threat, understanding its social and economic impact, and enabling civil society oversight and reporting.


This report…assesses how open government data (OGD) was used to react and respond to the COVID-19 pandemic during initial stage of the crisis (March-July 2020) based on initiatives collected through an open call for evidence. It also seeks to transform lessons learned into considerations for policy makers on how to improve OGD policies to better prepare for future shocks…(More)”.

Hospitals Hide Pricing Data From Search Results


Tom McGinty, Anna Wilde Mathews, and Melanie Evans at the Wall Street Journal: “Hospitals that have published their previously confidential prices to comply with a new federal rule have also blocked that information from web searches with special coding embedded on their websites, according to a Wall Street Journal examination.

The information must be disclosed under a federal rule aimed at making the $1 trillion sector more consumer friendly. But hundreds of hospitals embedded code in their websites that prevented Alphabet Inc.’s Google and other search engines from displaying pages with the price lists, according to the Journal examination of more than 3,100 sites.

The code keeps pages from appearing in searches, such as those related to a hospital’s name and prices, computer-science experts said. The prices are often accessible other ways, such as through links that can require clicking through multiple layers of pages.

“It’s technically there, but good luck finding it,” said Chirag Shah, an associate professor at the University of Washington who studies human interactions with computers. “It’s one thing not to optimize your site for searchability, it’s another thing to tag it so it can’t be searched. It’s a clear indication of intentionality.”…(More)”.
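The Journal describes the blocking only as “special coding”; one common mechanism for keeping a page out of search results is a robots “noindex” meta tag. Below is a minimal sketch of detecting such a directive in a page’s HTML – the function name is ours, and the specific tag is an assumption about the mechanism, which the article does not specify:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Flags a robots meta directive asking search engines not to index the page."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "robots" and "noindex" in (a.get("content") or "").lower():
                self.noindex = True

def page_blocks_search_engines(html):
    """True if the given HTML carries a robots noindex directive."""
    parser = NoindexDetector()
    parser.feed(html)
    return parser.noindex
```

Run against a downloaded pricing page, this returns True when the page carries such a directive – “technically there, but good luck finding it.”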

An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time


Paper by Nicole E. Kogan et al: “We propose that several digital data sources may provide earlier indication of epidemic spread than traditional COVID-19 metrics such as confirmed cases or deaths. Six such sources are examined here: (i) Google Trends patterns for a suite of COVID-19–related terms; (ii) COVID-19–related Twitter activity; (iii) COVID-19–related clinician searches from UpToDate; (iv) predictions by the global epidemic and mobility model (GLEAM), a state-of-the-art metapopulation mechanistic model; (v) anonymized and aggregated human mobility data from smartphones; and (vi) Kinsa smart thermometer measurements.

We first evaluate each of these “proxies” of COVID-19 activity for their lead or lag relative to traditional measures of COVID-19 activity: confirmed cases, attributed deaths, and influenza-like illness (ILI). We then propose the use of a metric combining these data sources into a multiproxy estimate of the probability of an impending COVID-19 outbreak. Last, we develop probabilistic estimates of when such a COVID-19 outbreak will occur on the basis of multiproxy variability. These outbreak-timing predictions are made for two separate time periods: the first, a “training” period, from 1 March to 31 May 2020, and the second, a “validation” period, from 1 June to 30 September 2020. Consistent predictive behavior among proxies in both of these subsequent and nonoverlapping time periods would increase the confidence that they may capture future changes in the trajectory of COVID-19 activity….(More)”.
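One simple way to estimate a proxy’s lead, in the spirit of the lead/lag analysis described above (though not necessarily the authors’ exact method), is to find the time shift that maximizes the correlation between the proxy series and confirmed cases:

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def estimate_lead(proxy, cases, max_shift=21):
    """Return (shift, correlation) for the day-shift in 0..max_shift that
    maximizes the correlation between proxy[t] and cases[t + shift];
    a positive best shift means the proxy leads confirmed cases."""
    best_shift, best_corr = 0, float("-inf")
    for shift in range(max_shift + 1):
        r = pearson(proxy[:len(proxy) - shift], cases[shift:])
        if r > best_corr:
            best_shift, best_corr = shift, r
    return best_shift, best_corr
```

Applied to, say, a Google Trends series and a case-count series of equal length, a best shift of several days with high correlation would suggest the proxy gives that much early warning.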

The Ethics and Laws of Medical Big Data


Chapter by Hrefna Gunnarsdottir et al: “The COVID-19 pandemic has highlighted that leveraging medical big data can help to better predict and control outbreaks from the outset. However, there are still challenges to overcome in the 21st century to efficiently use medical big data, promote innovation and public health activities and, at the same time, adequately protect individuals’ privacy. The metaphor that property is a “bundle of sticks”, each representing a different right, applies equally to medical big data. Understanding medical big data in this way raises a number of questions, including: Who has the right to make money off its buying and selling, or is it inalienable? When does medical big data become sufficiently stripped of identifiers that the rights of an individual concerning the data disappear? How have different regimes such as the General Data Protection Regulation in Europe and the Health Insurance Portability and Accountability Act in the US answered these questions differently? In this chapter, we will discuss three topics: (1) privacy and data sharing, (2) informed consent, and (3) ownership. We will identify and examine ethical and legal challenges and make suggestions on how to address them. In our discussion of each of the topics, we will also give examples related to the use of medical big data during the COVID-19 pandemic, though the issues we raise extend far beyond it….(More)”.

Policy 2.0 in the Pandemic World: What Worked, What Didn’t, and Why


Blog by David Osimo: “…So how, then, did these new tools perform when confronted with the once-in-a-lifetime crisis of a vast global pandemic?

It turns out, some things worked. Others didn’t. And the question of how these new policymaking tools functioned in the heat of battle is already generating valuable ammunition for future crises.

So what worked?

Policy modelling – an analytical framework designed to anticipate the impact of decisions by simulating the interaction of multiple agents in a system, rather than just the independent actions of atomised and rational humans – took centre stage in the pandemic and emerged with reinforced importance in policymaking. Notably, it helped governments predict how and when to introduce lockdowns or open up. But even there uptake was limited. A recent survey showed that the 28 models used in different countries to fight the pandemic were traditional, not the modern “agent-based models” or “system dynamics” approaches supposed to deal best with uncertainty. Meanwhile, the concepts of system science were becoming prominent and widely communicated. It quickly became clear in the course of the crisis that social distancing was more a method to reduce the systemic pressure on the health services than a way to avoid individual contagion (the so-called “flatten the curve” approach).
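The “flatten the curve” logic – reducing contact rates lowers the epidemic peak, and so the load on health services, even if many people are eventually infected – can be sketched with a minimal discrete-time SIR model (illustrative parameters only, not any specific national model):

```python
def sir_peak_infected(beta, gamma=0.1, days=300, i0=1e-4):
    """Discrete-time SIR model on population fractions.
    beta: daily contact/transmission rate; gamma: daily recovery rate.
    Returns the peak infected fraction; a lower beta flattens the curve."""
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak
```

Halving the contact rate in this toy model cuts the peak infected fraction several-fold – precisely the systemic-pressure argument behind social distancing.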

Open government data has long promised to allow citizens and businesses to build new services at scale and make government accountable. The pandemic largely confirmed how important this data could be in allowing citizens to analyse things independently. Hundreds of analysts from all walks of life and disciplines used social media to discuss their analyses and predictions, many becoming household names and go-to people in their countries and regions. Yes, this led to noise and a so-called “infodemic,” but overall it served as a fundamental tool to increase confidence and consensus behind the policy measures and to make governments accountable for their actions. For instance, one Catalan analyst demonstrated that vaccines were not provided during weekends and forced the government to change its stance. Yet it is also clear that not all went well, most notably on the supply side. Governments published data of low quality: in PDF, with delays, or with missing data due to spreadsheet abuse.

In most cases, there was little demand for sophisticated data publishing solutions such as “linked” or “FAIR” data, although the uptake of these kinds of solutions was particularly significant when it came time to share crucial research data. Experts argue that the trend towards open science has accelerated dramatically and irreversibly in the last year, as shown by the portal https://www.covid19dataportal.org/, which allowed sharing of high-quality data for scientific research….

But other new policy tools proved less easy to use and ultimately ineffective. Collaborative governance, for one, promised to leverage the knowledge of thousands of citizens to improve public policies and services. In practice, methodologies aimed at involving citizens in decision making and service design were of little use. Decisions related to locking down and opening up were taken in closed committees, in top-down mode. Individual exceptions certainly exist: Milan, one of the cities worst hit by the pandemic, launched a co-created strategy for opening up after the lockdown, receiving almost 3,000 contributions to the consultation. But overall, such initiatives had limited impact and visibility. With regard to co-design of public services, in times of emergency there was no time for prototyping or focus groups. Services such as emergency financial relief had to be launched in a hurry and “just work.”

Citizen science promised to make every citizen a consensual data source for monitoring complex phenomena in real time through apps and Internet-of-Things sensors. In the pandemic, there were initially great expectations that digital contact-tracing apps would allow real-time monitoring of contagion, most notably through Bluetooth connections in the phone. However, they were mostly a disappointment. Citizens were reluctant to install them, and contact tracing soon appeared to be much more complicated – and human-intensive – than originally thought. The huge debate over technology versus privacy was followed by very limited impact. Much ado about nothing.

Behavioural economics (commonly known as nudge theory) is probably the most visible failure of the pandemic. It promised to move beyond traditional carrots (public funding) and sticks (regulation) in delivering policy objectives by adopting an experimental method to influence or “nudge” human behaviour towards desired outcomes. The reality is that soft nudges proved an ineffective alternative to hard lockdown choices. What makes this uniquely negative is that such methods took centre stage in the initial phase of the pandemic and particularly informed the United Kingdom’s lax approach in the first months, on the basis of a hypothetical and unproven “behavioural fatigue.” This attracted heavy criticism of the excessive reliance on nudges by the United Kingdom government, a legacy of Prime Minister David Cameron’s administration. The origin of such criticism seems to lie not in the method’s shortcomings per se, which had enjoyed success previously in more specific cases, but in the backlash from excessive expectations and promises, epitomised in the quote of a prominent behavioural economist: “It’s no longer a matter of supposition as it was in 2010 […] we can now say with a high degree of confidence these models give you best policy.”

Three factors emerge as the key determinants behind success and failure: maturity, institutions and leadership….(More)”.