A widening data divide: COVID-19 and the Global South


Essay by Stefania Milan and Emiliano Treré at Data & Policy: “If numbers are the conditions of existence of the COVID-19 problem, we ought to pay attention to the actual (in)ability of many countries in the South to test their population for the virus, and to produce reliable population statistics more generally, let alone to adequately care for them. It is a matter of a “data gap” as well as of data quality, which even in “normal” times hinders “evidence-based policy making, tracking progress and development, and increasing government accountability” (Chen et al., 2013). And while the World Health Organization issues warnings about the “dramatic situation” concerning the spread of COVID-19 in the African continent, to name just one of the blind spots of our datasets of the global pandemic, the World Economic Forum calls for “flattening the curve” in developing countries. Progress has been made following the revision of the United Nations’ Millennium Development Goals in 2005, with countries in the Global South invited (and supported) to devise National Strategies for the Development of Statistics. Yet, a cursory look at the NYU GovLab’s valuable repository of “data collaboratives” addressing the COVID-19 pandemic reveals the virtual absence of data collection and monitoring projects in the South of the hemisphere. The next obvious step is the dangerous equation “no data=no problem”.

Disease and “whiteness”

Epidemiology and pharmacogenetics (i.e. the study of the genetic basis of how people respond to pharmaceuticals), to name but a few amongst the number of concerned life sciences, are largely based on the “inclusion of white/Caucasians in studies and the exclusion of other ethnic groups” (Tutton, 2007). In other words, modeling of disease evolution and the related solutions are based on datasets that take into account primarily, and in fact almost exclusively, the Caucasian population. This is a known problem in the field, which derives from the “assumption that a Black person could be thought of as being White”, dismissing specificities and differences. This problem has been linked to the “lack of social theory development, due mainly to the reluctance of epidemiologists to think about social mechanisms (e.g., racial exploitation)” (Muntaner, 1999, p. 121). While COVID-19 represents a slight variation on this trend, having been first identified in China, the problem on the large scale remains. And in times of a health emergency as global as this one, it risks being reinforced and perpetuated.

A succulent market for the industry

In the absence of national testing capacity, the developing world might fall prey to the blooming industry of genetic and disease testing, on the one hand, and of telecom-enabled population monitoring on the other. Private companies might be able to fill the gap left by the state, mapping populations at risk while monetizing their data. The case of 23andme is symptomatic of this rise of industry-led testing, which constitutes a double-edged sword. On the one hand, private actors might supply key services that resource-poor or failing states are unable to provide. On the other hand, however, the distorted and often hidden agendas of profit-led players reveal its shortcomings and dangers. If we look at the telecom industry, we note how it has contributed to tracking disease propagation in a number of health emergencies such as Ebola. And if the global open data community has called for smoother data exchange between the private and the public sector to collectively address the spread of the virus, in the absence of adequate regulatory frameworks in the Global South, for example in the field of privacy and data retention, local authorities might fall prey to outside interventions of dubious nature….(More)”.

Data & Policy


Data & Policy, an open-access journal exploring the potential of data science for governance and public decision-making, published its first cluster of peer-reviewed articles last week.

The articles include three contributions specifically concerned with data protection by design:

·       Gefion Thuermer and colleagues (University of Southampton) distinguish between data trusts and other data sharing mechanisms and discuss the need for workflows with data protection at their core;

·       Swee Leng Harris (King’s College London) explores Data Protection Impact Assessments as a framework for helping us know whether government use of data is legal, transparent and upholds human rights;

·       Giorgia Bincoletto’s (University of Bologna) study investigates data protection concerns arising from cross-border interoperability of Electronic Health Record systems in the European Union.

Also published is research by Jacqueline Lam and colleagues (University of Cambridge; Hong Kong University) on how fine-grained data from satellites and other sources can help us understand environmental inequality and socio-economic disparities in China; the study also reflects upon the importance of safeguarding data privacy and security. See also the blogs this week on the potential of Data Collaboratives for COVID-19 by Editor-in-Chief Stefaan Verhulst (the GovLab) and how COVID-19 exposes a widening data divide for the Global South, by Stefania Milan (University of Amsterdam) and Emiliano Treré (University of Cardiff).

Data & Policy is an open access, peer-reviewed venue for contributions that consider how systems of policy and data relate to one another. Read the 5 ways you can contribute to Data & Policy and contact dataandpolicy@cambridge.org with any questions….(More)”.

Citizen input matters in the fight against COVID-19


Britt Lake at FeedbackLabs: “When the Ebola crisis hit West Africa in 2015, one of the first responses was to build large field hospitals to treat the rapidly growing number of Ebola patients. As Paul Richards explains, “These were seen as the safest option. But they were shunned by families, because so few patients came out alive.” Aid workers vocally opposed local customs like burial rituals that contributed to the spread of the virus, which caused tension with communities. Ebola-affected communities insisted that some of their methods had proven effective in lowering case numbers before outside help arrived. When government and aid agencies came in and delivered their own messages, locals felt that their expertise had been ignored. Distrust spread, as did a sense that the response pitted local knowledge against global experts. And the virus continued to spread. 

The same is true now. Today there are more than 1 million confirmed cases of COVID-19 worldwide. The virus has spread to every country and territory in the world, leaving virtually no one unaffected. The pandemic is exacerbating inequities in employment, education, access to healthcare and food, and workers’ rights even as it raises new challenges. Everyone is looking for answers to address their needs and anxieties while also collectively realizing that this pandemic and our responses to it will irrevocably shape the future.

It would be easy for us in the public sector to turn inwards for solutions on how to respond effectively to the pandemic and its aftermath. It’s comfortable to focus on perspectives from our own teams when we feel a heightened sense of urgency, and decisions must be made on a dime. However, it would be a mistake not to consider input from the communities we serve – alongside expert knowledge – when determining how we support them through this crisis. 

COVID-19 affects everyone on earth, and it won’t be possible to craft equitable responses that meet people’s needs around the globe unless we listen to what would work best to address those challenges and support homegrown solutions that are already working. Effective communication of public health information, for instance, is central to controlling the spread of COVID-19. By listening to communities, we can better understand what communication methods work for them and can do a better job getting those messages across in a way that resonates with diverse communities. And to face the looming economic crisis that COVID-19 is precipitating, we will need to engage in real dialogue with people about their priorities and the way they want to see society rebuilt….(More)”.

Synthetic data offers advanced privacy for the Census Bureau, business


Kate Kaye at IAPP: “In the early 2000s, internet accessibility made risks of exposing individuals from population demographic data more likely than ever. So, the U.S. Census Bureau turned to an emerging privacy approach: synthetic data.

Some argue the algorithmic techniques used to develop privacy-secure synthetic datasets go beyond traditional deidentification methods. Today, along with the Census Bureau, clinical researchers, autonomous vehicle system developers and banks use these fake datasets that mimic statistically valid data.

In many cases, synthetic data is built from existing data by filtering it through machine learning models. Real data representing real individuals flows in, and fake data mimicking individuals with corresponding characteristics flows out.
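The "real data in, fake data out" idea can be illustrated with a deliberately tiny sketch. It is not the Census Bureau's method and carries no formal privacy guarantee; the records, the field names (industry, earnings) and the functions `fit_model` and `sample_synthetic` are all hypothetical. The sketch fits a simple statistical model to a handful of made-up records and then samples new records from the model rather than copying any original row.

```python
import random
import statistics
from collections import defaultdict


def fit_model(records):
    """Fit a toy generative model: share of each industry, plus the
    mean and standard deviation of earnings within that industry."""
    by_industry = defaultdict(list)
    for industry, earnings in records:
        by_industry[industry].append(earnings)
    total = len(records)
    return {
        industry: {
            "weight": len(values) / total,
            "mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values),
        }
        for industry, values in by_industry.items()
    }


def sample_synthetic(model, n, seed=0):
    """Draw n synthetic (industry, earnings) records from the fitted model."""
    rng = random.Random(seed)
    industries = list(model)
    weights = [model[name]["weight"] for name in industries]
    synthetic_records = []
    for _ in range(n):
        industry = rng.choices(industries, weights=weights, k=1)[0]
        params = model[industry]
        # Earnings are drawn from a normal distribution and floored at zero.
        earnings = max(0.0, rng.gauss(params["mean"], params["stdev"]))
        synthetic_records.append((industry, round(earnings, 2)))
    return synthetic_records


# Made-up "confidential" records, for illustration only.
real = [
    ("farm", 30000), ("farm", 32000), ("farm", 28000),
    ("retail", 41000), ("retail", 39000), ("retail", 40000),
]

synthetic = sample_synthetic(fit_model(real), n=1000)
```

The synthetic records keep the broad statistical shape of the input (industry shares, typical earnings per industry) while no output row is taken directly from an input row. Production systems use far richer models and add explicit disclosure controls.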

When data scientists at the Census Bureau began exploring synthetic data methods, adoption of the internet had made deidentified, open-source data on U.S. residents, their households and businesses more accessible than in the past.

Especially concerning, census-block-level information was now widely available. Because in rural areas, a census block could represent data associated with as few as one house, simply stripping names, addresses and phone numbers from that information might not be enough to prevent exposure of individuals.

“There was pretty widespread angst” among statisticians, said John Abowd, the bureau’s associate director for research and methodology and chief scientist. The hand-wringing led to a “gradual awakening” that prompted the agency to begin developing synthetic data methods, he said.

Synthetic data built from the real data preserves privacy while providing information that is still relevant for research purposes, Abowd said: “The basic idea is to try to get a model that accurately produces an image of the confidential data.”

The plan for the 2020 census is to produce a synthetic image of that original data. The bureau also produces OnTheMap, a web-based mapping and reporting application that provides synthetic data showing where workers are employed and where they live along with reports on age, earnings, industry distributions, race, ethnicity, educational attainment and sex.

Of course, the real census data is still locked away, too, Abowd said: “We have a copy and the national archives have a copy of the confidential microdata.”…(More)”.

The potential of Data Collaboratives for COVID19


Blog post by Stefaan Verhulst: “We live in almost unimaginable times. The spread of COVID-19 is a human tragedy and global crisis that will impact our communities for many years to come. The social and economic costs are huge and mounting, and they are already contributing to a global slowdown. Every day, the emerging pandemic reveals new vulnerabilities in various aspects of our economic, political and social lives. These include our vastly overstretched public health services, our dysfunctional political climate, and our fragile global supply chains and financial markets.

The unfolding crisis is also making shortcomings clear in another area: the way we re-use data responsibly. Although this aspect of the crisis has been less remarked upon than other, more obvious failures, those who work with data—and who have seen its potential to impact the public good—understand that we have failed to create the necessary governance and institutional structures that would allow us to harness data responsibly to halt or at least limit this pandemic. A recent article in Stat, an online journal dedicated to health news, characterized the COVID-19 outbreak as “a once-in-a-century evidence fiasco.” The article continues: 

“At a time when everyone needs better information, […] we lack reliable evidence on how many people have been infected with SARS-CoV-2 or who continue to become infected. Better information is needed to guide decisions and actions of monumental significance and to monitor their impact.” 

It doesn’t have to be this way, and these data challenges are not an excuse for inaction. As we explain in what follows, there is ample evidence that the re-use of data can help mitigate health pandemics. A robust (if somewhat unsystematized) body of knowledge could direct policymakers and others in their efforts. In the second part of this article, we outline eight steps that key stakeholders can and should take to better re-use data in the fight against COVID-19. In particular, we argue that more responsible data stewardship and increased use of data collaboratives are critical….(More)”. 

Unpredictable Residency during the COVID-19 Pandemic Spells Trouble for the 2020 Census Count


Blog by Diana Elliott and Robert Santos: “Social distancing measures to curtail the community spread of COVID-19 have upended daily life. Just before lockdowns were implemented across the country, there was tremendous movement and migration of people relocating to different residences to shelter in place. This makes sense for the people involved but could be disastrous for the communities they fled and the final 2020 Census counts.

Pandemic-based migration undermines an accurate count

The 2020 Census, like most data collected by the US Census Bureau, is residence based. In the years leading up to 2020, the US Census Bureau worked diligently on the quality of the Master Address File, or the catalog of all residential addresses in the country. Staff account for newly built housing developments and buildings, apartment units or accessory dwelling units that are used as permanent residences, and the demolition of homes and apartments in the past decade. Census materials are sent to an address, rather than a person.

Most residences across America have already received their 2020 Census invitation. Whether completed online, by paper, by phone, or in person, the first official question on the 2020 Census questionnaire is “How many people were living or staying in this house, apartment, or mobile home on April 1, 2020?” Households are expected to answer this based on the concept of “usual residence,” or the place where a person lives and sleeps most of the time.

Despite written guidance provided on the 2020 Census on how to answer this question, doing so may be fraught with complexities and nuance from the pandemic.

First, research reveals that respondents do not often read questionnaire instructions; they dive in and start answering. With many people scrambling to other counties, cities, and states to hunker down for the long haul with loved ones, this will lead to incorrect counts when people are counted at temporary addresses.

Second, for many, the concept of “usual residence” has little relevance in the uncertainty unfolding during the COVID-19 pandemic. What if your temporary address becomes your permanent address? What does “usual residence” mean during a global epidemic that could stretch for 18 months or more? And perhaps more importantly, what should it mean?

Finally, there is the added complication of census operational delays (PDF). Self-response to the 2020 Census has been extended into August, as have the nonresponse follow-up efforts, when enumerators knock on the doors of those who haven’t yet answered the census. Additional delays seem unavoidable. The longer the delay, the more time there is for people who have not yet completed a census form to realize their temporary plan has evolved into a state of permanence….(More)”.

Location Surveillance to Counter COVID-19: Efficacy Is What Matters


Susan Landau at Lawfare: “…Some government officials believe that the location information that phones can provide will be useful in the current crisis. After all, if cellphone location information can be used to track terrorists and discover who robbed a bank, perhaps it can be used to determine whether you rubbed shoulders yesterday with someone who today was diagnosed as having COVID-19, the respiratory disease that the novel coronavirus causes. But such thinking ignores the reality of how phone-tracking technology works.

Let’s look at the details of what we can glean from cellphone location information. Cell towers track which phones are in their locale—but that is a very rough measure, useful perhaps for tracking bank robbers, but not for the six-foot proximity one wants in order to determine who might have been infected by the coronavirus.

Finer precision comes from GPS signals, but these can only work outside. That means the location information supplied by your phone—if your phone and that of another person are both on—can tell you if you both went into the same subway stop around the same time. But it won’t tell you whether you rode the same subway car. And the location information from your phone isn’t fully precise. So not only can’t it reveal if, for example, you were in the same aisle in the supermarket as the ill person, but sometimes it will make errors about whether you made it into the store, as opposed to just sitting on a bench outside. What’s more, many people won’t have the location information available because GPS drains the battery, so they’ll shut it off when they’re not using it. Their phones don’t have the location information—and neither do the providers, at least not at the granularity to determine coronavirus exposure.
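The precision argument can be made concrete with a small sketch. It assumes each phone reports a position plus an error radius in metres; the error values used in the usage note are illustrative assumptions, not figures from the article, and `classify_contact` is a hypothetical helper. The point is only that when the combined error is larger than six feet, the data cannot settle whether two people were within that range.

```python
import math

EARTH_RADIUS_M = 6_371_000   # mean Earth radius in metres
SIX_FEET_M = 1.8288          # six feet expressed in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude fixes."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_contact(distance_m, error_a_m, error_b_m, threshold_m=SIX_FEET_M):
    """Say whether two fixes are provably within the threshold, provably
    beyond it, or undetermined given each device's error radius."""
    slack = error_a_m + error_b_m
    if distance_m + slack <= threshold_m:
        return "within"
    if distance_m - slack > threshold_m:
        return "beyond"
    return "undetermined"
```

For example, `classify_contact(1.0, 5.0, 5.0)` returns `"undetermined"`: two fixes measured one metre apart, each with an assumed five-metre error radius, could belong to people standing anywhere from side by side to eleven metres apart.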

GPS is not the only way that cellphones can collect location information. Various other ways exist, including through the WiFi network to which a phone is connected. But while two individuals using the same WiFi network are likely to be close together inside a building, the WiFi data would typically not be able to determine whether they were in that important six-foot proximity range.

Other devices can also get within that range, including Bluetooth beacons. These are used within stores, seeking to determine precisely what people are (and aren’t) buying; they track people’s locations indoors within inches. But like WiFi, they’re not ubiquitous, so their ability to track exposure will be limited.

If the apps lead to the government’s dogging people’s whereabouts at work, school, in the supermarket and at church, will people still be willing to download the tracking apps that get them discounts when they’re passing the beer aisle? China follows this kind of surveillance model, but such a surveillance-state solution is highly unlikely to be acceptable in the United States. Yet anything less is unlikely to pinpoint individuals exposed to the virus.

South Korea took a different route. In precisely tracking coronavirus exposure, the country used additional digital records, including documentation of medical and pharmacy visits, history of credit card transactions, and CCTV videos, to determine where potentially exposed people had been—then followed up with interviews not just of infected people but also of their acquaintances, to determine where they had traveled.

Validating such records is labor intensive. And for the United States, it may not be the best use of resources at this time. There’s an even more critical reason that the Korean solution won’t work for the U.S.: South Korea was able to test exposed people. The U.S. can’t do this. Currently the country has a critical shortage of test kits; patients who are not sufficiently ill as to be hospitalized are not being tested. The shortage of test kits is sufficiently acute that in New York City, the current epicenter of the pandemic, the rule is, “unless you are hospitalized and a diagnosis will impact your care, you will not be tested.” With this in mind, moving to the South Korean model of tracking potentially exposed individuals won’t change the advice from federal and state governments that everyone should engage in social distancing—but employing such tracking would divert government resources and thus be counterproductive.

Currently, phone tracking in the United States is not efficacious. It cannot be unless all people are required to carry such location-tracking devices at all times; have location tracking on; and other forms of information tracking, including much wider use of CCTV cameras, Bluetooth beacons, and the like, are also in use. There are societies like this. But so far, even in the current crisis, no one is seriously contemplating the U.S. heading in that direction….(More)”.

Why resilience to online disinformation varies between countries


Edda Humprecht at the Democratic Audit: “The massive spread of online disinformation, understood as content intentionally produced to mislead others, has been widely discussed in the context of the UK Brexit referendum and the US general election in 2016. However, in many other countries online disinformation seems to be less prevalent. It seems certain countries are better equipped to face the problems of the digital era, demonstrating a resilience to manipulation attempts. In other words, citizens in these countries are better able to adapt to overcome challenges such as the massive spread of online disinformation and their exposure to it. So, do structural conditions render countries more or less resilient towards online disinformation?

As a first step to answering this question, in new research with Frank Esser and Peter Van Aelst, we identified the structural conditions that are theoretically linked to resilience to online disinformation, which relate to different political, media and economic environments. To test these expectations, we then identified quantifiable indicators for these theoretical conditions, which allowed us to measure their significance for 18 Western democracies. A cluster analysis then yielded three country groups: one group with high resilience to online disinformation (including the Northern European countries) and two country groups with low resilience (including Southern European countries and the US).

Conditions for resilience: political, media and economic environments

In polarised political environments, citizens are confronted with different deviating representations of reality and therefore it becomes increasingly difficult for them to distinguish between false and correct information. Thus, societal polarisation is likely to decrease resilience to online disinformation. Moreover, research has shown that both populism and partisan disinformation share a binary Manichaean worldview, comprising anti-elitism, mistrust of expert knowledge and a belief in conspiracy theories. As a consequence of these combined influences, citizens can obtain inaccurate perceptions of reality. Thus, in environments with high levels of populist communication, online users are exposed to more disinformation.

Another condition that has been linked to resilience to online disinformation in previous research is trust in news media. Previous research has shown that in environments in which distrust in news media is higher, people are less likely to be exposed to a variety of sources of political information and to critically evaluate those. In this vein, the level of knowledge that people gain is likely to play an important role when confronted with online disinformation. Research has shown that in countries with wide-reaching public service media, citizens’ knowledge about public affairs is higher compared to countries with marginalised public service media. Therefore, it can be assumed that environments with weak public broadcasting services (PBS) are less resilient to online disinformation….

Looking at the economic environment, false social media content is often produced in pursuit of advertising revenue, as was the case with the Macedonian ‘fake news factories’ during the 2016 US presidential election. It is especially appealing for producers to publish this kind of content if the potential readership is large. Thus, large-size advertising markets with a high number of potential users are less resistant to disinformation than smaller-size markets.

Disinformation is particularly prevalent on social media, and in countries with very many social media users it is easier for rumour-spreaders to build partisan follower networks. Moreover, it has been found that a media diet mainly consisting of news from social media limits political learning and leads to less knowledge of public affairs compared to other media sources. It follows that societies with a high rate of social media users are more vulnerable to online disinformation spreading rapidly than other societies…(More)”.

The US lacks health information technologies to stop COVID-19 epidemic


Niam Yaraghi at Brookings: “The COVID-19 pandemic highlights the crucial importance of health information technology and data interoperability. The pandemic has shattered our common beliefs about the type and scope of health information exchange. It has shown us that the definition of health data should no longer be limited to medical data of patients and instead should encompass a much wider variety of data types from individuals’ online and offline activity. Moreover, the pandemic has proven that healthcare is not local. In an interconnected world, with more individuals traveling long distances than ever before, it is naïve to look at regions in isolation from each other and try to manage public health independently. To efficiently manage a pandemic like this, the scope of health information exchange efforts should not be limited to small geographical regions and instead should be done at least nationally, if not internationally.

HEALTH DATA SHOULD GO BEYOND MEDICAL RECORDS

A wide variety of factors affect one’s overall well-being, a very small fraction of which could be quantified via medical records. We tend to ignore this fact, and try to explain and predict a patient’s condition only based on medical data. Previously, we did not have the technology and knowledge to collect huge amounts of non-medical data and analyze it for healthcare purposes. Now, privacy concerns and outdated regulations have exacerbated the situation and have led to a fragmented data ecosystem. Interoperability, even among healthcare providers, remains a major challenge where exchange and analysis of non-medical data for healthcare purposes almost never happens….(More)”.

Privacy and Pandemics


Emily Benson at the Bertelsmann Foundation: “In bucolic China, a child has braved cold temperatures for some fresh outdoor air. Overhead, a drone hovers. Its loudspeaker, a haunting combination of human direction in the machine age, chides him for being outdoors. “Hey kid! We’re in unusual times… The coronavirus is very serious… run!!” it barks. “Staying at home is contributing to society.”

The ferocious spread of COVID-19 in 2020 has revealed stark policy differences among governments. The types of actions and degrees of severity with which governments have responded vary widely, but one pressing issue the crisis raises is how COVID-19 will affect civil liberties in the digital age.

The Chinese Approach

Images of riot gear with heat-sensing cameras and temperature gun checks in metro stations have been plastered in the news since the beginning of 2020, when the Chinese government undertook drastic measures to contain the spread of COVID-19. The government quickly set about enacting strict restraints on society that dictated where people went and what they could do.

In China, Alipay, an Alibaba subsidiary and equivalent of Elon Musk’s PayPal, joined forces with Ant Financial to launch Alipay Health Code, software for smartphones. It indicates individuals’ health in green, yellow, and red, ultimately determining where citizens can and cannot go. The government has since mandated that citizens use this software, despite inaccuracies of temperature-reading technology that have led to the confinement of otherwise healthy individuals. It also remains unclear how this data will be used going forward, whether it will be stored indefinitely or used to augment civilians’ social scores. As the New York Times noted, this Chinese gathering of data would be akin to the Centers for Disease Control (CDC) using data from Amazon, Facebook, and Google to track citizens and then share that data with law enforcement, something that no longer seems so far-fetched.

An Evolving EU

The European Union is home to what is arguably the most progressive privacy regime in the world. In May 2018, the EU implemented the General Data Protection Regulation (GDPR). While processing personal data is generally permitted in cases in which individuals have provided explicit consent to the use of their data, several exceptions to these mining prohibitions are proving problematic in the time of COVID-19. For example, GDPR Article 9 provides an exception for public interest, permitting the processing of personal data when it is necessary for reasons of substantial public interest, and on the basis of Union or Member State law which must be proportionate to the aim pursued…(More)”.