Stefaan Verhulst

Announcement: “To address the COVID-19 pandemic and other dynamic threats, The GovLab has called for the development of a new data infrastructure and ecosystem. Establishing data collaboratives in a responsible manner often necessitates the creation of data sharing agreements and other legal documentation — a strain on time and capacity both for data holders and those who could use data in the public interest.

Today, to support the development of data collaboratives in a responsible and agile way, we are sharing a new tool from Contracts for Data Collaboration (a joint initiative of SDSN-TReNDS, the World Economic Forum, The GovLab, and the University of Washington’s Information Risk Research Initiative) that addresses the complexity of preparing a Data Sharing Agreement. The tool provides a checklist to support organizations in reviewing, negotiating and preparing Data Sharing Arrangements, with the intent of strengthening stakeholder trust and helping to accelerate responsible data sharing given the urgency of the global pandemic.

(Please note that the checklist is a tool for formulating and understanding legal issues; we are not offering it as legal advice.)

…(More)”.

New Tool to Establish Responsible Data Collaboratives in the Time of COVID-19

Andrew Young at Datastewards.net: “This week, as part of the Responsible Data for Children initiative (RD4C), the GovLab and UNICEF launched a new case study series to provide insights on promising practice as well as barriers to realizing responsible data for children.

Drawing upon field-based research and established good practice, RD4C aims to highlight and support responsible handling of data for and about children; identify challenges and develop practical tools to assist practitioners in evaluating and addressing them; and encourage a broader discussion on actionable principles, insights, and approaches for responsible data management.

RD4C launched in October 2019 with the release of the RD4C Synthesis Report, Selected Readings, and the RD4C Principles: Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle.

The RD4C Case Studies analyze data systems deployed in diverse country environments, with a focus on their alignment with the RD4C Principles. This week’s release includes case studies arising from field missions to Romania, Kenya, and Afghanistan in 2019. The data systems examined are:

Romania’s The Aurora Project

Childline Kenya

Afghanistan’s Nutrition Online Database…(More)”

The Responsible Data for Children (RD4C) Case Studies

Press Release: “As part of its ongoing response to the COVID-19 crisis, PARIS21 today released a policy brief at the intersection of statistics and policy making to help inform the measures taken to address the pandemic.

The COVID-19 pandemic has brought data to the centre of policy making and public attention. A diverse ecosystem of data producers, both private and public, report rates of infection, fatality and recovery on a daily basis. However, a proliferation of data, which is at times contradictory, can also lead to confusion and mistrust among data users.

Meanwhile, policymakers, development partners and citizens need to take quick, informed actions to design interventions that reach the most vulnerable and leave no one behind. As countries comply with lockdowns and other containment measures, national statistical systems (NSSs) face a dual effect of growing data demand and constrained supply. This in turn may squeeze NSSs beyond their institutional capacity.

At the same time, alternative data sources such as mobile phone or satellite data are abundant. These data could potentially complement traditional sources such as censuses, surveys and administrative systems. However, with scant governance frameworks to scale and sustain their use, policy action is not yet based on a convergence of evidence.

This policy brief introduces a conceptual framework that describes the adverse effects of the crisis on NSSs in developing countries. Moreover, it suggests short- and medium-term actions to mitigate the negative effects by:

1. Focusing data production on priority economic, social and demographic data.
2. Communicating proactively with citizens, academia, private sector and policy makers.
3. Positioning the national statistical office (NSO) as advisor and knowledge bank for national governments.

NSSs contribute significantly to robust policy responses in a crisis. The brief thus calls on national statistical offices to assume a central role as coordinators of the NSSs and chart the way toward improved data ecosystem governance for informing policies during and after COVID-19….(More)”.

Combating COVID-19 with Data: What Role for National Statistical Systems?

Essay by Stefania Milan and Emiliano Treré at Data & Policy: “If numbers are the conditions of existence of the COVID-19 problem, we ought to pay attention to the actual (in)ability of many countries in the South to test their population for the virus, and to produce reliable population statistics more generally, let alone to adequately care for them. It is a matter of a “data gap” as well as of data quality, which even in “normal” times hinders “evidence-based policy making, tracking progress and development, and increasing government accountability” (Chen et al., 2013). And while the World Health Organization issues warnings about the “dramatic situation” concerning the spread of COVID-19 on the African continent, to name just one of the blind spots of our datasets of the global pandemic, the World Economic Forum calls for “flattening the curve” in developing countries. Progress has been made following the revision of the United Nations’ Millennium Development Goals in 2005, with countries in the Global South invited (and supported) to devise National Strategies for the Development of Statistics. Yet a cursory look at the NYU GovLab’s valuable repository of “data collaboratives” addressing the COVID-19 pandemic reveals the virtual absence of data collection and monitoring projects in the Global South. The next obvious step is the dangerous equation “no data = no problem”.

Disease and “whiteness”

Epidemiology and pharmacogenetics (i.e. the study of the genetic basis of how people respond to pharmaceuticals), to name but two of the life sciences concerned, are largely based on the “inclusion of white/Caucasians in studies and the exclusion of other ethnic groups” (Tutton, 2007). In other words, models of disease evolution and the related solutions are based on datasets that take into account primarily, and in fact almost exclusively, the Caucasian population. This is a known problem in the field, which derives from the “assumption that a Black person could be thought of as being White”, dismissing specificities and differences. This problem has been linked to the “lack of social theory development, due mainly to the reluctance of epidemiologists to think about social mechanisms (e.g., racial exploitation)” (Muntaner, 1999, p. 121). While COVID-19 represents a slight variation on this trend, having been first identified in China, the broader problem remains. And in a health emergency as global as this one, it risks being reinforced and perpetuated.

A succulent market for the industry

In the absence of national testing capacity, the developing world might fall prey to the booming industry of genetic and disease testing on the one hand, and of telecom-enabled population monitoring on the other. Private companies might be able to fill the gap left by the state, mapping populations at risk while monetizing their data. The case of 23andMe is symptomatic of this rise of industry-led testing, which constitutes a double-edged sword. On the one hand, private actors might supply key services that resource-poor or failing states are unable to provide. On the other hand, the distorted and often hidden agendas of profit-led players reveal their shortcomings and dangers. If we look at the telecom industry, we note how it has contributed to tracking disease propagation in a number of health emergencies such as Ebola. And while the global open data community has called for smoother data exchange between the private and the public sector to collectively address the spread of the virus, in the absence of adequate regulatory frameworks in the Global South, for example in the field of privacy and data retention, local authorities might fall prey to outside interventions of dubious nature….(More)”.

A widening data divide: COVID-19 and the Global South

Pledge: “Immediate action is required to halt the COVID-19 pandemic and treat those it has affected. It is a practical and moral imperative that every tool we have at our disposal be applied to develop and deploy technologies on a massive scale without impediment.

We therefore pledge to make our intellectual property available free of charge for use in ending the COVID-19 pandemic and minimizing the impact of the disease.

We will implement this pledge through a license that details the terms and conditions under which our intellectual property is made available.

How to make the Pledge

The first step for organizations wishing to make the Pledge is to publicly commit to making intellectual property relevant to COVID-19 freely available, by:

Posting a public statement on the organization’s website that it is making the Pledge.
Issuing an official press release.

Then, send us a link to this statement, a point of contact in the organization, and, at the organization’s discretion, a copy of its logo to display on this site.

How to implement the Pledge

The next step for organizations who have made the Pledge is to implement it via a license detailing the terms and conditions under which their intellectual property is made available. There are three options for doing so:

Adopt the Open COVID License, created by our legal team for organizations that wish to implement the Pledge simply and immediately on terms shared by many other organizations.
Create a custom license that accomplishes the intent of the Pledge.
Identify existing license(s) that accomplish the goals of the Pledge.

As with making the Pledge, send us links to the license or licenses, a point of contact in the organization, and, at the organization’s discretion, a copy of their logo to display on this site….(More)”.

Open Covid Pledge

Norman Fenton, Magda Osman, Martin Neil, and Scott McLachlan at The Conversation: “Suppose we wanted to estimate how many car owners there are in the UK and how many of those own a Ford Fiesta, but we only have data on those people who visited Ford car showrooms in the last year. If 10% of the showroom visitors owned a Fiesta, then, because of the bias in the sample, this would certainly overestimate the proportion of Ford Fiesta owners in the country.

Estimating death rates for people with COVID-19 is currently undertaken largely along the same lines. In the UK, for example, almost all testing for COVID-19 is performed on people already hospitalised with COVID-19 symptoms. At the time of writing, there are 29,474 confirmed COVID-19 cases (analogous to car owners visiting a showroom) of whom 2,352 have died (Ford Fiesta owners who visited a showroom). But this misses all the people with mild or no symptoms.

Concluding that the death rate from COVID-19 is on average 8% (2,352 out of 29,474) ignores the many people with COVID-19 who are not hospitalised and have not died (analogous to car owners who did not visit a Ford showroom and who do not own a Ford Fiesta). It is therefore equivalent to making the mistake of concluding that 10% of all car owners own a Fiesta.
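The arithmetic behind the 8% figure, and why the choice of denominator matters, can be sketched in a few lines of Python. This is an illustration, not the authors' method. The `ascertainment` parameter (the share of all infections that were ever confirmed by a test) is a hypothetical input. The sketch also ignores complications such as the lag between confirmation and death.

```python
def naive_death_rate(deaths: int, confirmed: int) -> float:
    """Deaths divided by confirmed cases: the calculation behind the 8% figure."""
    return deaths / confirmed


def adjusted_death_rate(deaths: int, confirmed: int, ascertainment: float) -> float:
    """Deaths divided by estimated total infections.

    `ascertainment` is a HYPOTHETICAL share of all infections that were
    confirmed by a test, so total infections = confirmed / ascertainment.
    """
    if not 0 < ascertainment <= 1:
        raise ValueError("ascertainment must be in (0, 1]")
    return deaths / (confirmed / ascertainment)


DEATHS, CONFIRMED = 2_352, 29_474  # UK figures quoted in the article

print(f"naive: {naive_death_rate(DEATHS, CONFIRMED):.1%}")  # naive: 8.0%
for share in (1.0, 0.5, 0.1):  # illustrative ascertainment levels
    rate = adjusted_death_rate(DEATHS, CONFIRMED, share)
    print(f"if {share:.0%} of infections were confirmed: {rate:.2%}")
```

Dividing the same deaths by a larger, partly unobserved population of infections can only lower the rate. This is the Fiesta-showroom point expressed in numbers.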

There are many prominent examples of this sort of conclusion. The Oxford COVID-19 Evidence Service have undertaken a thorough statistical analysis. They acknowledge potential selection bias, and add confidence intervals showing how big the error may be for the (potentially highly misleading) proportion of deaths among confirmed COVID-19 patients.
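The next sketch computes a confidence interval of the kind the Oxford service reports. The article does not say which interval method was used, so the Wilson score interval below is an assumption; it is one standard choice for a proportion. For 2,352 deaths among 29,474 confirmed cases, it gives a band of roughly 7.7% to 8.3%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives ~95% coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# 2,352 deaths among 29,474 confirmed UK cases, as quoted above.
low, high = wilson_interval(2_352, 29_474)
print(f"95% interval: {low:.1%} to {high:.1%}")  # 95% interval: 7.7% to 8.3%
```

The narrowness of that band is the point. A confidence interval captures only random sampling error within the tested group. It says nothing about the selection bias introduced by testing mainly hospitalised patients.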

They note various factors that can result in wide national differences – for example the UK’s 8% (mean) “death rate” is very high compared to Germany’s 0.74%. These factors include different demographics, for example the number of elderly in a population, as well as how deaths are reported. For example, in some countries everybody who dies after having been diagnosed with COVID-19 is recorded as a COVID-19 death, even if the disease was not the actual cause, while other people may die from the virus without actually having been diagnosed with COVID-19.

However, these analyses do not incorporate explicit causal explanations that might enable us to make more meaningful inferences from the available data, including data on virus testing.

[Figure: What a causal model would look like. Author provided.]

We have developed an initial prototype “causal model” whose structure is shown in the figure above. The links between the named variables in a model like this show how they are dependent on each other. These links, along with other unknown variables, are captured as probabilities. As data are entered for specific, known variables, all of the unknown variable probabilities are updated using a method called Bayesian inference. The model shows that the COVID-19 death rate is as much a function of sampling methods, testing and reporting, as it is determined by the underlying rate of infection in a vulnerable population….(More)”
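Testing practice alone can move the observed death rate, and a toy causal model shows why. The model has three variables for an infected person: severe symptoms, being tested, and dying. Severity influences both testing and death, which mirrors the UK practice of testing mainly hospitalised patients. All probabilities are hypothetical. The exact enumeration below is a textbook method for Bayesian-network inference; it is not the authors' model or software.

```python
from itertools import product
from typing import Optional

# Hypothetical conditional probabilities for an infected person (illustrative only).
P_SEVERE = 0.10                       # P(severe symptoms)
P_TESTED = {True: 0.90, False: 0.05}  # P(tested | severe?)
P_DIED = {True: 0.10, False: 0.001}   # P(died | severe?)


def joint(severe: bool, tested: bool, died: bool) -> float:
    """Probability of one complete scenario.

    Assumed causal structure: Severe -> Tested and Severe -> Died, so testing
    and death are independent once severity is known.
    """
    p = P_SEVERE if severe else 1 - P_SEVERE
    p *= P_TESTED[severe] if tested else 1 - P_TESTED[severe]
    p *= P_DIED[severe] if died else 1 - P_DIED[severe]
    return p


def prob_died(tested: Optional[bool] = None) -> float:
    """P(died | evidence), summing over every scenario consistent with the evidence."""
    numerator = denominator = 0.0
    for s, t, d in product([True, False], repeat=3):
        if tested is not None and t != tested:
            continue  # scenario contradicts the observed testing status
        p = joint(s, t, d)
        denominator += p
        if d:
            numerator += p
    return numerator / denominator


print(f"all infected: {prob_died():.4f}")      # all infected: 0.0109
print(f"tested only:  {prob_died(True):.4f}")  # tested only:  0.0670
```

With these made-up numbers, the death rate among all infected people is about 1.1%, while among those who were tested it is about 6.7%. Nothing about the disease itself differs between the two groups.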

Coronavirus: country comparisons are pointless unless we account for these biases in testing

Alex Engler at Brookings: “The COVID-19 outbreak has spurred considerable news coverage about the ways artificial intelligence (AI) can combat the pandemic’s spread. Unfortunately, much of it has failed to be appropriately skeptical about the claims of AI’s value. Like many tools, AI has a role to play, but its effect on the outbreak is probably small. While this may change in the future, technologies like data reporting, telemedicine, and conventional diagnostic tools are currently far more impactful than AI.

Still, various news articles have dramatized the role AI is playing in the pandemic by overstating what tasks it can perform, inflating its effectiveness and scale, neglecting the level of human involvement, and being careless in considering related risks. In fact, the COVID-19 AI hype has been diverse enough to cover the greatest hits of exaggerated claims around AI. And so, framed around examples from the COVID-19 outbreak, here are eight considerations for a skeptic’s approach to AI claims….(More)”.

A guide to healthy skepticism of artificial intelligence and coronavirus

Paper by Srushti Wadekar, Kunal Thapar, Komal Barge, Rahul Singh, Devanshu Mishra and Sabah Mohammed: “Civic technology is a fast-developing segment that holds huge potential for a new generation of startups. A recent survey report on civic technology noted that the sector saw $430 million in investment in just the last two years. It’s not just a new market ripe with opportunity; it’s crucial to our democracy. Crowdsourcing has proven to be an effective supplementary mechanism for public engagement in city government, drawing on the shared knowledge of online communities to address such issues and to engage people in urban design. Government needs new alternatives: modern, superior tools and services offered at reasonable rates.

An effective and easy-to-use civic technology platform enables wide participation. Responding to users, and holding a ‘conversation’ with them, is crucial for engagement, as is a feeling of being part of a society. These findings can contribute to the future design of civic technology platforms. In this research, we introduce a crowdsourcing platform intended to help people who face problems with government services in their everyday lives. The platform gathers information from trending Twitter tweets over roughly the past month and tries to identify which challenges the public is confronting. We use Twitter for crowdsourcing because it is a simple social platform where people can ask questions and get instant answers from those who see the tweet. These problems are analyzed according to their significance and then opened to the public for solutions. The findings demonstrate how crowdsourcing tends to boost community engagement and enhance citizens’ views of their town. It thus helps us find ways to improve the competitiveness of a city that faces serious problems. Topic modeling with the Latent Dirichlet Allocation (LDA) algorithm helped categorize civic technology topics, which were then validated with a simple classification algorithm. While working on this research, we encountered some issues with the available tools, which we discuss in the ‘Counter arguments’ section….(More)”.
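For readers unfamiliar with LDA, the sketch below implements the standard collapsed Gibbs sampler in plain Python. It is not the authors' pipeline. The tokenised "tweets" are hypothetical, and the hyperparameters (`alpha`, `beta`, iteration count) are illustrative. In practice a library such as scikit-learn's `LatentDirichletAllocation` would typically be used; it fits the model with variational inference rather than Gibbs sampling.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, phi, theta), where phi[k][w]
    estimates P(word w | topic k) and theta[d][k] estimates P(topic k | doc d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic assignment of each token

    # Randomly assign every token to a topic to initialise the counts.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...then resample its topic from the full conditional distribution.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    phi = [{w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab} for k in topics]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


# Hypothetical, pre-tokenised complaint tweets (illustrative only).
tweets = [
    "pothole road repair street".split(),
    "road pothole traffic street".split(),
    "water bill meter supply".split(),
    "water supply outage bill".split(),
]
vocab, phi, theta = lda_gibbs(tweets, n_topics=2)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Each inferred topic is a probability distribution over words. Labelling a topic (for example "roads" versus "water services") is a human step, and the paper's classification check would validate those labels.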

Developing better Civic Services through Crowdsourcing: The Twitter Case Study

Matt Apuzzo and David D. Kirkpatrick at The New York Times: “…Normal imperatives like academic credit have been set aside. Online repositories make studies available months ahead of journals. Researchers have identified and shared hundreds of viral genome sequences. More than 200 clinical trials have been launched, bringing together hospitals and laboratories around the globe.

“I never hear scientists — true scientists, good quality scientists — speak in terms of nationality,” said Dr. Francesco Perrone, who is leading a coronavirus clinical trial in Italy. “My nation, your nation. My language, your language. My geographic location, your geographic location. This is something that is really distant from true top-level scientists.”

On a recent morning, for example, scientists at the University of Pittsburgh discovered that a ferret exposed to Covid-19 particles had developed a high fever — a potential advance toward animal vaccine testing. Under ordinary circumstances, they would have started work on an academic journal article.

“But you know what? There is going to be plenty of time to get papers published,” said Paul Duprex, a virologist leading the university’s vaccine research. Within two hours, he said, he had shared the findings with scientists around the world on a World Health Organization conference call. “It is pretty cool, right? You cut the crap, for lack of a better word, and you get to be part of a global enterprise.”…

Several scientists said the closest comparison to this moment might be the height of the AIDS epidemic in the 1990s, when scientists and doctors locked arms to combat the disease. But today’s technology and the pace of information-sharing dwarf what was possible three decades ago.

As a practical matter, medical scientists today have little choice but to study the coronavirus if they want to work at all. Most other laboratory research has been put on hold because of social distancing, lockdowns or work-from-home restrictions.

The pandemic is also eroding the secrecy that pervades academic medical research, said Dr. Ryan Carroll, a Harvard Medical School professor who is involved in the coronavirus trial there. Big, exclusive research can lead to grants, promotions and tenure, so scientists often work in secret, suspiciously hoarding data from potential competitors, he said.

“The ability to work collaboratively, setting aside your personal academic progress, is occurring right now because it’s a matter of survival,” he said….(More)”.

Covid-19 Changed How the World Does Science, Together

“Data & Policy, an open-access journal exploring the potential of data science for governance and public decision-making, published its first cluster of peer-reviewed articles last week.

The articles include three contributions specifically concerned with data protection by design:

· Gefion Theurmer and colleagues (University of Southampton) distinguish between data trusts and other data sharing mechanisms and discuss the need for workflows with data protection at their core;

· Swee Leng Harris (King’s College London) explores Data Protection Impact Assessments as a framework for helping us know whether government use of data is legal, transparent and upholds human rights;

· Giorgia Bincoletto (University of Bologna) investigates data protection concerns arising from cross-border interoperability of Electronic Health Record systems in the European Union.

Also published is research by Jacqueline Lam and colleagues (University of Cambridge; Hong Kong University) on how fine-grained data from satellites and other sources can help us understand environmental inequality and socio-economic disparities in China, which also reflects on the importance of safeguarding data privacy and security. See also the blogs this week on the potential of Data Collaboratives for COVID-19 by Editor-in-Chief Stefaan Verhulst (the GovLab) and how COVID-19 exposes a widening data divide for the Global South, by Stefania Milan (University of Amsterdam) and Emiliano Treré (University of Cardiff).

Data & Policy is an open access, peer-reviewed venue for contributions that consider how systems of policy and data relate to one another. Read the 5 ways you can contribute to Data & Policy and contact dataandpolicy@cambridge.org with any questions….(More)”.

Data & Policy
