Statistics and Data Science for Good


Introduction to the Special Issue of CHANCE by Caitlin Augustin, Matt Brems, and Davina P. Durgana: “One lesson that our team has taken from the past 18 months is that no individual, no team, and no organization can be successful on their own. We’ve been grateful and humbled to witness incredible collaboration—taking on forms of resource sharing, knowledge exchange, and reimagined outcomes. Some advances, like breakthrough medicine, have been widely publicized. Other advances have received less fanfare. All of these advances are in the public interest and demonstrate how collaborations can be done ‘for good.’

In reading this issue, we hope that you realize the power of diverse multidisciplinary collaboration; you recognize the positive social impact that statisticians, data scientists, and technologists can have; and you learn that this isn’t limited to companies with billions of dollars or teams of dozens of people. You, our reader, can get involved in similar positive social change.

This special edition of CHANCE focuses on using data and statistics for the public good and on highlighting collaborations and innovations that have been sparked by partnerships between pro bono institutions and social impact partners. We recognize that the “pro bono” or “for good” field is vast, and we welcome all actors working in the public interest into the big tent.

Through the focus of this edition, we hope to demonstrate how new or novel collaborations might spark meaningful and lasting positive change in communities, sectors, and industries. Anchored by work led through Statistics Without Borders and DataKind, this edition features reporting on projects that touch on many of the United Nations Sustainable Development Goals (SDGs).

Pro bono volunteerism is one way of democratizing access to high-skill, high-expense services that are often unattainable for social impact organizations. Statistics Without Borders (founded in 2008), DataKind (founded in 2012), and numerous other volunteer organizations began with this model in mind: If there were an organizing or galvanizing body that could coordinate the myriad requests for statistical, data science, machine learning, or data engineering help, there would be a ready supply of talented individuals who would want to volunteer to see those projects through. Or, put another way, “If you build it, they will come.”

Doing pro bono work requires more than positive intent. Plenty of well-meaning organizations and individuals charitably donate their time, their energy, and their expertise, only to have an unintended adverse impact. To do work for good, ethics must be an integral part of every project. In this issue, you’ll notice the writers’ attention to institutional review boards (IRBs), respecting client and data privacy, discussing the ethical considerations of methods used, and so on.

While no single publication can fully capture the great work of pro bono organizations working in “data for good,” we hope readers will be inspired to contribute to open source projects, solve problems in a new way, or even volunteer themselves for a future cohort of projects. We’re thrilled that this special edition represents programs, partners, and volunteers from around the world. You will learn about work that is truly representative of the SDGs, such as international health organizations’ work in Uganda, political justice organizations in Kenya, and conservationists in Madagascar, to name a few.

Several articles describe projects that are contextualized with the SDGs. While achieving many goals is interconnected, such as the intertwining of economic attainment and reducing poverty, we hope that calling out key themes here will whet your appetite for exploration.

  • Multiple articles focused on tackling aspects of SDG 3: Ensuring healthy lives and promoting well-being for people at all ages.
  • An article tackling SDG 8: Promote sustained, inclusive, and sustainable economic growth; full and productive employment; and decent work for all.
  • Several articles touching on SDG 9: Build resilient infrastructure; promote inclusive and sustainable industrialization; and foster innovation. One is a reflection on building and sustaining free and open source software as a public good.
  • A handful of articles highlighting the need for capacity-building and systems-strengthening aligned to SDG 16: Promote peaceful and inclusive societies for sustainable development; provide access to justice for all; and build effective, accountable, and inclusive institutions at all levels.
  • An article about migration along the southern border of the United States addressing multiple issues related to poverty (SDG 1), opportunity (SDG 10), and peace and justice (SDG 16)…(More)”

Putting data at the heart of policymaking will accelerate London’s recovery


Mel Hobson at Computer Weekly: “…London’s mayor, Sadiq Khan, knows how important this is. His re-election manifesto committed to rebuilding the London Datastore, currently home to over 700 freely available datasets, as the central register linking data across our city. That in turn will help analysts, researchers and policymakers understand our city and develop new ideas and solutions.

To help take the next step and create a data ecosystem that can improve the lives of millions of Londoners, businesses across our capital are committing their expertise and insights.

At London First, we have launched the London Data Charter, expertly put together by Pinsent Masons, which sets out the guiding principles for the private and public sector data collaborations that are key to creating this ecosystem. These include a focus on protecting the privacy and security of data, promoting trust, and sharing learnings with others – creating scalable solutions to meet the capital’s challenges….(More)”.

UNCTAD calls on countries to make digital data flow for the benefit of all


Press Release: “The world needs a new global governance approach to enable digital data to flow across borders as freely as necessary and possible, says UNCTAD’s Digital Economy Report 2021 released on 29 September.

The UN trade and development body says the new approach should help maximize development gains, ensure those gains are equitably distributed and minimize risks and harms.

It should also enable worldwide data sharing, develop global digital public goods, increase trust and reduce uncertainty in the digital economy.

The report says the new global system should also help avoid further fragmentation of the internet, address policy challenges emerging from the dominant positions of digital platforms and narrow existing inequalities.

“It is more important than ever to embark on a new path for digital and data governance,” says UN Secretary-General António Guterres in his preface to the report.

“The current fragmented data landscape risks us failing to capture value that could accrue from digital technologies and it may create more space for substantial harms related to privacy breaches, cyberattacks and other risks.”

UNCTAD Secretary-General Rebeca Grynspan said: “We urgently need a renewed focus on achieving global digital and data governance, developing global digital public goods, increasing trust and reducing uncertainty in the digital economy. The pandemic has shown the critical importance of sharing health data globally – the issue of digital governance can no longer be postponed.”

Pandemic underscores need for new governance

Digital data play an increasingly important role as an economic and strategic resource, a trend reinforced by the COVID-19 pandemic.

The pandemic has shown the importance of sharing health data globally to help countries cope with its consequences, and for research purposes in finding vaccines.

“The increased interconnection and interdependence challenges in the global data economy call for moving away from the silo approach towards a more holistic, coordinated global approach,” UNCTAD Deputy Secretary-General Isabelle Durant said.

“Moreover, new and innovative ways of global governance are urgently needed, as the old ways may not be well suited to respond to the new context,” she added.

New UN data-related body proposed

UNCTAD proposes the formation of a new United Nations coordinating body, with a focus on, and with the skills for, assessing and developing comprehensive global digital and data governance. Its work should be multilateral, multi-stakeholder and multidisciplinary.

It should also seek to remedy the current underrepresentation of developing countries in global and regional data governance initiatives.

The body should also function as a complement to and in coherence with national policies and provide sufficient policy space to ensure countries with different levels of digital readiness and capacities can benefit from the data-driven digital economy…(More)”.

Greece used AI to curb COVID: what other nations can learn


Editorial at Nature: “A few months into the COVID-19 pandemic, operations researcher Kimon Drakopoulos e-mailed both the Greek prime minister and the head of the country’s COVID-19 scientific task force to ask if they needed any extra advice.

Drakopoulos works in data science at the University of Southern California in Los Angeles, and is originally from Greece. To his surprise, he received a reply from Prime Minister Kyriakos Mitsotakis within hours. The European Union was asking member states, many of which had implemented widespread lockdowns in March, to allow non-essential travel to recommence from July 2020, and the Greek government needed help in deciding when and how to reopen borders.

Greece, like many other countries, lacked the capacity to test all travellers, particularly those not displaying symptoms. One option was to test a sample of visitors, but Greece opted to trial an approach rooted in artificial intelligence (AI).

Between August and November 2020 — with input from Drakopoulos and his colleagues — the authorities launched a system that uses a machine-learning algorithm to determine which travellers entering the country should be tested for COVID-19. The authors found machine learning to be more effective at identifying asymptomatic people than was random testing or testing based on a traveller’s country of origin. According to the researchers’ analysis, during the peak tourist season, the system detected two to four times more infected travellers than did random testing.

The machine-learning system, which is among the first of its kind, is called Eva and is described in Nature this week (H. Bastani et al. Nature https://doi.org/10.1038/s41586-021-04014-z; 2021). It’s an example of how data analysis can contribute to effective COVID-19 policies. But it also presents challenges, from ensuring that individuals’ privacy is protected to the need to independently verify its accuracy. Moreover, Eva is a reminder of why proposals for a pandemic treaty (see Nature 594, 8; 2021) must consider rules and protocols on the proper use of AI and big data. These need to be drawn up in advance so that such analyses can be used quickly and safely in an emergency.

In many countries, travellers are chosen for COVID-19 testing at random or according to risk categories. For example, a person coming from a region with a high rate of infections might be prioritized for testing over someone travelling from a region with a lower rate.

By contrast, Eva collected not only travel history, but also demographic data such as age and sex from the passenger information forms required for entry to Greece. It then matched those characteristics with data from previously tested passengers and used the results to estimate an individual’s risk of infection. COVID-19 tests were targeted to travellers calculated to be at highest risk. The algorithm also issued tests to allow it to fill data gaps, ensuring that it remained up to date as the situation unfolded.
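The balance the editorial describes – testing the travellers estimated to be riskiest while still issuing some tests to fill data gaps – can be illustrated with a toy sketch. This is not the actual Eva system (Bastani et al. describe a considerably more sophisticated bandit-style allocation); the passenger types, pseudo-counts, and epsilon-greedy exploration rule below are invented for illustration:

```python
import random
from collections import defaultdict

def allocate_tests(passengers, results_history, budget, explore_frac=0.1):
    """Toy targeted-testing rule: estimate positivity per passenger type
    from recent results, test the highest-risk passengers first, and
    reserve a slice of the budget for random exploration so that
    estimates for under-tested types stay fresh."""
    # Smoothed positivity estimates: a small prior keeps rarely seen
    # types from being treated as zero-risk.
    positives = defaultdict(lambda: 1)   # prior pseudo-counts (invented)
    totals = defaultdict(lambda: 20)
    for ptype, was_positive in results_history:
        positives[ptype] += int(was_positive)
        totals[ptype] += 1

    def risk(p):
        return positives[p["type"]] / totals[p["type"]]

    ranked = sorted(passengers, key=risk, reverse=True)
    n_explore = int(budget * explore_frac)
    chosen = ranked[: budget - n_explore]              # exploit: highest risk
    pool = ranked[budget - n_explore:]
    chosen += random.sample(pool, min(n_explore, len(pool)))  # explore
    return chosen
```

Here `explore_frac` plays the role of the tests Eva issued to fill data gaps; the real system also had to update its estimates continuously as results arrived.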

During the pandemic, there has been no shortage of ideas on how to deploy big data and AI to improve public health or assess the pandemic’s economic impact. However, relatively few of these ideas have made it into practice. This is partly because companies and governments that hold relevant data — such as mobile-phone records or details of financial transactions — need agreed systems to be in place before they can share the data with researchers. It’s also not clear how consent can be obtained to use such personal data, or how to ensure that these data are stored safely and securely…(More)”.

Where Is Everyone? The Importance of Population Density Data


Data Artefact Study by Aditi Ramesh, Stefaan Verhulst, Andrew Young and Andrew Zahuranec: “In this paper, we explore new and traditional approaches to measuring population density, and ways in which density information has frequently been used by humanitarian, private-sector and government actors to advance a range of private and public goals. We explain how new innovations are leading to fresh ways of collecting data—and fresh forms of data—and how this may open up new avenues for using density information in a variety of contexts. Section III examines one particular example: Facebook’s High-Resolution Population Density Maps (also referred to as HRSL, or High Resolution Settlement Layer). This recent initiative, created in collaboration with a number of external organizations, shows not only the potential of mapping innovations but also the potential benefits of inter-sectoral partnerships and sharing. We examine three particular use cases of HRSL, and we follow with an assessment and some lessons learned. These lessons are applicable to HRSL in particular, but also more broadly. We conclude with some thoughts on avenues for future research….(More)”.

EU Health data centre and a common data strategy for public health


Report by the European Parliament Think Tank: “With regard to health data and its availability and comparability, the Covid-19 pandemic revealed that the EU has no clear health data architecture. The lack of harmonisation in these practices and the absence of an EU-level centre for data analysis and use to support a better response to public health crises is the focus of this study. Through extensive desk review, interviews with key actors, and enquiry into experiences from outside the EU/EEA area, this study highlights that the EU must have the capacity to use data very effectively in order to make data-supported public health policy proposals and inform political decisions. The possible functions and characteristics of an EU health data centre are outlined. The centre can only fulfil its mandate if it has the power and competency to influence Member State public-health-relevant data ecosystems and institutionally link with their national level actors. The institutional structure, its possible activities and in particular its usage of advanced technologies such as AI are examined in detail….(More)”.

New York City to Require Food Delivery Services to Share Customer Data with Restaurants


Hunton Privacy Blog: “On August 29, 2021, a New York City Council bill amending the New York City Administrative Code to address customer data collected by food delivery services from online orders became law after the 30-day period for the mayor to sign or veto lapsed. Effective December 27, 2021, the law will permit restaurants to request customer data from third-party food delivery services and require delivery services to provide, on at least a monthly basis, such customer data until the restaurant “requests to no longer receive such customer data.” Customer data includes name, phone number, email address, delivery address and contents of the order.

Although customers are permitted to request that their customer data not be shared, the presumption under the law is that “customers have consented to the sharing of such customer data applicable to all online orders, unless the customer has made such a request in relation to a specific online order.” The food delivery services are required to provide on their websites a way for customers to request that their data not be shared “in relation to such online order.” To “assist its customers with deciding whether their data should be shared,” delivery services must disclose to the customer (1) the data that may be shared with the restaurant and (2) the restaurant fulfilling the order as the recipient of the data.

The law will permit restaurants to use the customer data for marketing and other purposes, and prohibit delivery apps from restricting such activities by restaurants. Restaurants that receive the customer data, however, must allow customers to request and delete their customer data. In addition, restaurants are not permitted to sell, rent or disclose customer data to any other party in exchange for financial benefit, except with the express consent of the customer….(More)”.

Exploring a new governance agenda: What are the questions that matter?


Article by Nicola Nixon, Stefaan Verhulst, Imran Matin & Philips J. Vermonte: “…Late last year, we – the Governance Lab at NYU, CSIS Indonesia, the BRAC Institute of Governance and Development, Bangladesh, and The Asia Foundation – joined forces across New York, Jakarta, Dhaka, Hanoi, and San Francisco to launch the 100 Governance Questions Initiative. This is the latest iteration of the GovLab’s broader initiative to map questions across several domains.

We live in an era marked by an unprecedented amount of data. Anyone who uses a mobile phone or accesses the internet is generating vast streams of information. Covid-19 has only intensified this phenomenon. 

Although this data contains tremendous potential for positive social transformation, much of that potential goes unfulfilled. In the development context, one chief problem is that data initiatives are often driven by supply (i.e., what data or data solutions are available?) rather than demand (what problems actually need solutions?). Too many projects begin with the database, the app, the dashboard – beholden to the seduction of technology – and now, many parts of the developing world are graveyards of tech pilots. As is well established in development theory but not yet fully in practice, solution-driven governance interventions are destined to fail.

The 100 Questions Initiative, pioneered by the GovLab, seeks to overcome the chasm between supply and demand. It begins not by searching for what data is available, but by asking important questions about the biggest challenges societies and countries face, and then seeking more targeted and relevant data solutions. In doing this, it narrows the gap between policy makers and constituents, providing opportunities for improved evidence-based policy and community engagement in developing countries. As part of this initiative, we seek to define the ten most important questions across several domains, including Migration, Gender, Employment, the Future of Work, and – now – Governance.

On this occasion, we invited over 100 experts and practitioners in governance and data science – whom we call “bilinguals” – from various organizations, companies, and government agencies to identify what they see as the most pressing governance questions in their respective domains. These bilinguals were encouraged to prioritize potential impact, novelty, and feasibility in their questioning — moving toward a roadmap for data-driven action and collaboration that is both actionable and ambitious.

By June, the bilinguals had articulated 170 governance-related questions. Over the next couple of months, these were sorted, discussed and refined during two rounds of collaboration with the bilinguals; first to narrow down to the top 40 and then to the top 10. Bilinguals were asked what, to them, are the most significant governance questions we must answer with data today? The result is the following 10 questions:…(More)” (Public Voting Platform).

Climate change versus children: How a UNICEF data collaborative gave birth to a risk index


Jess Middleton at DataIQ: “Almost a billion children face climate-related disasters in their lifetime, according to UNICEF’s new Children’s Climate Risk Index (CCRI).

The CCRI is the first analysis of climate risk specifically from a child’s perspective. It reveals that children in the Central African Republic, Chad and Nigeria are at the highest risk from climate and environmental shocks based on their access to essential services….

Young climate activists including Greta Thunberg contributed a foreword to the report that introduced the index; and the project has added another layer of pressure on governments failing to act on climate change in the run-up to the 2021 United Nations Climate Change Conference – set to be held in Glasgow in November.

While these statistics make for grim reading, the collective effort undertaken to create the Index is evidence of the power of data as a tool for advocacy and the role that data collaboratives can play in shaping positive change.

The CCRI is underpinned by data that was sourced, collated and analysed by the Data for Children Collaborative with UNICEF, a partnership between UNICEF, the Scottish Government and the University of Edinburgh, hosted by The Data Lab.

The collaboration brings together practitioners from diverse backgrounds to provide data-driven solutions to issues faced by children around the world.

For work on the CCRI, the collaborative sought data, skills and expertise from academia (Universities of Southampton, Edinburgh, Stirling, Highlands and Islands) as well as the public and private sectors (ONS-FCDO Data Science Hub, Scottish Alliance for Geoscience, Environment & Society).

This variety of expertise provided the knowledge required to build the two main pillars of input for the CCRI: socioeconomic and climate science data.

Socioeconomic experts sourced data and provided analytical expertise in the context of child vulnerability, social statistics, biophysical processes and statistics, child welfare and child poverty.

Climate experts focused on factors such as water scarcity, flood exposure, coastal flood risk, pollution and exposure to vector borne disease.

The success of the project hinged on the effective collaboration between distinct areas of expertise to deliver on UNICEF’s problem statement.

The director of the Data for Children Collaborative with UNICEF, Alex Hutchison, spoke with DataIQ about the success of the project, the challenges the team faced, and the benefits of working as part of a diverse collective…(More)” (Report).

Using Satellite Imagery and Machine Learning to Estimate the Livelihood Impact of Electricity Access


Paper by Nathan Ratledge et al: “In many regions of the world, sparse data on key economic outcomes inhibits the development, targeting, and evaluation of public policy. We demonstrate how advancements in satellite imagery and machine learning can help ameliorate these data and inference challenges. In the context of an expansion of the electrical grid across Uganda, we show how a combination of satellite imagery and computer vision can be used to develop local-level livelihood measurements appropriate for inferring the causal impact of electricity access on livelihoods. We then show how ML-based inference techniques deliver more reliable estimates of the causal impact of electrification than traditional alternatives when applied to these data. We estimate that grid access improves village-level asset wealth in rural Uganda by 0.17 standard deviations, more than doubling the growth rate over our study period relative to untreated areas. Our results provide country-scale evidence on the impact of a key infrastructure investment, and provide a low-cost, generalizable approach to future policy evaluation in data sparse environments….(More)”.
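The core comparison behind an estimate like “0.17 standard deviations” can be made concrete with a bare-bones difference-in-differences on predicted asset wealth: the change in newly electrified villages minus the change in never-connected ones. This is a deliberate simplification of the paper’s approach (the authors apply more robust ML-based inference techniques), and the village values below are invented:

```python
# Hypothetical difference-in-differences on satellite-predicted asset
# wealth (in standard-deviation units). Values are illustrative only.
def diff_in_diff(panel):
    """panel: list of (treated, wealth_before, wealth_after) tuples."""
    def mean(xs):
        return sum(xs) / len(xs)
    treated_gain = mean([after - before for t, before, after in panel if t])
    control_gain = mean([after - before for t, before, after in panel if not t])
    # Subtracting the control trend nets out region-wide changes that
    # would have happened without electrification.
    return treated_gain - control_gain

panel = [
    (True, 0.10, 0.40),   # villages that gained grid access
    (True, 0.00, 0.25),
    (False, 0.05, 0.15),  # comparison villages, never connected
    (False, -0.10, 0.00),
]
print(diff_in_diff(panel))  # treated gain minus control gain
```

The paper’s contribution is precisely that the naive version above can mislead when treatment is not random or when the ML-predicted outcome is noisy, which is why its estimator corrects for both.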