Principled Data Access: Building Public-Private Data Partnerships for Better Official Statistics


Paper by Claudia Biancotti, Oscar Borgogno and Giovanni Veronese: “Official statistics serve as an important compass for policymakers due to their quality, impartiality, and transparency. In the current post-pandemic environment of great uncertainty and widespread disinformation, they need to serve this purpose more than ever. The wealth of data produced by the digital society (e.g. from user activity on online platforms or from Internet-of-Things devices) could help official statisticians improve the salience, timeliness and depth of their output. This data, however, tends to be locked away within the private sector. We argue that this should change and we propose a set of principles under which the public and the private sector can form partnerships to leverage the potential of new-generation data in the public interest. The principles, compatible with a variety of legal frameworks, aim at establishing trust between data collectors, data subjects, and statistical authorities, while also ensuring the technical usability of the data and the sustainability of partnerships over time. They are driven by a logic of incentive compatibility and burden sharing….(More)”

Using Satellite Imagery and Deep Learning to Evaluate the Impact of Anti-Poverty Programs


Paper by Luna Yue Huang, Solomon M. Hsiang & Marco Gonzalez-Navarro: “The rigorous evaluation of anti-poverty programs is key to the fight against global poverty. Traditional approaches rely heavily on repeated in-person field surveys to measure program effects. However, this is costly, time-consuming, and often logistically challenging. Here we provide the first evidence that such program evaluations can be conducted based solely on high-resolution satellite imagery and deep learning methods. Our application estimates changes in household welfare in a recent anti-poverty program in rural Kenya. Leveraging a large literature documenting a reliable relationship between housing quality and household wealth, we infer changes in household wealth based on satellite-derived changes in housing quality and obtain results consistent with those of the traditional field-survey-based approach. Our approach generates inexpensive and timely insights on program effectiveness in international development programs…(More)”.
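The evaluation logic the abstract describes (a satellite-derived welfare proxy compared across program and comparison areas) can be illustrated with a minimal sketch. Everything below is simulated and hypothetical: the housing-quality scores stand in for deep-learning predictions from imagery, and the effect sizes are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical satellite-derived housing-quality scores (e.g. CNN predictions
# scaled to [0, 1]) for treated and control households, before and after the
# program. The means below are illustrative assumptions.
n = 500
treat_pre  = rng.normal(0.40, 0.10, n)
treat_post = rng.normal(0.48, 0.10, n)   # simulated program effect
ctrl_pre   = rng.normal(0.40, 0.10, n)
ctrl_post  = rng.normal(0.42, 0.10, n)   # simulated secular trend

def diff_in_diff(tp, tq, cp, cq):
    """Difference-in-differences estimate on the imagery-derived outcome."""
    return (tq.mean() - tp.mean()) - (cq.mean() - cp.mean())

effect = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(f"estimated effect on housing quality: {effect:.3f}")
```

The substantive contribution of the paper is the first step, replacing a costly survey-measured outcome with an imagery-derived one; once that proxy exists, standard program-evaluation estimators like the one above apply unchanged.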

Machine Learning and Mobile Phone Data Can Improve the Targeting of Humanitarian Assistance


Paper by Emily Aiken et al: “The COVID-19 pandemic has devastated many low- and middle-income countries (LMICs), causing widespread food insecurity and a sharp decline in living standards. In response to this crisis, governments and humanitarian organizations worldwide have mobilized targeted social assistance programs. Targeting is a central challenge in the administration of these programs: given available data, how does one rapidly identify the individuals and families with the greatest need? This challenge is particularly acute in the large number of LMICs that lack recent and comprehensive data on household income and wealth.

Here we show that non-traditional “big” data from satellites and mobile phone networks can improve the targeting of anti-poverty programs. Our approach uses traditional survey-based measures of consumption and wealth to train machine learning algorithms that recognize patterns of poverty in non-traditional data; the trained algorithms are then used to prioritize aid to the poorest regions and mobile subscribers. We evaluate this approach by studying Novissi, Togo’s flagship emergency cash transfer program, which used these algorithms to determine eligibility for a rural assistance program that disbursed millions of dollars in COVID-19 relief aid. Our analysis compares outcomes – including exclusion errors, total social welfare, and measures of fairness – under different targeting regimes. Relative to the geographic targeting options considered by the Government of Togo at the time, the machine learning approach reduces errors of exclusion by 4-21%. Relative to methods that require a comprehensive social registry (a hypothetical exercise; no such registry exists in Togo), the machine learning approach increases exclusion errors by 9-35%. These results highlight the potential for new data sources to contribute to humanitarian response efforts, particularly in crisis settings when traditional data are missing or out of date….(More)”.
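The targeting pipeline described above (train on survey-based poverty measures, predict poverty from non-traditional features, then rank and target the poorest) can be sketched as follows. All data here is synthetic, and a simple least-squares model is a stand-in for the paper's actual machine learning methods; the feature interpretations and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical phone-usage features for 2,000 subscribers (e.g. call volume,
# top-up amounts, mobility) and survey-measured consumption for a subsample.
n, k = 2000, 5
X = rng.normal(size=(n, k))
true_w = np.array([0.8, -0.5, 0.3, 0.0, 0.2])      # unknown in practice
consumption = X @ true_w + rng.normal(scale=0.5, size=n)

# A phone survey covers only 400 subscribers; fit the model on those.
train = rng.choice(n, 400, replace=False)
w, *_ = np.linalg.lstsq(X[train], consumption[train], rcond=None)

# Rank all subscribers by predicted consumption; the budget covers 30%.
pred = X @ w
budget = int(0.3 * n)
targeted = set(np.argsort(pred)[:budget])
truly_poor = set(np.argsort(consumption)[:budget])

# Exclusion error: share of the truly poor whom the program fails to reach.
exclusion = 1 - len(targeted & truly_poor) / budget
print(f"exclusion error: {exclusion:.2f}")
```

The exclusion-error metric computed at the end is the same quantity the paper uses to compare targeting regimes: lower is better, and the comparison in the abstract is between this prediction-based ranking, coarser geographic targeting, and a hypothetical comprehensive social registry.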

Big data for economic statistics


Stats Brief by ESCAP: “This Stats Brief gives an overview of big data sources that can be used to produce economic statistics and presents country examples of the use of online price data, scanner data, mobile phone data, Earth Observations, financial transactions data and smart meter data to produce price indices, tourism statistics, poverty estimates, experimental economic statistics during COVID-19 and to monitor public sentiment. The Brief is part of ESCAP’s series on the use of non-traditional data sources for official statistics….(More)”.

The coloniality of collaboration: sources of epistemic obedience in data-intensive astronomy in Chile


Paper by Sebastián Lehuedé: “Data collaborations have gained currency over the last decade as a means for data- and skills-poor actors to thrive as a fourth paradigm takes hold in the sciences. Against this backdrop, this article traces the emergence of a collaborative subject position that strives to establish reciprocal and technical-oriented collaborations so as to catch up with the ongoing changes in research.

Combining insights from the modernity/coloniality group, political theory and science and technology studies, the article argues that this positionality engenders epistemic obedience by bracketing off critical questions regarding with whom and for whom knowledge is generated. In particular, a dis-embedding of the data producers, the erosion of local ties, and a data conformism are identified as fresh sources of obedience impinging upon the capacity to conduct research attuned to the needs and visions of the local context. A discursive-material analysis of interviews and field notes stemming from the case of astronomy data in Chile is conducted, examining the vision of local actors aiming to gain proximity to the mega observatories producing vast volumes of data in the Atacama Desert.

Given that these observatories are predominantly under the control of organisations from the United States and Europe, the adoption of a collaborative stance is now seen as the best means to ensure skills and technology transfer to local research teams. Delving into the epistemological dimension of data colonialism, this article warns that an increased emphasis on collaboration runs the risk of reproducing planetary hierarchies in times of data-intensive research….(More)”.

Inclusive SDG Data Partnerships


Learning report by Partners for Review (P4R/GIZ), the Danish Institute for Human Rights (DIHR), and the International Civil Society Centre: “It brought together National SDG Units, National Statistics Offices, National Human Rights Institutions and civil society organisations from across six countries. The initiative’s purpose is to advance data partnerships for the SDGs and to strengthen multi-actor data ecosystems at the national level. The goal is to meet the SDG data challenge by improving the use of alternative data sources, particularly data produced by civil society and human rights institutions, as a complement to official statistics….(More)”.

Household Financial Transaction Data


Paper by Scott R. Baker & Lorenz Kueng: “The growing availability and use of detailed household financial transaction microdata has dramatically expanded the ability of researchers to understand both household decision-making and aggregate fluctuations across a wide range of fields. This class of transaction data is derived from a myriad of sources including financial institutions, FinTech apps, and payment intermediaries. We review how these detailed data have been utilized in finance and economics research and the benefits they enable beyond more traditional measures of income, spending, and wealth. We discuss the future potential for this flexible class of data in firm-focused research, real-time policy analysis, and macro statistics….(More)”.

Financial data unbound: The value of open data for individuals and institutions


Paper by McKinsey Global Institute: “As countries around the world look to ensure rapid recovery once the COVID-19 crisis abates, improved financial services are emerging as a key element to boost growth, raise economic efficiency, and lift productivity. Robust digital financial infrastructure proved its worth during the crisis, helping governments cushion people and businesses from the economic shock of the pandemic. The next frontier is to create an open-data ecosystem for finance.

Already, technological, regulatory, and competitive forces are moving markets toward easier and safer financial data sharing. Open-data initiatives are springing up globally, including the United Kingdom’s Open Banking Implementation Entity, the European Union’s second payment services directive, Australia’s new consumer protection laws, Brazil’s drafting of open data guidelines, and Nigeria’s new Open Technology Foundation (Open Banking Nigeria). In the United States, the Consumer Financial Protection Bureau aims to facilitate a consumer-authorized data-sharing market, while the Financial Data Exchange consortium attempts to promote common, interoperable standards for secure access to financial data. Yet, even as many countries put in place stronger digital financial infrastructure and data-sharing mechanisms, COVID-19 has exposed limitations and gaps in their reach, a theme we explored in earlier research.

This discussion paper from the McKinsey Global Institute (download full text in 36-page PDF) looks at the potential value that could be created—and the key issues that will need to be addressed—by the adoption of open data for finance. We focus on four regions: the European Union, India, the United Kingdom, and the United States.

By open data, we mean the ability to share financial data through a digital ecosystem in a manner that requires limited effort or manipulation. Advantages include more accurate credit risk evaluation and risk-based pricing, improved workforce allocation, better product delivery and customer service, and stronger fraud protection.

Our analysis suggests that the boost to the economy from broad adoption of open-data ecosystems could range from about 1 to 1.5 percent of GDP in 2030 in the European Union, the United Kingdom, and the United States, to as much as 4 to 5 percent in India. All market participants benefit, be they institutions or consumers—either individuals or micro-, small-, and medium-sized enterprises (MSMEs)—albeit to varying degrees….(More)”.

How data governance technologies can democratize data sharing for community well-being


Paper by Dan Wu, Stefaan Verhulst, Alex Pentland, Thiago Avila, Kelsey Finch, and Abhishek Gupta in Data & Policy (Cambridge University Press) focusing on “Data sharing efforts to allow underserved groups and organizations to overcome the concentration of power in our data landscape…

A few special organizations, due to their data monopolies and resources, are able to decide which problems to solve and how to solve them. But even though data sharing creates a counterbalancing democratizing force, it must nevertheless be approached cautiously. Underserved organizations and groups must navigate difficult barriers related to technological complexity and legal risk.

To identify those common barriers, one type of data sharing effort, data trusts, is examined, specifically through the reports commenting on that effort. To address these practical issues, data governance technologies have a large role to play in democratizing data trusts safely and in a trustworthy manner. Yet technology is far from a silver bullet, and it is dangerous to rely on it alone. But technology that is no-code, flexible, and secure can help operate data trusts more responsibly. This type of technology helps innovators put relationships at the center of their efforts….(More)”.

Charting the ‘Data for Good’ Landscape


Report by Jake Porway at Data.org: “There is huge potential for data science and AI to play a productive role in advancing social impact. However, the field of “data for good” is not only overshadowed by the public conversations about the risks rampant data misuse can pose to civil society, it is also a fractured and disconnected space. There are myriad interpretations of what it means to “use data for good” or “use AI for good”, which creates duplicate efforts, nonstrategic initiatives, and confusion about what a successfully data-driven social sector could look like. To add to that, funding is scarce for a field that requires expensive tools and skills to do well. These enduring challenges result in work being done at an activity and project level, but they do not create a coherent set of building blocks to constitute a strong and healthy field that is capable of solving a new class of systems-level problems.

We are taking one tiny step forward in trying to make a more coherent Data for Good space with a landscape that makes clear what various Data for Good initiatives (and AI for Good initiatives) are trying to achieve, how they do it, and what makes them similar or different from one another. One of the major confusion points in talking about “Data for Good” is that it treats all efforts as similar by the mere fact that they use “data” and seek to do something “good”. This term is so broad as to be practically meaningless; as unhelpful as saying “Wood for Good”. We would laugh at a term as vague as “Wood for Good”, which would lump together activities as different as building houses to burning wood in cook stoves to making paper, combining architecture with carpentry, forestry with fuel. However, we are content to say “Data for Good”, and its related phrases “we need to use our data better” or “we need to be data-driven”, when data is arguably even more general than something like wood.

We are trying to bring clarity to the conversation by going beyond mapping organizations into arbitrary groups, to define the dimensions of what it means to do data for good. By creating an ontology for what Data for Good initiatives seek to achieve, in which sector, and by what means, we can gain a better understanding of the underlying fundamentals of using data for good and create a landscape of what initiatives are doing.

We hope that this landscape of initiatives will help to bring some more nuance and clarity to the field, as well as identify which initiatives are out there and what purpose they serve. Specifically, we hope this landscape will help:

  • Data for Good field practitioners align on a shared language for the outcomes, activities, and aims of the field.
  • Purpose-driven organizations who are interested in applying data and computing to their missions better understand what they might need and where to go to get it.
  • Funders make more strategic decisions about funding in the data/AI space based on activities that align with their interests and the amount of funding already devoted to that area.
  • Organizations with Data for Good initiatives can find one another and collaborate based on similarity of mission and activities.

Below you will find a very preliminary landscape map, along with a description of the different kinds of groups in the Data for Good ecosystem and why you might need to engage with them….(More)”.