The “Onion Model”: A Layered Approach to Documenting How the Third Wave of Open Data Can Provide Societal Value


Blog post by Andrew Zahuranec, Andrew Young and Stefaan Verhulst: “There’s a lot that goes into data-driven decision-making. Behind the datasets, platforms, and analysts is a complex series of processes that inform what kinds of insight data can produce and what kinds of ends it can achieve. These individual processes can be hard to understand when viewed together but, by separating the stages out, we can not only track how data leads to decisions but also promote better and more impactful data management.

Earlier this year, The Open Data Policy Lab published the Third Wave of Open Data Toolkit to explore the elements of data re-use. At the center of this toolkit was an abstraction that we call the Open Data Framework. Divided into individual, onion-like layers, the framework shows all the processes that go into capitalizing on data in the third wave, starting with the creation of a dataset and moving through data collaboration, the generation of insights, and the use of those insights to produce value.

This blog revisits what’s included in each layer of this data “onion model” and demonstrates how organizations can create societal value by making their data available for re-use by other parties….(More)”.

Innovative Data for Urban Planning: The Opportunities and Challenges of Public-Private Data Partnerships


GSMA Report: “Rapid urbanisation will be one of the most pressing and complex challenges in low- and middle-income countries (LMICs) for the next several decades. With cities in Africa and Asia expected to add more than one billion people, urban populations will represent two-thirds of the world population by 2050. This presents LMICs with both an opportunity and a challenge: rapid urbanisation can fuel economic growth, but it can also deepen poverty.

The rapid pace and unequal character of urbanisation in LMICs have meant that not enough data has been generated to support urban planning solutions and the effective provision of urban utility services. Data-sharing partnerships between the public and private sector can bridge this data gap and open up an opportunity for governments to address urbanisation challenges with data-driven decisions. Innovative data sources, such as mobile network operator data, remote sensing data, utility services data and other digital services data, can be applied to a range of critical urban planning and service provision use cases.

This report identifies challenges and enablers for public-private data-sharing partnerships (PPPs) relating to the partnership engagement model, data and technology, regulation and ethics frameworks, and evaluation and sustainability….(More)”

Remove obstacles to sharing health data with researchers outside of the European Union


Heidi Beate Bentzen et al in Nature: “International sharing of pseudonymized personal data among researchers is key to the advancement of health research and is an essential prerequisite for studies of rare diseases or subgroups of common diseases to obtain adequate statistical power.

Pseudonymized personal data are data in which identifiers such as names are replaced by codes. Research institutions keep the ‘code key’ that can link an individual person to the data; it is stored securely and separately from the research data, thereby protecting privacy while preserving the usefulness of the data for research. Pseudonymized data are still considered personal data under the General Data Protection Regulation (GDPR) 2016/679 of the European Union (EU) and, therefore, international transfers of such data need to comply with GDPR requirements. Although the GDPR does not apply to transfers of anonymized data, the threshold for anonymity under the GDPR is very high; hence, rendering data anonymous to the level required for exemption from the GDPR can diminish the usefulness of the data for research and is often not even possible.

The GDPR requires that transfers of personal data to international organizations or countries outside the European Economic Area (EEA)—which comprises the EU Member States plus Iceland, Liechtenstein and Norway—be adequately protected. Over the past two years, it has become apparent that challenges emerge for the sharing of data with public-sector researchers in a majority of countries outside of the EEA, as only a few decisions stating that a country offers an adequate level of data protection have so far been issued by the European Commission. This is a problem, for example, with researchers at federal research institutions in the United States. Transfers to international organizations such as the World Health Organization are similarly affected. Because these obstacles ultimately affect patients as beneficiaries of research, solutions are urgently needed. The European scientific academies have recently published a report explaining the consequences of stalled data transfers and pushing for responsible solutions…(More)”.
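
To make the pseudonymization scheme described in the excerpt concrete, here is a minimal Python sketch. The keyed-hash approach, field names, and key handling are illustrative assumptions, not a prescription from the correspondence.

```python
import hmac
import hashlib

# Secret held by the research institution, stored separately from the
# research data (illustrative; real deployments would use a key-management
# service and documented governance procedures).
SECRET_KEY = b"replace-with-institution-held-secret"

def pseudonymize(record: dict, code_key: dict) -> dict:
    """Replace the direct identifier with a stable code.

    The mapping code -> name is kept in `code_key`, which the institution
    stores securely and separately; the returned record is what gets
    shared for research.
    """
    name = record.pop("name")
    code = hmac.new(SECRET_KEY, name.encode(), hashlib.sha256).hexdigest()[:12]
    code_key[code] = name  # retained by the data controller only
    return {"subject_id": code, **record}

code_key: dict[str, str] = {}
shared = pseudonymize({"name": "Jane Doe", "diagnosis": "R51", "age": 42}, code_key)
print(shared)  # {'subject_id': '...', 'diagnosis': 'R51', 'age': 42}
```

Because the code key exists and re-identification remains possible, the shared records are still personal data under the GDPR, which is exactly why the international transfers discussed above must comply with its requirements.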

Designing data collaboratives to better understand human mobility and migration in West Africa



“The Big Data for Migration Alliance (BD4M) has released the report, “Designing Data Collaboratives to Better Understand Human Mobility and Migration in West Africa,” providing findings from a first-of-its-kind rapid co-design and prototyping workshop, or “Studio.” The first BD4M Studio convened over 40 stakeholders from government, international organizations, research, civil society, and the private sector to develop concrete strategies for designing and implementing cross-sectoral data partnerships, or “data collaboratives,” to improve ethical and secure access to data for migration-related policymaking and research in West Africa.

BD4M is an effort spearheaded by the International Organization for Migration’s Global Migration Data Analysis Centre (IOM GMDAC), European Commission’s Joint Research Centre (JRC), and The GovLab to accelerate the responsible and ethical use of novel data sources and methodologies—such as social media, mobile phone data, satellite imagery, artificial intelligence—to support migration-related programming and policy on the global, national, and local levels. 

The BD4M Studio was informed by The Migration Domain of The 100 Questions Initiative — a global agenda-setting exercise to define the most impactful questions related to migration that could be answered through data collaboration. Inspired by the outputs of The 100 Questions, Studio participants designed data collaboratives that could produce answers to three key questions: 

  1. How can data be used to estimate current cross-border migration and mobility by sex and age in West Africa?
  2. How can data be used to assess the current state of diaspora communities and their migration behavior in the region?
  3. How can we use data to better understand the drivers of migration in West Africa?…(More)”

Developing a Responsible and Well-designed Governance Structure for Data Marketplaces


WEF Briefing Paper: “… extracts insights from the discussions with thought leaders and experts to serve as a point of departure for governments and other members of the global community as they discuss governance structures and regulatory frameworks for Data Marketplace Service Providers (DMSPs), the primary operators and managers of data exchanges acting as trusted third parties, across a wide range of jurisdictions. As decision-makers globally develop data marketplace solutions specific to their unique cultural nuances and needs, this paper highlights the key governance issues to get right, with global interoperability and adaptability in mind….(More)”.

Who will benefit from big data? Farmers’ perspective on willingness to share farm data


Paper by Airong Zhang et al.: “Agricultural industries are facing a dual challenge: increasing production to feed a growing population under a disruptive, changing climate while, at the same time, reducing their environmental impacts. Digital agriculture supported by big data technology has been regarded as a solution to these challenges. However, realising the potential value promised by big data technology depends upon farm-level data generated by digital agriculture being aggregated at scale. Yet there is limited understanding of farmers’ willingness to contribute agricultural data for analysis and how that willingness could be affected by their perceived beneficiary of the aggregated data.

The present study aimed to investigate farmers’ perspectives on who would benefit the most from aggregated agricultural data, and their willingness to share their input and output farm data with a range of agricultural sector stakeholders (i.e. other farmers, industry and government statistical organisations, technology businesses, and research institutions). To do this, we conducted a computer-assisted telephone interview with 880 Australian farmers from broadacre agricultural sectors. The results show that only 34 % of participants regarded farmers as the primary beneficiary of aggregated agricultural data; others named agribusiness (35 %) or government (21 %) as the main beneficiary. The participants’ willingness to share data was mostly positive. However, the level of willingness fluctuated depending on who was perceived as the primary beneficiary and on which stakeholder the data would be shared with. While participants reported concerns over the misuse of aggregated farm data and the privacy of their own farm data, those who perceived farmers as the primary beneficiary reported the lowest levels of concern. The findings highlight that, to seize the opportunities that big data technologies offer for sustainable agriculture, significant value propositions need to be created to give farmers a reason to share their data, and a higher level of trust needs to be established between farmers and stakeholders, especially technology and service providers….(More)”.

Mapping Africa’s Buildings with Satellite Imagery


Google AI Blog: “An accurate record of building footprints is important for a range of applications, from population estimation and urban planning to humanitarian response and environmental science. After a disaster, such as a flood or an earthquake, authorities need to estimate how many households have been affected. Ideally there would be up-to-date census information for this, but in practice such records may be out of date or unavailable. Instead, data on the locations and density of buildings can be a valuable alternative source of information.

A good way to collect such data is through satellite imagery, which can map the distribution of buildings across the world, particularly in areas that are isolated or difficult to access. However, detecting buildings with computer vision methods in some environments can be a challenging task. Because satellite imaging involves photographing the earth from several hundred kilometres above the ground, even at high resolution (30–50 cm per pixel), a small building or tent shelter occupies only a few pixels. The task is even more difficult for informal settlements, or rural areas where buildings constructed with natural materials can visually blend into the surroundings. There are also many types of natural and artificial features that can be easily confused with buildings in overhead imagery.

In “Continental-Scale Building Detection from High-Resolution Satellite Imagery”, we address these challenges, using new methods for detecting buildings that work in rural and urban settings across different terrains, such as savannah, desert, and forest, as well as informal settlements and refugee facilities. We use this building detection model to create the Open Buildings dataset, a new open-access data resource containing the locations and footprints of 516 million buildings with coverage across most of the African continent. The dataset will support several practical, scientific, and humanitarian applications, ranging from disaster response and population mapping to planning services such as new medical facilities and studying human impact on the natural environment….(More)”.
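
As a rough illustration of how the resulting Open Buildings data can be put to work, the sketch below filters one CSV tile of footprints by detection confidence within a bounding box. The file name, column names, and thresholds are assumptions based on the dataset's published schema and should be checked against the current documentation.

```python
import pandas as pd
from shapely import wkt

# Load one tile of the Open Buildings CSV release (file name illustrative).
df = pd.read_csv("open_buildings_tile.csv.gz")

# Keep only high-confidence detections inside a bounding box around
# Accra, Ghana (coordinates approximate; the confidence threshold is an
# assumption, not a recommendation from the blog post).
mask = (
    (df["confidence"] >= 0.75)
    & df["latitude"].between(5.45, 5.75)
    & df["longitude"].between(-0.35, 0.05)
)
buildings = df.loc[mask].copy()

# Parse the WKT polygons and summarize, e.g. total built footprint.
buildings["polygon"] = buildings["geometry"].apply(wkt.loads)
print(f"{len(buildings)} buildings, "
      f"{buildings['area_in_meters'].sum() / 1e6:.1f} km² of footprint")
```

A filter-then-parse pattern like this keeps the expensive geometry parsing restricted to the rows of interest, which matters when a single tile holds millions of footprints.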

Principled Data Access: Building Public-private Data Partnerships for Better Official Statistics


Paper by Claudia Biancotti, Oscar Borgogno and Giovanni Veronese: “Official statistics serve as an important compass for policymakers due to their quality, impartiality, and transparency. In the current post-pandemic environment of great uncertainty and widespread disinformation, they need to serve this purpose more than ever. The wealth of data produced by the digital society (e.g. from user activity on online platforms or from Internet-of-Things devices) could help official statisticians improve the salience, timeliness and depth of their output. This data, however, tends to be locked away within the private sector. We argue that this should change and we propose a set of principles under which the public and the private sector can form partnerships to leverage the potential of new-generation data in the public interest. The principles, compatible with a variety of legal frameworks, aim at establishing trust between data collectors, data subjects, and statistical authorities, while also ensuring the technical usability of the data and the sustainability of partnerships over time. They are driven by a logic of incentive compatibility and burden sharing….(More)”

Using Satellite Imagery and Deep Learning to Evaluate the Impact of Anti-Poverty Programs


Paper by Luna Yue Huang, Solomon M. Hsiang & Marco Gonzalez-Navarro: “The rigorous evaluation of anti-poverty programs is key to the fight against global poverty. Traditional approaches rely heavily on repeated in-person field surveys to measure program effects. However, this is costly, time-consuming, and often logistically challenging. Here we provide the first evidence that such program evaluations can be conducted based solely on high-resolution satellite imagery and deep learning methods. Our application estimates changes in household welfare in a recent anti-poverty program in rural Kenya. Leveraging a large literature documenting a reliable relationship between housing quality and household wealth, we infer changes in household wealth from satellite-derived changes in housing quality and obtain results consistent with those of the traditional field-survey-based approach. Our approach generates inexpensive and timely insights on program effectiveness in international development programs…(More)”.
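
The core technical move, regressing a housing-quality signal on satellite tiles and differencing pre- and post-program predictions, can be sketched as follows. This is a minimal PyTorch illustration, not the authors' architecture; the network, tile size, and the random tensors standing in for imagery are all assumptions.

```python
import torch
import torch.nn as nn

class HousingQualityNet(nn.Module):
    """Tiny CNN mapping an RGB satellite tile to a scalar housing-quality
    score (illustrative stand-in for the paper's model)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(-1)

model = HousingQualityNet()

# Pre- and post-program tiles for the same households (random stand-ins).
pre, post = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)

with torch.no_grad():
    # An evaluation would compare predicted quality changes between
    # treated and control households; here we just take the difference.
    delta = model(post) - model(pre)
print(delta.shape)  # torch.Size([8])
```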

Machine Learning and Mobile Phone Data Can Improve the Targeting of Humanitarian Assistance


Paper by Emily Aiken et al: “The COVID-19 pandemic has devastated many low- and middle-income countries (LMICs), causing widespread food insecurity and a sharp decline in living standards. In response to this crisis, governments and humanitarian organizations worldwide have mobilized targeted social assistance programs. Targeting is a central challenge in the administration of these programs: given available data, how does one rapidly identify the individuals and families with the greatest need? This challenge is particularly acute in the large number of LMICs that lack recent and comprehensive data on household income and wealth.

Here we show that non-traditional “big” data from satellites and mobile phone networks can improve the targeting of anti-poverty programs. Our approach uses traditional survey-based measures of consumption and wealth to train machine learning algorithms that recognize patterns of poverty in non-traditional data; the trained algorithms are then used to prioritize aid to the poorest regions and mobile subscribers. We evaluate this approach by studying Novissi, Togo’s flagship emergency cash transfer program, which used these algorithms to determine eligibility for a rural assistance program that disbursed millions of dollars in COVID-19 relief aid. Our analysis compares outcomes – including exclusion errors, total social welfare, and measures of fairness – under different targeting regimes. Relative to the geographic targeting options considered by the Government of Togo at the time, the machine learning approach reduces errors of exclusion by 4-21%. Relative to methods that require a comprehensive social registry (a hypothetical exercise; no such registry exists in Togo), the machine learning approach increases exclusion errors by 9-35%. These results highlight the potential for new data sources to contribute to humanitarian response efforts, particularly in crisis settings when traditional data are missing or out of date….(More)”.
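
A hedged sketch of the targeting pipeline described in the excerpt: survey-measured consumption for a subsample trains a regressor on phone-derived features, predictions are made for all subscribers, and the predicted-poorest fraction is flagged as eligible. The synthetic features and model choice are illustrative, not the paper's specification.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Illustrative phone-derived features per subscriber (call volume,
# top-ups, mobility radius, ...); in the paper these come from call
# detail records.
n = 1_000
X = rng.normal(size=(n, 5))

# Ground-truth consumption is observed only for a survey subsample.
survey_idx = rng.choice(n, size=200, replace=False)
consumption = (X[survey_idx] @ np.array([0.8, -0.3, 0.5, 0.1, -0.2])
               + rng.normal(scale=0.5, size=200))

# Train on the surveyed subscribers, then predict for everyone.
model = GradientBoostingRegressor().fit(X[survey_idx], consumption)
predicted = model.predict(X)

# Target the predicted-poorest 30% of subscribers.
budget = int(0.3 * n)
eligible = np.argsort(predicted)[:budget]
print(f"{len(eligible)} subscribers flagged for transfers")
```

The exclusion errors the paper reports can then be measured by comparing this eligible set against the set that true consumption would select, which is exactly the comparison made across targeting regimes.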