Data Re-Use and Collaboration for Development


Stefaan G. Verhulst at Data & Policy: “It is often pointed out that we live in an era of unprecedented data, and that data holds great promise for development. Yet equally often overlooked is the fact that, as in so many domains, there exist tremendous inequalities and asymmetries in where this data is generated, and how it is accessed. The gap that separates high-income from low-income countries is among the most important (or at least most persistent) of these asymmetries…

Data collaboratives are an emerging form of public-private partnership that, when designed responsibly, can offer a potentially innovative solution to this problem. Data collaboratives offer at least three key benefits for developing countries:

1. Cost Efficiencies: Data and data-analytic capacity are often hugely expensive and beyond the means of many low-income countries. Data reuse, facilitated by data collaboratives, can bring down the cost of data initiatives for development projects.

2. Fresh insights for better policy: Combining data from various sources by breaking down silos has the potential to lead to new and innovative insights that can help policy makers make better decisions. Digital data can also be triangulated with existing, more traditional sources of information (e.g., census data) to generate new insights and help verify the accuracy of information.

3. Overcoming inequalities and asymmetries: Social and economic inequalities, both within and among countries, are often mapped onto data inequalities. Data collaboratives can help ease some of these inequalities and asymmetries, for example by allowing costs and analytical tools and techniques to be pooled. Cloud computing, which allows information and technical tools to be easily shared and accessed, is an important example. It can play a vital role in enabling the transfer of skills and technologies between low-income and high-income countries…(More)”. See also: Reusing data responsibly to achieve development goals (OECD Report).

Making data for good better


Article by Caroline Buckee, Satchit Balsari, and Andrew Schroeder: “…Despite the longstanding excitement about the potential for digital tools, Big Data and AI to transform our lives, these innovations–with some exceptions–have so far had little impact on the greatest public health emergency of our time.

Attempts to use digital data streams to rapidly produce public health insights that were not only relevant for local contexts in cities and countries around the world, but also available to decision makers who needed them, exposed enormous gaps across the translational pipeline. The insights from novel data streams that could help drive precise, impactful health programs and bring effective aid to communities found limited use among public health and emergency response systems. We share here our experience from the COVID-19 Mobility Data Network (CMDN), now Crisis Ready (crisisready.io), a global collaboration of researchers, mostly infectious disease epidemiologists and data scientists, who served as trusted intermediaries between technology companies willing to share vast amounts of digital data and policy makers struggling to incorporate insights from these novel data streams into their decision making. Through our experience with the Network, and using human mobility data as an illustrative example, we recognize three sets of barriers to the successful application of large digital datasets for public good.

First, in the absence of pre-established working relationships with technology companies and data brokers, the data remain primarily confined within private circuits of ownership and control. During the pandemic, data sharing agreements between large technology companies and researchers were hastily cobbled together, often without the right kind of domain expertise in the mix. Second, the lack of standardization, interoperability, and information on the uncertainty and biases associated with these data necessitated complex analytical processing by highly specialized domain experts. And finally, local public health departments, understandably unfamiliar with these novel data streams, had neither the bandwidth nor the expertise to sift noise from signal. Ultimately, most efforts did not yield consistently useful information for decision making, particularly in low resource settings, where capacity limitations in the public sector are most acute…(More)”.
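The standardization barrier the authors describe is easy to underestimate. As a minimal sketch (with entirely hypothetical vendor formats), consider what it takes just to normalize two providers' mobility exports into one schema before any epidemiological analysis can begin:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MobilityRecord:
    """A common target schema for aggregated origin-destination flows."""
    origin: str        # administrative region code
    destination: str
    trips: int         # aggregated trip count in the window
    start: datetime    # start of the aggregation window, UTC
    window_hours: int

def from_vendor_a(row: dict) -> MobilityRecord:
    # Hypothetical vendor A: daily flows, ISO dates, verbose field names.
    return MobilityRecord(
        origin=row["home_region"],
        destination=row["dest_region"],
        trips=int(row["flow"]),
        start=datetime.fromisoformat(row["date"]).replace(tzinfo=timezone.utc),
        window_hours=24,
    )

def from_vendor_b(row: dict) -> MobilityRecord:
    # Hypothetical vendor B: 8-hour windows, Unix epochs, terse field names.
    return MobilityRecord(
        origin=row["o"],
        destination=row["d"],
        trips=int(row["n_trips"]),
        start=datetime.fromtimestamp(row["t0"], tz=timezone.utc),
        window_hours=8,
    )
```

Neither mapping is hard on its own; the cost is that every new provider, and every undocumented change in a provider's format, requires this work again, usually by the same small pool of specialists.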

Trove of unique health data sets could help AI predict medical conditions earlier


Madhumita Murgia at the Financial Times: “…Ziad Obermeyer, a physician and machine learning scientist at the University of California, Berkeley, launched Nightingale Open Science last month — a treasure trove of unique medical data sets, each curated around an unsolved medical mystery that artificial intelligence could help to solve.

The data sets, released after the project received $2m of funding from former Google chief executive Eric Schmidt, could help to train computer algorithms to predict medical conditions earlier, triage better and save lives.

The data include 40 terabytes of medical imagery, such as X-rays, electrocardiogram waveforms and pathology specimens, from patients with a range of conditions, including high-risk breast cancer, sudden cardiac arrest, fractures and Covid-19. Each image is labelled with the patient’s medical outcomes, such as the stage of breast cancer and whether it resulted in death, or whether a Covid patient needed a ventilator.

Obermeyer has made the data sets free to use and mainly worked with hospitals in the US and Taiwan to build them over two years. He plans to expand this to Kenya and Lebanon in the coming months to reflect as much medical diversity as possible.

“Nothing exists like it,” said Obermeyer, who announced the new project in December alongside colleagues at NeurIPS, the global academic conference for artificial intelligence. “What sets this apart from anything available online is the data sets are labelled with the ‘ground truth’, which means with what really happened to a patient and not just a doctor’s opinion.”…
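For a rough sense of what “ground truth” labelling means in practice, a deliberately simplified and hypothetical record layout (the actual Nightingale schemas will differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImagingRecord:
    """One de-identified image paired with what actually happened to the patient."""
    image_path: str   # e.g., an X-ray, ECG waveform, or pathology slide file
    modality: str     # "xray", "ecg", "pathology", ...
    # Outcome labels record what happened, not a clinician's reading:
    outcome: str      # e.g., "stage II breast cancer", "required ventilator"
    died: bool
    days_to_outcome: Optional[int] = None  # follow-up time, if known
```

Training against outcomes rather than diagnoses is what lets a model learn to predict what will happen to a patient, instead of reproducing, and potentially amplifying, a doctor's judgment.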

The Nightingale data sets were among dozens proposed this year at NeurIPS.

Other projects included a speech data set of Mandarin and eight subdialects recorded by 27,000 speakers in 34 cities in China; the largest audio data set of Covid respiratory sounds, such as breathing, coughing and voice recordings, from more than 36,000 participants to help screen for the disease; and a data set of satellite images covering the entire country of South Africa from 2006 to 2017, divided and labelled by neighbourhood, to study the social effects of spatial apartheid.

Elaine Nsoesie, a computational epidemiologist at the Boston University School of Public Health, said new types of data could also help with studying the spread of diseases in diverse locations, as people from different cultures react differently to illnesses.

She said her grandmother in Cameroon, for example, might think differently than Americans do about health. “If someone had an influenza-like illness in Cameroon, they may be looking for traditional, herbal treatments or home remedies, compared to drugs or different home remedies in the US.”

Computer scientists Serena Yeung and Joaquin Vanschoren, who proposed that research to build new data sets should be exchanged at NeurIPS, pointed out that the vast majority of the AI community still cannot find good data sets to evaluate their algorithms. This meant that AI researchers were still turning to data that were potentially “plagued with bias”, they said. “There are no good models without good data.”…(More)”.

Cities and the Climate-Data Gap


Article by Robert Muggah and Carlo Ratti: “With cities facing disastrous climate stresses and shocks in the coming years, one would think they would be rushing to implement mitigation and adaptation strategies. Yet most urban residents are only dimly aware of the risks, because their cities’ mayors, managers, and councils are not collecting or analyzing the right kinds of information.

With more governments adopting strategies to reduce greenhouse-gas (GHG) emissions, cities everywhere need to get better at collecting and interpreting climate data. More than 11,000 cities have already signed up to a global covenant to tackle climate change and manage the transition to clean energy, and many aim to achieve net-zero emissions before their national counterparts do. Yet virtually all of them still lack the basic tools for measuring progress.

Closing this gap has become urgent, because climate change is already disrupting cities around the world. Cities on almost every continent are being ravaged by heat waves, fires, typhoons, and hurricanes. Coastal cities are being battered by severe flooding connected to sea-level rise. And some megacities and their sprawling peripheries are being reconsidered altogether, as in the case of Indonesia’s $34 billion plan to move its capital from Jakarta to Borneo by 2024.

Worse, while many subnational governments are setting ambitious new green targets, over 40% of cities (home to some 400 million people) still have no meaningful climate-preparedness strategy. And the share of cities with such strategies is even lower in Africa and Asia – where an estimated 90% of all urbanization over the next three decades is expected to occur.

We know that climate-preparedness plans are closely correlated with investment in climate action, including nature-based solutions and systemic resilience. But strategies alone are not enough. We also need to scale up data-driven monitoring platforms. Powered by satellites and sensors, these systems can track temperatures inside and outside buildings, alert city dwellers to air-quality issues, and provide high-resolution information on concentrations of specific GHGs (carbon dioxide and nitrogen dioxide) and particulate matter…(More)”.
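The alerting layer such platforms imply is conceptually simple; a minimal sketch follows, with threshold values that are placeholders rather than official guideline numbers:

```python
from statistics import mean

# Illustrative alert thresholds in µg/m³ (placeholders, not guideline values).
THRESHOLDS = {"pm2.5": 15.0, "no2": 25.0}

def daily_alerts(readings: dict[str, list[float]]) -> list[str]:
    """Flag pollutants whose 24-hour mean exceeds its threshold.

    `readings` maps a pollutant key to its hourly sensor values for one day.
    """
    alerts = []
    for pollutant, values in readings.items():
        limit = THRESHOLDS.get(pollutant)
        if limit is not None and values and mean(values) > limit:
            alerts.append(f"{pollutant}: 24h mean {mean(values):.1f} µg/m³ exceeds {limit}")
    return alerts

print(daily_alerts({"pm2.5": [12.0, 18.5, 21.0], "no2": [14.0, 16.0]}))
```

The hard part for most cities is not this logic but the upstream plumbing: dense, calibrated sensor networks and the staff to keep them reporting.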

‘In Situ’ Data Rights


Essay by Marshall W Van Alstyne, Georgios Petropoulos, Geoffrey Parker, and Bertin Martens: “…Data portability sounds good in theory—number portability improved telephony—but this theory has its flaws.

  • Context: The value of data depends on context. Removing data from that context removes value. A portability exercise by experts at ProgrammableWeb succeeded in downloading basic Facebook data but failed on a re-upload. Individual posts shed the prompts that preceded them and the replies that followed them. After all, that data concerns others.
  • Stagnation: Without a flow of updates, a captured stock depreciates. Data must be refreshed to stay current, and potential users must see those data updates to stay informed.
  • Impotence: Facts removed from their place of residence become less actionable. We cannot use them to make a purchase when they are removed from their markets, or to reach a friend when they are removed from their social networks. Data must be reconnected to be reanimated.
  • Market Failure: Innovation is slowed. Consider how markets for business analytics and B2B services develop. Lacking complete context, third parties can only offer incomplete benchmarking and analysis. Platforms that do offer market overview services can charge monopoly prices because they have context that partners and competitors do not.
  • Moral Hazard: Proposed laws seek to give merchants data portability rights, but these entail a problem that competition authorities have not anticipated. Regulators seek to help merchants “multihome,” that is, to affiliate with more than one platform. Merchants can take their earned ratings from one platform to another and foster competition. But when a merchant gains control over its ratings data, magically, low reviews can disappear! Consumers fraudulently edited their personal records under early U.K. open banking rules. With data editing capability, either side can increase fraud, surely not the goal of data portability.

Evidence suggests that following GDPR, E.U. ad effectiveness fell, E.U. Web revenues fell, investment in E.U. startups fell, and the stock and flow of apps available in the E.U. fell, while Google and Facebook, which already had user data, gained rather than lost market share as small firms faced new hurdles the incumbents managed to avoid. To date, the results are far from regulators’ intentions.

We propose a new in situ data right for individuals and firms, and a new theory of benefits. Rather than take data from the platform, or ex situ as portability implies, let us grant users the right to use their data in the location where it resides. Bring the algorithms to the data instead of bringing the data to the algorithms. Users determine when and under what conditions third parties access their in situ data in exchange for new kinds of benefits. Users can revoke access at any time and third parties must respect that. This patches and repairs the portability problems…(More).”
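What “bring the algorithms to the data” could look like mechanically, as a minimal and entirely hypothetical sketch (the essay proposes the right, not this interface): third parties submit computations that run where the data lives, only aggregate results cross the boundary, and the user's grant is checked on every call so revocation takes effect immediately.

```python
from typing import Callable

class InSituStore:
    """Hypothetical platform-side store: data stays put, queries come to it."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict]] = {}       # user_id -> records
        self._grants: dict[tuple[str, str], bool] = {}  # (user, third party)

    def add_record(self, user: str, record: dict) -> None:
        self._records.setdefault(user, []).append(record)

    def grant(self, user: str, third_party: str) -> None:
        self._grants[(user, third_party)] = True

    def revoke(self, user: str, third_party: str) -> None:
        self._grants[(user, third_party)] = False

    def run_query(self, user: str, third_party: str,
                  query: Callable[[list[dict]], float]) -> float:
        # Consent is re-checked at query time, so revocation bites
        # immediately rather than at the next data export.
        if not self._grants.get((user, third_party)):
            raise PermissionError("no active in situ grant")
        # Only the aggregate result leaves; raw records never do.
        return query(self._records.get(user, []))

store = InSituStore()
for rating in (5, 2, 4):
    store.add_record("alice", {"rating": rating})
store.grant("alice", "benchmarking-service")
avg = store.run_query("alice", "benchmarking-service",
                      lambda rows: sum(r["rating"] for r in rows) / len(rows))
```

Because the platform executes the query, it can also refuse computations that would let a merchant rewrite its own ratings, which speaks to the moral-hazard problem above.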

Biases in human mobility data impact epidemic modeling


Paper by Frank Schlosser, Vedran Sekara, Dirk Brockmann, and Manuel Garcia-Herranz: “Large-scale human mobility data is a key resource in data-driven policy making and across many scientific fields. Most recently, mobility data was extensively used during the COVID-19 pandemic to study the effects of governmental policies and to inform epidemic models. Large-scale mobility is often measured using digital tools such as mobile phones. However, it remains an open question how truthfully these digital proxies represent the actual travel behavior of the general population. Here, we examine mobility datasets from multiple countries and identify two fundamentally different types of bias caused by unequal access to, and unequal usage of mobile phones. We introduce the concept of data generation bias, a previously overlooked type of bias, which is present when the amount of data that an individual produces influences their representation in the dataset. We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented, with the richest 20% contributing over 50% of all recorded trips, substantially skewing the datasets. This inequality is consequential, as we find mobility patterns of different wealth groups to be structurally different, where the mobility networks of high-wealth users are denser and contain more long-range connections. To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness. Using our approach we show how biases can severely impact outcomes of dynamic processes such as epidemic simulations, where biased data incorrectly estimates the severity and speed of disease transmission. Overall, we show that a failure to account for biases can have detrimental effects on the results of studies and urge researchers and practitioners to account for data-fairness in all future studies of human mobility…(More)”.
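The paper's full debiasing framework is not reproduced here, but post-stratification reweighting gives the flavor of the “simple techniques” involved: trips are reweighted so that each wealth group counts in proportion to its population share rather than its data share. The trip counts below are invented for illustration.

```python
# Post-stratification sketch: reweight trips so each wealth quintile counts
# in proportion to its population share (20% each), not its data share.
trips_by_quintile = {"q1": 5_000, "q2": 8_000, "q3": 12_000, "q4": 20_000, "q5": 55_000}
total_trips = sum(trips_by_quintile.values())
population_share = 0.20  # quintiles are equal population groups by construction

weights = {
    q: population_share / (n / total_trips)  # target share / observed share
    for q, n in trips_by_quintile.items()
}
# The richest quintile (q5) contributes 55% of trips, so each of its trips is
# down-weighted (~0.36), while q1 trips are up-weighted (4.0). Any statistic
# computed over trips then uses these weights instead of raw counts.
```

Reweighting fixes representation, not structure: if high-wealth mobility networks are denser and longer-range, as the paper finds, some analyses additionally need group-specific models rather than weights alone.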

Expanding Mobility: The Power of Linked Administrative Data and Integrated Data Systems


Brief by Della Jenkins and Emily Berkowitz: “This brief describes how linking administrative data can expand traditional measures of mobility for research and action, provides examples of the types of economic mobility research questions that are only answerable using linked administrative data, and describes how analysis can be deepened using spatial and multi-generational perspectives. In addition, we discuss how the field of economic mobility research benefits when state and local governments are resourced to build systems that enable routine reuse of linked data. Finally, we end with a summary of the opportunities that exist to build on data capacity already developed by state and local governments across the US to better understand the policies that support pathways out of poverty. Now more than ever, governments, research partners, and stakeholders can come together to make use of the data already collected by social service programs to generate evidence-based approaches to expanding mobility…(More)”

The argument against property rights in data


Report by Open Future: “25 years after the adoption of the Database Directive, there is mounting evidence that the introduction of the sui generis right did not lead to increased data access and use–instead, an additional intellectual property layer became one more obstacle.

Today, the European Commission, as it drafts the new Data Act, faces a fundamental choice both regarding the existing sui generis database rights and the introduction of a similar right to raw, machine-generated data. There is a risk that an approach that treats data as property will be further strengthened through a new data producer’s right. The idea of such a new exclusive right was introduced by the European Commission in 2017. This proposed right was to be based on the same template as the sui generis database right. 

A new property right will not secure the goals defined in the European data strategy: those of ensuring access and use of data, in a data economy built around common data spaces. Instead, it will strengthen existing monopolies in the data economy. 

Instead of introducing new property rights, greater access to and use of data should be achieved by introducing–in the Data Act, and in other currently debated legal acts–access rights that treat data as a commons. 

In this policy brief, we present the current policy debate on access and use of data, as well as the history of proposals for property rights in data – including the sui generis database right. We present arguments against the introduction of new property rights, and in favor of strengthening data access rights….(More)”.

Using social media data to ‘nowcast’ migration around the globe


Report by RAND: “In recent years, unprecedented waves of refugees, economic migrants and people displaced by a variety of factors have made migration a high-priority policy issue around the world. Despite this, official migration statistics often come with a time lag and can fail to correctly capture the full extent of migration, leaving decision makers without timely and robust data to make informed policy decisions.

In a RAND-initiated, self-funded research study, we developed a methodological tool to compute near real-time migration estimates for European Union member states and the United States. The tool, underpinned by a Bayesian model, is capable of providing ‘nowcasts’ of migrant stocks by combining real-time data from the Facebook Marketing Application Programming Interface and data from official migration sources, such as Eurostat and the US Census Bureau.
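RAND's actual model is richer than this, but the core mechanic of combining a lagged official estimate with a noisy real-time proxy can be sketched as a precision-weighted (conjugate normal) Bayesian update. All numbers, and the crude penetration correction, are illustrative assumptions.

```python
def nowcast(prior_mean: float, prior_sd: float,
            signal: float, signal_sd: float) -> tuple[float, float]:
    """Combine a lagged official estimate (the prior) with a real-time proxy
    (the likelihood) via a conjugate normal update; return posterior mean/sd."""
    w_prior, w_signal = 1 / prior_sd**2, 1 / signal_sd**2
    post_var = 1 / (w_prior + w_signal)
    post_mean = post_var * (w_prior * prior_mean + w_signal * signal)
    return post_mean, post_var ** 0.5

# Illustrative inputs: the latest official (lagged) stock estimate says
# 120,000 migrants (sd 10,000); a Facebook-audience count of 90,000, rescaled
# by an assumed 60% platform uptake, suggests 150,000 but is noisier.
signal = 90_000 / 0.6
mean, sd = nowcast(120_000, 10_000, signal, 25_000)
print(f"nowcast: {mean:,.0f} ± {sd:,.0f} migrants")  # ≈ 124,000 ± 9,300
```

The posterior leans toward whichever source is more precise, which is what lets the official series anchor the estimate while the real-time signal moves it between releases.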

These nowcasts can serve as an early-warning system to anticipate ‘shock events’ and rapid migration trends that would otherwise be captured too late or not at all by official migration data sources. The tool could therefore enable decision makers to make informed, evidence-based policy decisions in the rapidly changing social policy sphere of international migration.

The study also provides a useful example of how to combine ‘big data’ with traditional data to improve measurement and estimation which can be applied to other social and demographic phenomena…(More)”.

Strengthening CRVS Systems to Improve Migration Policy: A Promising Innovation


Blog by Tawheeda Wahabzada and Deirdre Appel: “Migration is one of the most pressing issues of our time, and innovation for migration policy can take on several different shapes to help solve challenges. It is seen in radical technological breakthroughs, such as biometric identifiers, that completely transform the status quo, as well as in technological disruptions, like mobile phone fund transfers, that alter an existing process. There is also incremental innovation, the gradual improvement of an existing process or institution. Regardless of where they fall on the spectrum, their innovative applications are all relevant to migration policy.

Incremental innovation for civil registration and vital statistics (CRVS) systems can greatly benefit migrants and the policymakers trying to help them. According to the World Health Organization, a well-functioning CRVS system registers all births and deaths, issues birth and death certificates, and compiles and disseminates vital statistics, including cause of death information. It may also record marriages and divorces. Each of these services brings a world of crucial advantages. But despite the social and legal benefits for individuals, especially migrants, these systems remain underfunded and underperforming. More than 100 low- and middle-income countries lack functional CRVS systems, and about one-third of all births are not registered. This amounts to more than one billion people without a legal identity, leaving them unable to prove who they are and creating serious barriers to accessing health, education, financial, and other social services.
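The coverage figures that follow rest on a simple ratio worth making explicit, since it is the basic yardstick for CRVS performance (the numbers below are illustrative):

```python
def registration_coverage(registered: int, estimated_events: int) -> float:
    """Completeness of civil registration: the share of estimated vital events
    (e.g., births projected from census or survey data) actually registered."""
    if estimated_events <= 0:
        raise ValueError("estimated_events must be positive")
    return registered / estimated_events

# Illustrative: if roughly one-third of births go unregistered, coverage is
# about 67%; a country registering 450,000 of 1,000,000 births sits at 45%.
print(f"{registration_coverage(450_000, 1_000_000):.0%}")  # -> 45%
```

Note that the denominator is itself an estimate, which is why coverage figures for low-functioning systems carry wide uncertainty.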

Throughout countries in Africa, there are great differences in CRVS coverage: birth registration ranges from above 90 percent in some North African countries to under 50 percent in several countries in other regions, and death registration shows even greater gaps, with either no information or lower coverage rates. In countries with low-functioning CRVS systems, potential migrants could face additional obstacles in obtaining birth certificates and proof of identification…(More)”. See also https://data4migration.org/blog/