Trove of unique health data sets could help AI predict medical conditions earlier


Madhumita Murgia at the Financial Times: “…Ziad Obermeyer, a physician and machine learning scientist at the University of California, Berkeley, launched Nightingale Open Science last month — a treasure trove of unique medical data sets, each curated around an unsolved medical mystery that artificial intelligence could help to solve.

The data sets, released after the project received $2m of funding from former Google chief executive Eric Schmidt, could help to train computer algorithms to predict medical conditions earlier, triage better and save lives.

The data include 40 terabytes of medical imagery, such as X-rays, electrocardiogram waveforms and pathology specimens, from patients with a range of conditions, including high-risk breast cancer, sudden cardiac arrest, fractures and Covid-19. Each image is labelled with the patient’s medical outcomes, such as the stage of breast cancer and whether it resulted in death, or whether a Covid patient needed a ventilator.

Obermeyer has made the data sets free to use and mainly worked with hospitals in the US and Taiwan to build them over two years. He plans to expand this to Kenya and Lebanon in the coming months to reflect as much medical diversity as possible.

“Nothing exists like it,” said Obermeyer, who announced the new project in December alongside colleagues at NeurIPS, the global academic conference for artificial intelligence. “What sets this apart from anything available online is the data sets are labelled with the ‘ground truth’, which means with what really happened to a patient and not just a doctor’s opinion.”…

The Nightingale data sets were among dozens proposed this year at NeurIPS.

Other projects included a speech data set of Mandarin and eight subdialects recorded by 27,000 speakers in 34 cities in China; the largest audio data set of Covid respiratory sounds, such as breathing, coughing and voice recordings, from more than 36,000 participants to help screen for the disease; and a data set of satellite images covering the entire country of South Africa from 2006 to 2017, divided and labelled by neighbourhood, to study the social effects of spatial apartheid.

Elaine Nsoesie, a computational epidemiologist at the Boston University School of Public Health, said new types of data could also help with studying the spread of diseases in diverse locations, as people from different cultures react differently to illnesses.

She said her grandmother in Cameroon, for example, might think differently than Americans do about health. “If someone had an influenza-like illness in Cameroon, they may be looking for traditional, herbal treatments or home remedies, compared to drugs or different home remedies in the US.”

Computer scientists Serena Yeung and Joaquin Vanschoren, who proposed that research to build new data sets should be exchanged at NeurIPS, pointed out that the vast majority of the AI community still cannot find good data sets to evaluate their algorithms. This meant that AI researchers were still turning to data that were potentially “plagued with bias”, they said. “There are no good models without good data.”…(More)”.

Cities and the Climate-Data Gap


Article by Robert Muggah and Carlo Ratti: “With cities facing disastrous climate stresses and shocks in the coming years, one would think they would be rushing to implement mitigation and adaptation strategies. Yet most urban residents are only dimly aware of the risks, because their cities’ mayors, managers, and councils are not collecting or analyzing the right kinds of information.

With more governments adopting strategies to reduce greenhouse-gas (GHG) emissions, cities everywhere need to get better at collecting and interpreting climate data. More than 11,000 cities have already signed up to a global covenant to tackle climate change and manage the transition to clean energy, and many aim to achieve net-zero emissions before their national counterparts do. Yet virtually all of them still lack the basic tools for measuring progress.

Closing this gap has become urgent, because climate change is already disrupting cities around the world. Cities on almost every continent are being ravaged by heat waves, fires, typhoons, and hurricanes. Coastal cities are being battered by severe flooding connected to sea-level rise. And some megacities and their sprawling peripheries are being reconsidered altogether, as in the case of Indonesia’s $34 billion plan to move its capital from Jakarta to Borneo by 2024.

Worse, while many subnational governments are setting ambitious new green targets, over 40% of cities (home to some 400 million people) still have no meaningful climate-preparedness strategy. And the share of cities with such a strategy is even lower in Africa and Asia – where an estimated 90% of all future urbanization in the next three decades is expected to occur.

We know that climate-preparedness plans are closely correlated with investment in climate action including nature-based solutions and systematic resilience. But strategies alone are not enough. We also need to scale up data-driven monitoring platforms. Powered by satellites and sensors, these systems can track temperatures inside and outside buildings, alert city dwellers to air-quality issues, and provide high-resolution information on concentrations of specific GHGs (carbon dioxide and nitrogen dioxide) and particulate matter…(More)”.

‘In Situ’ Data Rights


Essay by Marshall W Van Alstyne, Georgios Petropoulos, Geoffrey Parker, and Bertin Martens: “…Data portability sounds good in theory—number portability improved telephony—but this theory has its flaws.

  • Context: The value of data depends on context. Removing data from that context removes value. A portability exercise by experts at the ProgrammableWeb succeeded in downloading basic Facebook data but failed on a re-upload. Individual posts shed the prompts that preceded them and the replies that followed them. After all, that data concerns others.
  • Stagnation: Without a flow of updates, a captured stock depreciates. Data must be refreshed to stay current, and potential users must see those data updates to stay informed.
  • Impotence: Facts removed from their place of residence become less actionable. We cannot use them to make a purchase when removed from their markets or reach a friend when they are removed from their social networks. Data must be reconnected to be reanimated.
  • Market Failure: Innovation is slowed. Consider how markets for business analytics and B2B services develop. Lacking complete context, third parties can only offer incomplete benchmarking and analysis. Platforms that do offer market overview services can charge monopoly prices because they have context that partners and competitors do not.
  • Moral Hazard: Proposed laws seek to give merchants data portability rights but these entail a problem that competition authorities have not anticipated. Regulators seek to help merchants “multihome,” to affiliate with more than one platform. Merchants can take their earned ratings from one platform to another and foster competition. But, when a merchant gains control over its ratings data, magically, low reviews can disappear! Consumers fraudulently edited their personal records under early U.K. open banking rules. With data editing capability, either side can increase fraud, surely not the goal of data portability.

Evidence suggests that following GDPR, E.U. ad effectiveness fell, E.U. Web revenues fell, investment in E.U. startups fell, the stock and flow of apps available in the E.U. fell, while Google and Facebook, who already had user data, gained rather than lost market share as small firms faced new hurdles the incumbents managed to avoid. To date, the results are far from regulators’ intentions.

We propose a new in situ data right for individuals and firms, and a new theory of benefits. Rather than take data from the platform, or ex situ as portability implies, let us grant users the right to use their data in the location where it resides. Bring the algorithms to the data instead of bringing the data to the algorithms. Users determine when and under what conditions third parties access their in situ data in exchange for new kinds of benefits. Users can revoke access at any time and third parties must respect that. This patches and repairs the portability problems…(More).”

Biases in human mobility data impact epidemic modeling


Paper by Frank Schlosser, Vedran Sekara, Dirk Brockmann, and Manuel Garcia-Herranz: “Large-scale human mobility data is a key resource in data-driven policy making and across many scientific fields. Most recently, mobility data was extensively used during the COVID-19 pandemic to study the effects of governmental policies and to inform epidemic models. Large-scale mobility is often measured using digital tools such as mobile phones. However, it remains an open question how truthfully these digital proxies represent the actual travel behavior of the general population. Here, we examine mobility datasets from multiple countries and identify two fundamentally different types of bias caused by unequal access to, and unequal usage of, mobile phones. We introduce the concept of data generation bias, a previously overlooked type of bias, which is present when the amount of data that an individual produces influences their representation in the dataset. We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented, with the richest 20% contributing over 50% of all recorded trips, substantially skewing the datasets. This inequality is consequential, as we find mobility patterns of different wealth groups to be structurally different, where the mobility networks of high-wealth users are denser and contain more long-range connections. To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness. Using our approach we show how biases can severely impact outcomes of dynamic processes such as epidemic simulations, where biased data incorrectly estimates the severity and speed of disease transmission. Overall, we show that a failure to account for biases can have detrimental effects on the results of studies and urge researchers and practitioners to account for data-fairness in all future studies of human mobility…(More)”.
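
To make the idea of data generation bias concrete, here is a minimal Python sketch. It is not the authors' framework, which the excerpt does not describe in detail. It assumes a toy list of trips and a hypothetical mapping from users to wealth groups, and it compares each group's raw share of trips with its share after every user's trips are down-weighted so that each user counts once.

```python
from collections import Counter, defaultdict


def trip_share_by_group(trips, group_of):
    """Return (raw_share, reweighted_share) of trips per group.

    trips: list of (user_id, origin, destination) tuples (toy data).
    group_of: dict mapping user_id to a group label (hypothetical).
    """
    trips_per_user = Counter(user for user, _, _ in trips)
    raw = defaultdict(float)
    reweighted = defaultdict(float)
    for user, _, _ in trips:
        group = group_of[user]
        raw[group] += 1.0
        # Each trip is weighted by 1 / (number of trips by that user),
        # so every user contributes a total weight of one.
        reweighted[group] += 1.0 / trips_per_user[user]
    n_trips = len(trips)
    n_users = len(trips_per_user)
    raw_share = {g: v / n_trips for g, v in raw.items()}
    reweighted_share = {g: v / n_users for g, v in reweighted.items()}
    return raw_share, reweighted_share


# Toy example: one heavy user in the "high" group, three light users in "low".
trips = (
    [("a", "x", "y")] * 6
    + [("b", "x", "z")] * 2
    + [("c", "y", "z")]
    + [("d", "z", "x")]
)
group_of = {"a": "high", "b": "low", "c": "low", "d": "low"}

raw_share, reweighted_share = trip_share_by_group(trips, group_of)
print({g: round(s, 2) for g, s in raw_share.items()})
print({g: round(s, 2) for g, s in reweighted_share.items()})
```

In this toy data the single "high" user produces most of the trips, so the raw share overstates that group relative to its share of users. Per-user reweighting is only one simple choice, and it does not address the separate bias from unequal access to phones.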

Expanding Mobility: The Power of Linked Administrative Data and Integrated Data Systems


Brief by Della Jenkins and Emily Berkowitz: “This brief describes how linking administrative data can expand traditional measures of mobility for research and action, provides examples of the types of economic mobility research questions that are only answerable using linked administrative data, and describes how analysis can be deepened using spatial and multi-generational perspectives. In addition, we discuss how the field of economic mobility research benefits when state and local governments are resourced to build systems that enable routine reuse of linked data. Finally, we end with a summary of the opportunities that exist to build on data capacity already developed by state and local governments across the US to better understand the policies that support pathways out of poverty. Now more than ever, governments, research partners, and stakeholders can come together to make use of the data already collected by social service programs to generate evidence-based approaches to expanding mobility…(More)”

The argument against property rights in data


Report by Open Future: “25 years after the adoption of the Database Directive, there is mounting evidence that the introduction of the sui generis right did not lead to increased data access and use–instead, an additional intellectual property layer became one more obstacle.

Today, the European Commission, as it drafts the new Data Act, faces a fundamental choice both regarding the existing sui generis database rights and the introduction of a similar right to raw, machine-generated data. There is a risk that an approach that treats data as property will be further strengthened through a new data producer’s right. The idea of such a new exclusive right was introduced by the European Commission in 2017. This proposed right was to be based on the same template as the sui generis database right. 

A new property right will not secure the goals defined in the European data strategy: those of ensuring access and use of data, in a data economy built around common data spaces. Instead, it will strengthen existing monopolies in the data economy.

Instead of introducing new property rights, greater access to and use of data should be achieved by introducing–in the Data Act, and in other currently debated legal acts–access rights that treat data as a commons. 

In this policy brief, we present the current policy debate on access and use of data, as well as the history of proposals for property rights in data – including the sui generis database right. We present arguments against the introduction of new property rights, and in favor of strengthening data access rights….(More)”.

Using social media data to ‘nowcast’ migration around the globe


Report by RAND: “In recent years, unprecedented waves of refugees, economic migrants and people displaced by a variety of factors have made migration a high-priority policy issue around the world. Despite this, official migration statistics often come with a time lag and can fail to correctly capture the full extent of migration, leaving decision makers without timely and robust data to make informed policy decisions.

In a RAND-initiated, self-funded research study, we developed a methodological tool to compute near real-time migration estimates for European Union member states and the United States. The tool, underpinned by a Bayesian model, is capable of providing ‘nowcasts’ of migrant stocks by combining real-time data from the Facebook Marketing Application Programming Interface and data from official migration sources, such as Eurostat and the US Census Bureau.

These nowcasts can serve as an early-warning system to anticipate ‘shock events’ and rapid migration trends that would otherwise be captured too late or not at all by official migration data sources. The tool could therefore enable decision makers to make informed, evidence-based policy decisions in the rapidly changing social policy sphere of international migration.

The study also provides a useful example of how to combine ‘big data’ with traditional data to improve measurement and estimation which can be applied to other social and demographic phenomena…(More)”.
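
As a rough illustration of how a Bayesian approach can blend a lagged official figure with a timelier but noisier signal, the Python sketch below uses a textbook normal-normal update. This is not RAND's model, which the excerpt does not specify. The function name, the numbers, and the assumption that both sources can be summarised by a mean and a standard deviation are all hypothetical.

```python
def combine_estimates(prior_mean, prior_sd, obs, obs_sd):
    """Normal-normal conjugate update.

    prior_mean, prior_sd: belief from an official statistic (assumed).
    obs, obs_sd: estimate derived from a real-time source (assumed).
    Returns the posterior mean and standard deviation.
    """
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / obs_sd ** 2
    post_variance = 1.0 / (prior_precision + obs_precision)
    # The posterior mean is a precision-weighted average of the two sources.
    post_mean = post_variance * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_variance ** 0.5


# Hypothetical migrant-stock figures, for illustration only.
post_mean, post_sd = combine_estimates(
    prior_mean=100_000.0, prior_sd=10_000.0,  # lagged official estimate
    obs=120_000.0, obs_sd=10_000.0,           # real-time, social-media-based estimate
)
print(round(post_mean), round(post_sd))
```

When both sources are equally uncertain, the combined estimate sits halfway between them and is more precise than either. A noisier real-time signal would pull the result less far from the official figure.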

Strengthening CRVS Systems to Improve Migration Policy: A Promising Innovation


Blog by Tawheeda Wahabzada and Deirdre Appel: “Migration is one of the most pressing issues of our time, and innovation for migration policy can take several different shapes to help solve challenges. It is seen in radical technological breakthroughs, such as biometric identifiers, that completely transform the status quo, as well as in technological disruptions, like mobile phone fund transfers, that alter an existing process. There is also incremental innovation, or the gradual improvement of an existing process or even an institution. Regardless of where they fall on the spectrum, these innovative applications are all relevant to migration policy.

Incremental innovation for civil registration and vital statistics (CRVS) systems can greatly benefit migrants and the policymakers trying to help them. According to the World Health Organization, a well-functioning CRVS system registers all births and deaths, issues birth and death certificates, and compiles and disseminates vital statistics, including cause-of-death information. It may also record marriages and divorces. Each of these services brings a world of crucial advantages. But despite the social and legal benefits for individuals, especially migrants, these systems remain underfunded and under-functioning. More than 100 low- and middle-income countries lack functional CRVS systems, and about one-third of all births are not registered. This amounts to more than one billion people without a legal identity, leaving them unable to prove who they are and creating serious barriers to accessing health, education, financial, and other social services.

CRVS coverage varies greatly across countries in Africa: birth registration coverage ranges from above 90 percent in some North African countries to under 50 percent across several countries in different regions, and death registration has even greater gaps, with either no information or lower coverage rates. Potential migrants from countries with poorly functioning CRVS systems could face additional obstacles in obtaining birth certificates and proof of identification….(More)”. See also https://data4migration.org/blog/

Arts Data in the Public Sector: Strategies for local arts agencies


Report by Bloomberg Associates: “Cities are increasingly using data to help shape policy and identify service gaps, but data about arts and culture is often met with skepticism. Local arts agencies, the city and county entities at the forefront of understanding and serving their local creative communities, often face difficulties in identifying meaningful metrics that capture quality as well as quantity in this unique field. With the Covid-19 pandemic and intensifying demand for equity, the desire for reliable, longitudinal information will only increase in the coming years as municipalities with severely limited resources face critical decisions in their effort toward recovery.

So how can arts-minded cities leverage data to better serve grantees, promote equity in service delivery, and demonstrate the impact of arts and culture across a range of significant policy priorities, among other ambitions?

Produced by our Cultural Assets Management team, Arts Data in the Public Sector highlights the data practices of fifteen local arts agencies across the U.S. to capture a meaningful cross-section of constituencies, resources, and strategies. Through best practices and case studies, the Guide offers useful insights and practical resources that can assist and inspire local government arts funders and advocates as they work to establish more equitable and inclusive practices and to affirm the importance of arts and culture as a public service well into the future…(More)”.

Leveraging Location and Mobility Data: Perils & Practices


Paper by Suha Mohamed: “…Mobility data refers to information (often passively captured) that provides insights into the location and movement of a population – often through their interactions with digital mobility devices (like our smartphones) or transport services. Sources of mobility data, while diverse, include call detail records from telecom companies, GPS details from phones or vehicles, geotagged social media data, and first- or third-party software data.

Geolocation, a subset of mobility data, may be useful in shaping responsive courses of action as it can be leveraged in granular form to understand hyperlocal realities or, when aggregated, regional, national or international patterns. However, privacy concerns arise from the sensitive or personal data that may be inferred from these records and the often opaque conditions around their usage. The ongoing deployment of contact tracing applications, which largely depend on individual-level location data, has demonstrated extensive potential for misuse and surveillance….

Despite the surveillance and privacy concerns around the use of contact tracing apps and mobility data, it is undeniable that this data has immense public value and has helped officials understand the development of the COVID-19 virus and map its variants and waves. It has also been used to track areas of mobility that contribute towards increased transmission of the virus, adherence to social distancing norms, and the effectiveness of measures like lockdowns or restrictions….(More)”.