Podcast by Kenneth Cukier: “Access to the right data can be as valuable in humanitarian crises as water or medical care, but it can also be dangerous. Misused or in the wrong hands, the same information can put already vulnerable people at further risk. Kenneth Cukier hosts this special edition of Babbage examining how humanitarian organisations use data and what they can learn from the profit-making tech industry. This episode was recorded live from Wilton Park, in collaboration with the United Nations OCHA Centre for Humanitarian Data…(More)”.
Data Collaboration for the Common Good: Enabling Trust and Innovation Through Public-Private Partnerships
World Economic Forum Report: “As the digital technologies of the Fourth Industrial Revolution continue to drive change throughout all sectors of the global economy, a unique moment exists to create a more inclusive, innovative and resilient society. Central to this change is the use of data. It is abundantly available but if improperly used will be the source of dangerous and unwelcome results.
When data is shared, linked and combined across sectoral and institutional boundaries, a multiplier effect occurs. Connecting one bit with another unlocks new insights and understandings that often weren’t anticipated. Yet, due to commercial limits and liabilities, the full value of data is often unrealized. This is particularly true when it comes to using data for the common good. While public-private data collaborations represent an unprecedented opportunity to address some of the world’s most urgent and complex challenges, they have generally been small and limited in impact. An entangled set of legal, technical, social, ethical and commercial risks have created an environment where the incentives for innovation have stalled. Additionally, the widening lack of trust among individuals and institutions creates even more uncertainty. After nearly a decade of anticipation on the promise of public-private data collaboration – with relatively few examples of success at global scale – a pivotal moment has arrived to encourage progress and move forward….(More)”
(See also http://datacollaboratives.org/).
Data Trusts, Health Data, and the Professionalization of Data Management
Paper by Keith Porcaro: “This paper explores how trusts can provide a legal model for professionalizing health data management. Data is potential. Over time, data collected for one purpose can support others. Clinical records at a hospital, created to manage a patient’s care, can be internally analyzed to identify opportunities for process and safety improvements at a hospital, or externally analyzed with other records to identify optimal treatment patterns. Data also carries the potential for harm. Personal data can be leaked or exposed. Proprietary models can be used to discriminate against patients, or price them out of care.
As novel uses of data proliferate, an individual data holder may be ill-equipped to manage complex new data relationships in a way that maximizes value and minimizes harm. A single organization may be limited by management capacity or risk tolerance. Organizations across sectors have digitized unevenly or late, and may not have mature data controls and policies. Collaborations that involve multiple organizations may face coordination problems, or disputes over ownership.
Data management is still a relatively young field. Most models of external data-sharing are based on literally transferring data—copying data between organizations, or pooling large datasets together under the control of a third party—rather than facilitating external queries of a closely held dataset.
Few models to date have focused on the professional management of data on behalf of a data holder, where the data holder retains control over not only their data, but the inferences derived from their data. Trusts can help facilitate the professionalization of data management. Inspired by the popularity of trusts for managing financial investments, this paper argues that data trusts are well-suited as a vehicle for open-ended professional management of data, where a manager’s discretion is constrained by fiduciary duties and a trust document that defines the data holder’s goals…(More)”.
We’ll soon know the exact air pollution from every power plant in the world. That’s huge.
David Roberts at Vox: “A nonprofit artificial intelligence firm called WattTime is going to use satellite imagery to precisely track the air pollution (including carbon emissions) coming out of every single power plant in the world, in real time. And it’s going to make the data public.
This is a very big deal. Poor monitoring and gaming of emissions data have made it difficult to enforce pollution restrictions on power plants. This system promises to effectively eliminate poor monitoring and gaming of emissions data….
The plan is to use data from satellites that make theirs publicly available (like the European Union’s Copernicus network and the US Landsat network), as well as data from a few private companies that charge for their data (like Digital Globe). The data will come from a variety of sensors operating at different wavelengths, including thermal infrared that can detect heat.
The images will be processed by various algorithms to detect signs of emissions. It has already been demonstrated that a great deal of pollution can be tracked simply through identifying visible smoke. WattTime says it can also use infrared imaging to identify heat from smokestack plumes or cooling-water discharge. Sensors that can directly track NO2 emissions are in development, according to WattTime executive director Gavin McCormick.
Between visible smoke, heat, and NO2, WattTime will be able to derive exact, real-time emissions information, including information on carbon emissions, for every power plant in the world. (McCormick says the data may also be used to derive information about water pollutants like nitrates or mercury.)
Google.org, Google’s philanthropic wing, is getting the project off the ground (pardon the pun) with a $1.7 million grant; it was selected through the Google AI Impact Challenge….(More)”.
The EU Wants to Build One of the World’s Largest Biometric Databases. What Could Possibly Go Wrong?
Grace Dobush at Fortune: “China and India have built the world’s largest biometric databases, but the European Union is about to join the club.
The Common Identity Repository (CIR) will consolidate biometric data on almost all visitors and migrants to the bloc, as well as some EU citizens—connecting existing criminal, asylum, and migration databases and integrating new ones. It has the potential to affect hundreds of millions of people.
The plan for the database, first proposed in 2016 and approved by the EU Parliament on April 16, was sold as a way to better track and monitor terrorists, criminals, and unauthorized immigrants.
The system will target the fingerprints and identity data for visitors and immigrants initially, and represents the first step towards building a truly EU-wide citizen database. At the same time, though, critics argue its mere existence will increase the potential for hacks, leaks, and law enforcement abuse of the information….
The European Parliament and the European Council have promised to address those concerns, through “proper safeguards” to protect personal privacy and to regulate officers’ access to data. In 2016, they passed a law regarding law enforcement’s access to personal data, alongside the General Data Protection Regulation (GDPR).
But total security is a tall order. Germany is currently dealing with multiple instances of police officers allegedly leaking personal information to far-right groups. Meanwhile, a Swedish hacker went to prison for hacking into Denmark’s public records system in 2012 and dumping online the personal data of hundreds of thousands of citizens and migrants….(More)”.
Facebook will open its data up to academics to see how it impacts elections
MIT Technology Review: “More than 60 researchers from 30 institutions will get access to Facebook user data to study its impact on elections and democracy, and how it’s used by advertisers and publishers.
A vast trove: Facebook will let academics see which websites its users linked to from January 2017 to February 2019. Notably, that means they won’t be able to look at the platform’s impact on the US presidential election in 2016, or on the Brexit referendum in the UK in the same year.
Despite this slightly glaring omission, it’s still hard to wrap your head around the scale of the data that will be shared, given that Facebook is used by 1.6 billion people every day. That’s more people than live in all of China, the most populous country on Earth. It will be one of the largest data sets on human behavior online to ever be released.
The process: Facebook didn’t pick the researchers. They were chosen by the Social Science Research Council, a US nonprofit. Facebook has been working on this project for over a year, as it tries to balance research interests against user privacy and confidentiality.
Privacy: In a blog post, Facebook said it will use a number of statistical techniques to make sure the data set can’t be used to identify individuals. Researchers will be able to access it only via a secure portal that uses a VPN and two-factor authentication, and there will be limits on the number of queries they can each run….(More)”.
Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity
Paper by Edward L. Glaeser, Hyunjin Kim and Michael Luca: “Can new data sources from online platforms help to measure local economic activity? Government datasets from agencies such as the U.S. Census Bureau provide the standard measures of economic activity at the local level. However, these statistics typically appear only after multi-year lags, and the public-facing versions are aggregated to the county or ZIP code level. In contrast, crowdsourced data from online platforms such as Yelp are often contemporaneous and geographically finer than official government statistics. Glaeser, Kim, and Luca present evidence that Yelp data can complement government surveys by measuring economic activity in close to real time, at a granular level, and at almost any geographic scale. Changes in the number of businesses and restaurants reviewed on Yelp can predict changes in the number of overall establishments and restaurants in County Business Patterns. An algorithm using contemporaneous and lagged Yelp data can explain 29.2 percent of the residual variance after accounting for lagged CBP data, in a testing sample not used to generate the algorithm. The algorithm is more accurate for denser, wealthier, and more educated ZIP codes….(More)”.
See also the other papers presented at the NBER Conference on Big Data for 21st Century Economic Statistics.
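To make the "29.2 percent of the residual variance" figure concrete, here is a minimal sketch of how such a share can be computed on a held-out sample. The excerpt does not describe the authors' algorithm, so a plain one-variable linear fit stands in for it; the function names, the dictionary keys and the toy numbers are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    return slope, y_bar - slope * x_bar


def residual_variance_explained(train, test):
    """Share of held-out residual variance explained by the Yelp signal.

    Assumption: a linear stand-in model, not the paper's algorithm.
    """
    # Step 1: baseline that uses lagged CBP data only, fitted on the training sample.
    b_slope, b_icpt = fit_line(train["lagged_cbp"], train["growth"])

    def baseline_residuals(data):
        return [g - (b_slope * c + b_icpt)
                for g, c in zip(data["growth"], data["lagged_cbp"])]

    # Step 2: fit the training residuals on the Yelp signal.
    y_slope, y_icpt = fit_line(train["yelp"], baseline_residuals(train))

    # Step 3: score on the testing sample, which played no part in fitting.
    resid = baseline_residuals(test)
    pred = [y_slope * v + y_icpt for v in test["yelp"]]
    ss_res = sum((r - p) ** 2 for r, p in zip(resid, pred))
    r_bar = mean(resid)
    ss_tot = sum((r - r_bar) ** 2 for r in resid)
    return 1 - ss_res / ss_tot


# Hypothetical toy data in which growth = 2 * lagged_cbp + 3 * yelp exactly.
train = {"lagged_cbp": [1, 2, 3, 4], "yelp": [1, -1, -1, 1], "growth": [5, 1, 3, 11]}
test = {"lagged_cbp": [5, 6, 7], "yelp": [0.5, -1, 2], "growth": [11.5, 9, 20]}

share = residual_variance_explained(train, test)
print(f"Share of residual variance explained: {share:.1%}")
```

In this noiseless toy case the Yelp signal accounts for everything the baseline misses; with real data the share would be well below that, as in the figure the paper reports.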
A weather tech startup wants to do forecasts based on cell phone signals
Douglas Heaven at MIT Technology Review: “On 14 April more snow fell on Chicago than it had in nearly 40 years. Weather services didn’t see it coming: they forecast one or two inches at worst. But when the late winter snowstorm came it caused widespread disruption, dumping enough snow that airlines had to cancel more than 700 flights across all of the city’s airports.
One airline did better than most, however. Instead of relying on the usual weather forecasts, it listened to ClimaCell – a Boston-based “weather tech” start-up that claims it can predict the weather more accurately than anyone else. According to the company, its correct forecast of the severity of the coming snowstorm allowed the airline to better manage its schedules and minimize losses due to delays and diversions.
Founded in 2015, ClimaCell has spent the last few years developing the technology and business relationships that allow it to tap into millions of signals from cell phones and other wireless devices around the world. It uses the quality of these signals as a proxy for local weather conditions, such as precipitation and air quality. It also analyzes images from street cameras. It is offering a weather forecasting service to subscribers that it claims is 60 percent more accurate than that of existing providers, such as NOAA.
The internet of weather
The approach makes sense, in principle. Other forecasters use proxies, such as radar signals. But by using information from millions of everyday wireless devices, ClimaCell claims it has a far more fine-grained view of most of the globe than other forecasters get from the existing network of weather sensors, which range from ground-based devices to satellites. (ClimaCell taps into these, too.)…(More)”.
Introducing the Contractual Wheel of Data Collaboration
Blog by Andrew Young and Stefaan Verhulst: “Earlier this year we launched the Contracts for Data Collaboration (C4DC) initiative — an open collaborative with charter members from The GovLab, UN SDSN Thematic Research Network on Data and Statistics (TReNDS), University of Washington and the World Economic Forum. C4DC seeks to address the inefficiencies of developing contractual agreements for public-private data collaboration by informing and guiding those seeking to establish a data collaborative by developing and making available a shared repository of relevant contractual clauses taken from existing legal agreements. Today TReNDS published “Partnerships Founded on Trust,” a brief capturing some initial findings from the C4DC initiative.
The Contractual Wheel of Data Collaboration [beta]

As part of the C4DC effort, and to support Data Stewards in the private sector and decision-makers in the public and civil sectors seeking to establish Data Collaboratives, The GovLab developed the Contractual Wheel of Data Collaboration [beta]. The Wheel seeks to capture key elements involved in data collaboration while demystifying contracts and moving beyond the type of legalese that can create confusion and barriers to experimentation.
The Wheel was developed based on an assessment of existing legal agreements, engagement with The GovLab-facilitated Data Stewards Network, and analysis of the key elements of our Data Collaboratives Methodology. It features 22 legal considerations organized across 6 operational categories that can act as a checklist for the development of a legal agreement between parties participating in a Data Collaborative:…(More)”.
San Francisco teams up with Uber, location tracker on 911 call responses
Gwendolyn Wu at San Francisco Chronicle: “In an effort to shorten emergency response times in San Francisco, the city announced on Monday that it is now using location data from RapidSOS, a New York-based public safety tech company, and ride-hailing company Uber to improve location coordinates generated from 911 calls.
An increasing number of emergency calls are made from cell phones, said Michelle Cahn, RapidSOS’s director of community engagement. The new technology should allow emergency responders to narrow down the location of such callers and replace existing 911 technology that was built for landlines and tied to home addresses.
Cell phone location data currently given to dispatchers when they receive a 911 call can be vague, especially if the person can’t articulate their exact location, according to the Department of Emergency Management.
But if a dispatcher can narrow down where the emergency is happening, that increases the chance of a timely response and better result, Cahn said.
“It doesn’t matter what’s going on with the emergency if we don’t know where it is,” she said.
RapidSOS shares its location data — collected by Apple and Google for their in-house map apps — free of charge to public safety agencies. San Francisco’s 911 call center adopted the data service in September 2018.
The Federal Communications Commission estimates agencies could save as many as 10,000 lives a year if they shave a minute off response times. Federal officials issued new rules to improve wireless 911 calls in 2015, asking mobile carriers to provide more accurate locations to call centers. Carriers are required to find a way to triangulate the caller’s location within 50 meters — a much smaller radius than the eight blocks city officials were initially presented in October when the caller dialed 911…(More)”.
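The 50-meter figure above can be made concrete with a short distance check. The sketch below is only an illustration, not anything the article attributes to RapidSOS, the FCC or the carriers: it uses the standard haversine formula, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(estimate, actual, radius_m=50.0):
    """True if the estimated location lies within radius_m of the actual one."""
    return haversine_m(*estimate, *actual) <= radius_m


# Hypothetical coordinates, roughly in central San Francisco.
caller = (37.7749, -122.4194)
close_estimate = (37.7753, -122.4194)  # about 44 m north of the caller
far_estimate = (37.7759, -122.4194)    # about 111 m north of the caller

print(within_radius(close_estimate, caller))
print(within_radius(far_estimate, caller))
```

A spherical-Earth approximation like this is accurate enough at city scale to show the difference between a fix that meets the 50-meter radius and one that misses it.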