Airbnb and New York City Reach a Truce on Home-Sharing Data


Paris Martineau at Wired: “For much of the past decade, Airbnb and New York City have been embroiled in a high-profile feud. Airbnb wants legitimacy in its biggest market. City officials want to limit home-sharing platforms, which they argue exacerbate the city’s housing crisis and pose safety risks by allowing people to transform homes into illegal hotels.

Despite years of lawsuits, countersuits, lobbying campaigns, and failed attempts at legislation, progress on resolving the dispute has been incremental at best. The same could be said for many cities around the nation, as local government officials struggle to come to grips with the increasing popularity of short-term rental platforms like Airbnb, HomeAway, and VRBO in high-tourism areas.

In New York last week, there were two notable breaks in the logjam. On May 14, Airbnb agreed to give city officials partially anonymized host and reservation data for more than 17,000 listings. Two days later, a judge ordered Airbnb to turn over more detailed and nonanonymized information on dozens of hosts and hundreds of guests who have listed or stayed in more than a dozen buildings in Manhattan, Brooklyn, and Queens in the past seven years.

In both cases, the information will be used by investigators with the Mayor’s Office of Special Enforcement to identify hosts and property owners who may have broken the city’s notoriously strict short-term rental laws by converting residences into de facto hotels by listing them on Airbnb.

City officials originally subpoenaed Airbnb for the data—not anonymized—on the more than 17,000 listings in February. Mayor Bill de Blasio called the move an effort to force the company to “come clean about what they’re actually doing in this city.” The agreement outlining the data sharing was signed as a compromise on May 14, according to court records.

In addition to the 17,000 listings identified by the city, Airbnb will also share data on every listing rented through its platform between January 1, 2018, and February 18, 2019, that could have potentially violated New York’s short-term rental laws. The city prohibits rentals of an entire apartment or home for less than 30 days without the owner present in the unit, making many stays traditionally associated with services like Airbnb, HomeAway, and VRBO illegal. Only up to two guests are permitted in the short-term rental of an apartment or room, and they must be given “free and unobstructed access to every room and to each exit within the apartment,” meaning hosts can’t get around the ban on whole-apartment rentals by renting out three separate private rooms at once….(More)”.
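As a rough illustration of the rules summarized above, here is a minimal sketch of how an investigator might flag reservations that appear to conflict with them. The record fields and function are hypothetical, not the city's or Airbnb's actual data schema:

```python
from dataclasses import dataclass

@dataclass
class Listing:
    """Hypothetical, simplified record of a single short-term rental stay."""
    entire_unit: bool   # whole apartment/home rather than a private room
    host_present: bool  # owner or tenant stays in the unit during the rental
    nights: int         # length of the stay
    guests: int         # number of paying guests

def possibly_illegal(stay: Listing) -> bool:
    """Return True if the stay appears to conflict with NYC's short-term rental rules
    as described in the article: no whole-unit rentals under 30 days without the
    host present, and no more than two guests in a short-term rental."""
    short_term = stay.nights < 30
    whole_unit_without_host = stay.entire_unit and not stay.host_present
    too_many_guests = stay.guests > 2
    return short_term and (whole_unit_without_host or too_many_guests)

# Example: a 3-night whole-apartment rental for 4 guests with no host present
print(possibly_illegal(Listing(entire_unit=True, host_present=False, nights=3, guests=4)))  # True
```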

What can we learn from billions of food purchases derived from fidelity cards?


Daniele Quercia at Medium: “By combining 1.6B food item purchases with 1.1B medical prescriptions for the entire city of London over one year, we discovered that, when it comes to predicting health outcomes, socio-economic conditions matter less than previous research has shown: despite being lower-income, certain areas are healthy, and that is because of what their residents eat!

This result comes from our latest project, “Poor but Healthy”, which was published this month in Springer’s European Physical Journal (EPJ) Data Science and comes with @tobi_vierzwo’s stunningly beautiful map of London, which I invite all of you to explore.

Why are we interested in urban health? In our cities, food is cheap, exercise is discretionary, and our health pays the price. Half of European citizens will be obese by 2050, and obesity and its related diseases are likely to reach crisis proportions. In this project, we set out to show that the fidelity cards of grocery stores represent a treasure trove of health data — they can be used not only to (e)mail discount coupons to customers but also to effectively track a neighbourhood’s health in real time for an entire city or even an entire country.

In research circles, the impact of eating habits on people’s health has mostly been studied using dietary surveys, which are costly and of limited scale.

To complement these surveys, we have recently resorted to grocery fidelity cards. We analyzed the anonymized records of 1.6B grocery items purchased by 1.6M grocery store customers in London over one whole year, and combined them with 1.1B medical prescriptions.

In so doing, we found that, as one might expect, the “trick” to not being associated with chronic diseases is eating less of what we instinctively like (e.g., sugar, carbohydrates), balancing all the nutrients, and avoiding the (big) quantities that are readily available. These results come as no surprise, yet they speak to the validity of using fidelity cards to capture health outcomes…(More)”.
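The excerpt describes the method only at a high level. As a rough sketch of the kind of area-level analysis it implies, one might aggregate purchases and prescriptions by neighbourhood and measure their association; the file names, columns, and the sugar-share measure below are assumptions for illustration, not the paper's actual variables:

```python
import pandas as pd

# Hypothetical inputs: one row per purchased item and one row per prescription,
# each tagged with the buyer's/patient's neighbourhood code.
purchases = pd.read_csv("purchases.csv")          # columns: area, sugar_g, carbs_g, fat_g, protein_g
prescriptions = pd.read_csv("prescriptions.csv")  # columns: area, is_diabetes_drug

# Aggregate each data source to the neighbourhood level.
nutrients = ["sugar_g", "carbs_g", "fat_g", "protein_g"]
diet = purchases.groupby("area")[nutrients].mean()
diet["sugar_share"] = diet["sugar_g"] / diet[nutrients].sum(axis=1)

health = prescriptions.groupby("area")["is_diabetes_drug"].mean().rename("diabetes_rx_rate")

# Join the two views of each neighbourhood and check the association.
areas = diet.join(health, how="inner")
print(areas["sugar_share"].corr(areas["diabetes_rx_rate"]))
```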


Data to the rescue


Podcast by Kenneth Cukier: “Access to the right data can be as valuable in humanitarian crises as water or medical care, but it can also be dangerous. Misused or in the wrong hands, the same information can put already vulnerable people at further risk. Kenneth Cukier hosts this special edition of Babbage examining how humanitarian organisations use data and what they can learn from the profit-making tech industry. This episode was recorded live from Wilton Park, in collaboration with the United Nations OCHA Centre for Humanitarian Data…(More)”.

Data Collaboration for the Common Good: Enabling Trust and Innovation Through Public-Private Partnerships


World Economic Forum Report: “As the digital technologies of the Fourth Industrial Revolution continue to drive change throughout all sectors of the global economy, a unique moment exists to create a more inclusive, innovative and resilient society. Central to this change is the use of data. It is abundantly available, but if improperly used it will be the source of dangerous and unwelcome results.

When data is shared, linked and combined across sectoral and institutional boundaries, a multiplier effect occurs. Connecting one bit with another unlocks new insights and understandings that often weren’t anticipated. Yet, due to commercial limits and liabilities, the full value of data is often unrealized. This is particularly true when it comes to using data for the common good. While public-private data collaborations represent an unprecedented opportunity to address some of the world’s most urgent and complex challenges, they have generally been small and limited in impact. An entangled set of legal, technical, social, ethical and commercial risks has created an environment where the incentives for innovation have stalled. Additionally, the widening lack of trust among individuals and institutions creates even more uncertainty. After nearly a decade of anticipation of the promise of public-private data collaboration – with relatively few examples of success at global scale – a pivotal moment has arrived to encourage progress and move forward….(More)”

(See also http://datacollaboratives.org/).

Data Trusts, Health Data, and the Professionalization of Data Management


Paper by Keith Porcaro: “This paper explores how trusts can provide a legal model for professionalizing health data management. Data is potential. Over time, data collected for one purpose can support others. Clinical records at a hospital, created to manage a patient’s care, can be internally analyzed to identify opportunities for process and safety improvements at a hospital, or externally analyzed with other records to identify optimal treatment patterns. Data also carries the potential for harm. Personal data can be leaked or exposed. Proprietary models can be used to discriminate against patients, or price them out of care.

As novel uses of data proliferate, an individual data holder may be ill-equipped to manage complex new data relationships in a way that maximizes value and minimizes harm. A single organization may be limited by management capacity or risk tolerance. Organizations across sectors have digitized unevenly or late, and may not have mature data controls and policies. Collaborations that involve multiple organizations may face coordination problems, or disputes over ownership.

Data management is still a relatively young field. Most models of external data-sharing are based on literally transferring data—copying data between organizations, or pooling large datasets together under the control of a third party—rather than facilitating external queries of a closely held dataset.

Few models to date have focused on the professional management of data on behalf of a data holder, where the data holder retains control over not only their data, but the inferences derived from their data. Trusts can help facilitate the professionalization of data management. Inspired by the popularity of trusts for managing financial investments, this paper argues that data trusts are well-suited as a vehicle for open-ended professional management of data, where a manager’s discretion is constrained by fiduciary duties and a trust document that defines the data holder’s goals…(More)”.

We’ll soon know the exact air pollution from every power plant in the world. That’s huge.


David Roberts at Vox: “A nonprofit artificial intelligence firm called WattTime is going to use satellite imagery to precisely track the air pollution (including carbon emissions) coming out of every single power plant in the world, in real time. And it’s going to make the data public.

This is a very big deal. Poor monitoring and gaming of emissions data have made it difficult to enforce pollution restrictions on power plants. This system promises to effectively eliminate both problems….

The plan is to use imagery from satellites whose data is publicly available (like the European Union’s Copernicus network and the US Landsat network), as well as from a few private companies that charge for their data (like Digital Globe). The data will come from a variety of sensors operating at different wavelengths, including thermal infrared sensors that can detect heat.

The images will be processed by various algorithms to detect signs of emissions. It has already been demonstrated that a great deal of pollution can be tracked simply through identifying visible smoke. WattTime says it can also use infrared imaging to identify heat from smokestack plumes or cooling-water discharge. Sensors that can directly track NO2 emissions are in development, according to WattTime executive director Gavin McCormick.

Between visible smoke, heat, and NO2, WattTime will be able to derive exact, real-time emissions information, including information on carbon emissions, for every power plant in the world. (McCormick says the data may also be used to derive information about water pollutants like nitrates or mercury.)

Google.org, Google’s philanthropic wing, is getting the project off the ground (pardon the pun) with a $1.7 million grant; it was selected through the Google AI Impact Challenge….(More)”.
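The article names the signals (visible smoke, plume heat, and eventually NO2) but not how they are combined. Below is a purely illustrative sketch of fusing such per-plant satellite features into an emissions estimate by fitting against plants whose output is already monitored on the ground; the features, numbers, and model are assumptions, not WattTime's actual algorithm:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: satellite-derived features for plants whose emissions
# are known from continuous ground-based monitoring, used as the fitting target.
X_train = np.array([
    # smoke_index, plume_heat_K, no2_column (illustrative units)
    [0.10,  2.0, 0.3],
    [0.55,  9.5, 1.8],
    [0.80, 14.0, 2.6],
    [0.05,  1.0, 0.2],
])
y_train = np.array([120.0, 900.0, 1400.0, 60.0])  # tonnes CO2 per hour (illustrative)

model = LinearRegression().fit(X_train, y_train)

# Apply the fitted model to a newly observed plant that has no ground-based monitor.
new_plant = np.array([[0.60, 10.0, 2.0]])
print(f"Estimated emissions: {model.predict(new_plant)[0]:.0f} tCO2/h")
```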

The EU Wants to Build One of the World’s Largest Biometric Databases. What Could Possibly Go Wrong?


Grace Dobush at Fortune: “China and India have built the world’s largest biometric databases, but the European Union is about to join the club.

The Common Identity Repository (CIR) will consolidate biometric data on almost all visitors and migrants to the bloc, as well as some EU citizens—connecting existing criminal, asylum, and migration databases and integrating new ones. It has the potential to affect hundreds of millions of people.

The plan for the database, first proposed in 2016 and approved by the EU Parliament on April 16, was sold as a way to better track and monitor terrorists, criminals, and unauthorized immigrants.

The system will initially target the fingerprints and identity data of visitors and immigrants, and represents the first step towards building a truly EU-wide citizen database. At the same time, though, critics argue its mere existence will increase the potential for hacks, leaks, and law enforcement abuse of the information….

The European Parliament and the European Council have promised to address those concerns through “proper safeguards” to protect personal privacy and to regulate officers’ access to data. In 2016, they passed a law regulating law enforcement’s access to personal data, alongside the General Data Protection Regulation (GDPR).

But total security is a tall order. Germany is currently dealing with multiple instances of police officers allegedly leaking personal information to far-right groups. Meanwhile, a Swedish hacker went to prison for hacking into Denmark’s public records system in 2012 and dumping the personal data of hundreds of thousands of citizens and migrants online….(More)”.


Facebook will open its data up to academics to see how it impacts elections


MIT Technology Review: “More than 60 researchers from 30 institutions will get access to Facebook user data to study its impact on elections and democracy, and how it’s used by advertisers and publishers.

A vast trove: Facebook will let academics see which websites its users linked to from January 2017 to February 2019. Notably, that means they won’t be able to look at the platform’s impact on the US presidential election in 2016, or on the Brexit referendum in the UK in the same year.

Despite this slightly glaring omission, it’s still hard to wrap your head around the scale of the data that will be shared, given that Facebook is used by 1.6 billion people every day. That’s more people than live in all of China, the most populous country on Earth. It will be one of the largest data sets on human behavior online to ever be released.

The process: Facebook didn’t pick the researchers. They were chosen by the Social Science Research Council, a US nonprofit. Facebook has been working on this project for over a year, as it tries to balance research interests against user privacy and confidentiality.

Privacy: In a blog post, Facebook said it will use a number of statistical techniques to make sure the data set can’t be used to identify individuals. Researchers will be able to access it only via a secure portal that uses a VPN and two-factor authentication, and there will be limits on the number of queries they can each run….(More)”.
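The post does not specify which statistical techniques Facebook will use. One common building block for this kind of protection is differential privacy, sketched below purely as an illustration; the Laplace mechanism and parameters are assumptions on my part, not Facebook's documented method:

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float = 0.5, sensitivity: float = 1.0) -> float:
    """Return a count perturbed with Laplace noise, a basic differential-privacy
    mechanism: any single user's presence changes the count by at most `sensitivity`,
    so the released value reveals little about any individual."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: how many users shared links to a given domain in a given week.
print(noisy_count(true_count=10432))
```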

Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity


Paper by Edward L. Glaeser, Hyunjin Kim and Michael Luca: “Can new data sources from online platforms help to measure local economic activity? Government datasets from agencies such as the U.S. Census Bureau provide the standard measures of economic activity at the local level. However, these statistics typically appear only after multi-year lags, and the public-facing versions are aggregated to the county or ZIP code level. In contrast, crowdsourced data from online platforms such as Yelp are often contemporaneous and geographically finer than official government statistics. Glaeser, Kim, and Luca present evidence that Yelp data can complement government surveys by measuring economic activity in close to real time, at a granular level, and at almost any geographic scale. Changes in the number of businesses and restaurants reviewed on Yelp can predict changes in the number of overall establishments and restaurants in County Business Patterns. An algorithm using contemporaneous and lagged Yelp data can explain 29.2 percent of the residual variance after accounting for lagged CBP data, in a testing sample not used to generate the algorithm. The algorithm is more accurate for denser, wealthier, and more educated ZIP codes….(More)”.

See all papers presented at the NBER Conference on Big Data for 21st Century Economic Statistics here.
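As a rough sketch of the kind of nowcasting exercise the abstract describes, one could regress changes in County Business Patterns establishment counts on lagged CBP data plus contemporaneous and lagged Yelp counts, and score the fit on a held-out sample. The file name, column names, and plain linear specification below are assumptions; the paper's actual algorithm is more elaborate:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical panel: one row per ZIP code and year.
df = pd.read_csv("zip_year_panel.csv")
# Assumed columns: cbp_change (target), cbp_change_lag1,
#                  yelp_change (contemporaneous), yelp_change_lag1

features = ["cbp_change_lag1", "yelp_change", "yelp_change_lag1"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["cbp_change"], test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
# Out-of-sample R^2, analogous in spirit to the paper's held-out testing sample.
print(model.score(X_test, y_test))
```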

A weather tech startup wants to do forecasts based on cell phone signals


Douglas Heaven at MIT Technology Review: “On 14 April, more snow fell on Chicago than it had seen in nearly 40 years. Weather services didn’t see it coming: they forecast one or two inches at worst. But when the late-winter snowstorm came, it caused widespread disruption, dumping enough snow that airlines had to cancel more than 700 flights across all of the city’s airports.

One airline did better than most, however. Instead of relying on the usual weather forecasts, it listened to ClimaCell – a Boston-based “weather tech” start-up that claims it can predict the weather more accurately than anyone else. According to the company, its correct forecast of the severity of the coming snowstorm allowed the airline to better manage its schedules and minimize losses due to delays and diversions. 

Founded in 2015, ClimaCell has spent the last few years developing the technology and business relationships that allow it to tap into millions of signals from cell phones and other wireless devices around the world. It uses the quality of these signals as a proxy for local weather conditions, such as precipitation and air quality. It also analyzes images from street cameras. It is offering a weather forecasting service to subscribers that it claims is 60 percent more accurate than that of existing providers, such as NOAA.

The internet of weather

The approach makes sense, in principle. Other forecasters use proxies, such as radar signals. But by using information from millions of everyday wireless devices, ClimaCell claims it has a far more fine-grained view of most of the globe than other forecasters get from the existing network of weather sensors, which range from ground-based devices to satellites. (ClimaCell taps into these, too.)…(More)”.
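The article does not explain how signal quality is turned into a weather estimate. A well-known technique in this vein uses the rain-induced attenuation of point-to-point wireless links to estimate rain rate via a power law; the sketch below illustrates that idea with purely illustrative coefficients, and should not be read as ClimaCell's method:

```python
def rain_rate_from_attenuation(attenuation_db: float, link_length_km: float,
                               a: float = 0.25, b: float = 1.1) -> float:
    """Estimate rain rate (mm/h) from the extra attenuation observed on a wireless
    link, using the standard power law k = a * R**b, where k is specific attenuation
    in dB/km. The coefficients a and b depend on link frequency and polarization;
    the defaults here are illustrative placeholders."""
    k = attenuation_db / link_length_km  # specific attenuation, dB/km
    return (k / a) ** (1.0 / b)          # invert k = a * R**b to recover R

# Example: 6 dB of rain-induced attenuation observed over a 4 km link
print(f"{rain_rate_from_attenuation(6.0, 4.0):.1f} mm/h")
```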