Robot census: Gathering data to improve policymaking on new technologies


Essay by Robert Seamans: “There is understandable excitement about the impact that new technologies like artificial intelligence (AI) and robotics will have on our economy. In our everyday lives, we already see the benefits of these technologies: when we use our smartphones to navigate from one location to another using the fastest available route or when a predictive typing algorithm helps us finish a sentence in our email. At the same time, there are concerns about possible negative effects of these new technologies on labor. The Council of Economic Advisers of the past two Administrations have addressed these issues in the annual Economic Report of the President (ERP). For example, the 2016 ERP included a chapter on technology and innovation that linked robotics to productivity and growth, and the 2019 ERP included a chapter on artificial intelligence that discussed the uneven effects of technological change. Both these chapters used data at highly aggregated levels, in part because that is the data that is available. As I’ve noted elsewhere, AI and robots are everywhere, except, as it turns out, in the data.

To date, there have been no large-scale, systematic studies in the U.S. on how robots and AI affect productivity and labor in individual firms or establishments (a firm could own one or more establishments, which for example could be a plant in a manufacturing setting or a storefront in a retail setting). This is because the data are scarce. Academic researchers interested in the effects of AI and robotics on economic outcomes have mostly used aggregate country and industry-level data. Very recently, some have studied these issues at the firm level using data on robot imports to France, Spain, and other countries. I review a few of these academic papers in both categories below, which provide early findings on the nuanced role these new technologies have on labor. Thanks to some excellent work being done by the U.S. Census Bureau, however, we may soon have more data to work with. This includes new questions on robot purchases in the Annual Survey of Manufactures and Annual Capital Expenditures Survey and new questions on other technologies including cloud computing and machine learning in the Annual Business Survey….(More)”.

Governance of Data Sharing: a Law & Economics Proposal


Paper by Jens Prufer and Inge Graef: “To prevent market tipping, which inhibits innovation, there is an urgent need to mandate sharing of user information in data-driven markets. Existing legal mechanisms to impose data sharing under EU competition law and data portability under the GDPR are not sufficient to tackle this problem. Mandated data sharing requires the design of a governance structure that combines elements of economically efficient centralization with legally necessary decentralization. We identify three feasible options. One is to centralize investigations and enforcement in a European Data Sharing Agency (EDSA), while decision-making power lies with National Competition Authorities in a Board of Supervisors. The second option is to set up a Data Sharing Cooperation Network coordinated through a European Data Sharing Board, with the National Competition Authority best placed to run the investigation adjudicating and enforcing the mandatory data-sharing decision across the EU. A third option is to mix both governance structures and to task national authorities to investigate and adjudicate and the EU-level EDSA with enforcement of data sharing….(More)”

Democratizing data in a 5G world


Blog by Dimitrios Dosis at Mastercard: “The next generation of mobile technology has arrived, and it’s more powerful than anything we’ve experienced before. 5G can move data faster, with little delay — in fact, with 5G, you could’ve downloaded a movie in the time you’ve read this far. 5G will also create a vast network of connected machines. The Internet of Things will finally deliver on its promise to fuse all our smart products — vehicles, appliances, personal devices — into a single streamlined ecosystem.

My smartwatch could monitor my blood pressure and schedule a doctor’s appointment, while my car could collect data on how I drive and how much gas I use while behind the wheel. In some cities, petrol trucks already act as roving gas stations, receiving pings when cars are low on gas and refueling them as needed, wherever they are.

This amounts to an incredible proliferation of data. By 2025, every connected person will conduct nearly 5,000 data interactions every day — one every 18 seconds — whether they know it or not. 

Enticing and convenient as new 5G-powered developments may be, they also raise complex questions about data. Namely, who is privy to our personal information? As your smart refrigerator records the foods you buy, will the refrigerator’s manufacturer be able to see your eating habits? Could it sell that information to a consumer food product company for market research without your knowledge? And where would the information go from there?

People are already asking critical questions about data privacy. In fact, 72% of them say they are paying attention to how companies collect and use their data, according to a global survey released last year by the Harvard Business Review Analytic Services. The survey, sponsored by Mastercard, also found that while 60% of executives believed consumers think the value they get in exchange for sharing their data is worthwhile, only 44% of consumers actually felt that way.

There are many reasons for this data disconnect, including the lack of transparency that currently exists in data sharing and the tension between an individual’s need for privacy and his or her desire for personalization.

This paradox can be solved by putting data in the hands of the people who create it — giving consumers the ability to manage, control and share their own personal information when they want to, with whom they want to, and in a way that benefits them.

That’s the basis of Mastercard’s core set of principles regarding data responsibility – and in this 5G world, it’s more important than ever. We will be able to gain from these new technologies, but this change must come with trust and user control at its core. The data ecosystem needs to evolve from schemes dominated by third parties, where some data brokers collect inferred, often unreliable and inaccurate data, then share it without the consumer’s knowledge….(More)”.

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment


Paper by Mirco Nanni et al: “The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens’ privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens’ “personal data stores”, to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely-granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates—if and when they want and for specific aims—with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society….(More)”.

Facebook Data for Good


Foreword by Sheryl Sandberg: “When Facebook launched the Data for Good program in 2017, we never imagined it would play a role so soon in response to a truly global emergency. The COVID-19 pandemic is not just a public health crisis, but also a social and economic one. It has caused hardship in every part of the world, but its impact hasn’t been felt equally. It has hit women and the most disadvantaged communities the hardest – something this work has helped shine a light on.

In response to the pandemic, Facebook has been part of an unprecedented collaboration between technology companies, the public sector, universities, nonprofits and others. Our partners operate in some of the most challenging environments in the world, where lengthy analysis and debate is often a luxury they don’t have. The policies that govern delivery of vaccines, masks, and financial support can mean the difference between life and death. By sharing tools that provide real-time insights, Facebook can make decision-making on the ground just a little bit easier and more effective.

This report highlights some of the ways Facebook data – shared in a way that protects the privacy of individuals – assisted the response efforts to the pandemic and other major crises in 2020. I hope the examples included help illustrate what successful data sharing projects can look like, and how future projects can be improved. Above all, I hope we can continue to work together in 2021 and beyond to save lives and mitigate the damage caused by the pandemic and any crises that may follow….(More)”.

Enabling the future of academic research with the Twitter API


Twitter Developer Blog: “When we introduced the next generation of the Twitter API in July 2020, we also shared our plans to invest in the success of the academic research community with tailored solutions that better serve their goals. Today, we’re excited to launch the Academic Research product track on the new Twitter API. 

Why we’re launching this & how we got here

Since the Twitter API was first introduced in 2006, academic researchers have used data from the public conversation to study topics as diverse as the conversation on Twitter itself – from state-backed efforts to disrupt the public conversation to floods and climate change, from attitudes and perceptions about COVID-19 to efforts to promote healthy conversation online. Today, academic researchers are one of the largest groups of people using the Twitter API. 

Our developer platform hasn’t always made it easy for researchers to access the data they need, and many have had to rely on their own resourcefulness to find the right information. Despite this, for over a decade, academic researchers have used Twitter data for discoveries and innovations that help make the world a better place.

Over the past couple of years, we’ve taken iterative steps to improve the experience for researchers, like when we launched a webpage dedicated to Academic Research, and updated our Twitter Developer Policy to make it easier to validate or reproduce others’ research using Twitter data.

We’ve also made improvements to help academic researchers use Twitter data to advance their disciplines, answer urgent questions during crises, and even help us improve Twitter. For example, in April 2020, we released the COVID-19 stream endpoint – the first free, topic-based stream built solely for researchers to use data from the global conversation for the public good. Researchers from around the world continue to use this endpoint for a number of projects.

Over two years ago, we started our own extensive research to better understand the needs, constraints and challenges that researchers have when studying the public conversation. In October 2020, we tested this product track in a private beta program where we gathered additional feedback. This gave us a glimpse into some of the important work that the free Academic Research product track we’re launching today can now enable….(More)”.

Facebook will let researchers study how advertisers targeted users with political ads prior to Election Day


Nick Statt at The Verge: “Facebook is aiming to improve transparency around political advertising on its platform by opening up more data to independent researchers, including targeting information on more than 1.3 million ads that ran in the three months prior to the US election on November 3rd of last year. Researchers interested in studying the ads can apply for access to the Facebook Open Research and Transparency (FORT) platform here.

The move is significant because Facebook has long resisted willfully allowing access to data around political advertising, often citing user privacy. The company has gone so far as to even disable third-party web plugins, like ProPublica’s Facebook Political Ad Collector tool, that collect such data without Facebook’s express consent.

Numerous research groups around the globe have spent years now studying Facebook’s impact on everything from democratic elections to news dissemination, but sometimes without full access to all the desired data. Only last year, after partnering with Harvard University’s Social Science One (the group overseeing applications for the new political ad targeting initiative), did Facebook better formalize the process of granting anonymized user data for research studies.

In the past, Facebook has made some crucial political ad information in its Ad Library available to the public, including the amount spent on certain ads and demographic information about who saw those ads. But now the company says it wants to do more to improve transparency, specifically around how advertisers target certain subsets of users with political advertising….(More)”.

Sustainable Rescue: data sharing to combat human trafficking


Interview with Paul Fockens of Sustainable Rescue: “Human trafficking still takes place on a large scale, and still too often under the radar. That does not make it easy for organisations that want to combat human trafficking. Sharing of data between various sorts of organisations, including the government, the police, but also banks, plays a crucial role in mapping the networks of criminals involved in human trafficking, including their victims. Data sharing contributes to tackling this criminal business not only reactively, but also proactively….Sustainable Rescue tries to make the largely invisible human trafficking visible. Bundling data and therefore knowledge is crucial in this. Paul: “It’s about combining the routes criminals (and their victims) take from A to B, the financial transactions they make, the websites they visit, the hotels where they check in et cetera. All those signs of human trafficking can be found in the data of various types of organisations: the police, municipalities, the Public Prosecution Service, charities such as the Salvation Army, but also banks and insurance institutions. The problem here is that you need to collect all pieces of the puzzle to get clear insights from them. As long as this relevant data is not combined through data sharing, it is a very difficult job to get these insights. In nine out of ten cases, these authorities are not willing and/or allowed to share their data, mainly because of the privacy sensitivity of this data. However, in order to eliminate human trafficking, that data will have to be bundled. Only then can analyses be made about the patterns of a network of human trafficking.”…(More)”.

Improved targeting for mobile phone surveys: A public-private data collaboration


Blogpost by Kristen Himelein and Lorna McPherson: “Mobile phone surveys have been rapidly deployed by the World Bank to measure the impact of COVID-19 in nearly 100 countries across the world. Previous posts on this blog have discussed the sampling and implementation challenges associated with these efforts, and coverage errors are an inherent problem to the approach. The survey methodology literature has shown mobile phone survey respondents in the poorest countries are more likely to be male, urban, wealthier, and more highly educated. This bias can stem from phone ownership, as mobile phone surveys are at best representative of mobile phone owners, a group which, particularly in poor countries, may differ from the overall population; or from differential response rates among these owners, with some groups more or less likely to respond to a call from an unknown number. In this post, we share our experiences in trying to improve representativeness and boost sample sizes for the poor in Papua New Guinea (PNG)….(More)”.

Nowcasting Gentrification Using Airbnb Data


Paper by Shomik Jain, Davide Proserpio, Giovanni Quattrone, and Daniele Quercia: “There is a rumbling debate over the impact of gentrification: presumed gentrifiers have been the target of protests and attacks in some cities, while they have been welcomed as generators of new jobs and taxes in others. Census data fails to measure neighborhood change in real time since it is usually updated every ten years. This work shows that Airbnb data can be used to quantify and track neighborhood changes. Specifically, we consider both structured data (e.g. number of listings, number of reviews, listing information) and unstructured data (e.g. user-generated reviews processed with natural language processing and machine learning algorithms) for three major cities, New York City (US), Los Angeles (US), and Greater London (UK). We find that Airbnb data (especially its unstructured part) appears to nowcast neighborhood gentrification, measured as changes in housing affordability and demographics. Overall, our results suggest that user-generated data from online platforms can be used to create socioeconomic indices to complement traditional measures that are less granular, not in real time, and more costly to obtain….(More)”.