Mixed Messages? The Limits of Automated Social Media Content Analysis


CDT Paper by Natasha Duarte, Emma Llanso and Anna Loup: “Governments and companies are turning to automated tools to make sense of what people post on social media, for everything ranging from hate speech detection to law enforcement investigations. Policymakers routinely call for social media companies to identify and take down hate speech, terrorist propaganda, harassment, “fake news” or disinformation, and other forms of problematic speech. Other policy proposals have focused on mining social media to inform law enforcement and immigration decisions. But these proposals wrongly assume that automated technology can accomplish on a large scale the kind of nuanced analysis that humans can accomplish on a small scale.

This paper explains the capabilities and limitations of tools for analyzing the text of social media posts and other online content. It is intended to help policymakers understand and evaluate available tools and the potential consequences of using them to carry out government policies. This paper focuses specifically on the use of natural language processing (NLP) tools for analyzing the text of social media posts. We explain five limitations of these tools that caution against relying on them to decide who gets to speak, who gets admitted into the country, and other critical determinations. This paper concludes with recommendations for policymakers and developers, including a set of questions to guide policymakers’ evaluation of available tools….(More)”.
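To make the limitation concrete, here is a deliberately naive, hypothetical keyword filter of the kind the paper cautions against. The blocklist and example posts are invented; the point is only that word-matching cannot supply the context a human reader would:

```python
# Illustrative only: a toy keyword-based content filter of the kind the paper
# warns about. The blocklist and the example posts are hypothetical.
FLAGGED_TERMS = {"attack", "destroy"}

def naive_flag(post: str) -> bool:
    """Flag a post if it contains any term on a blocklist."""
    words = set(post.lower().split())
    return bool(words & FLAGGED_TERMS)

# A threatening post and a benign news comment share a keyword, so the filter
# treats them identically; a hostile post without the keyword sails through.
print(naive_flag("we will attack them at dawn"))         # True
print(naive_flag("critics attack the new policy plan"))  # True (false positive)
print(naive_flag("you people don't belong here"))        # False (false negative)
```

The false positive and false negative above are exactly the kinds of errors that scale with automated enforcement.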

Accelerating the Sharing of Data Across Sectors to Advance the Common Good


Paper by Robert M. Groves and Adam Neufeld: “The public pays for and provides an incredible amount of data to governments and companies. Yet much of the value of this data is being wasted, remaining in silos rather than being shared to enhance the common good—whether it’s helping governments to stop opioid addiction or helping companies predict and meet the demand for electric or autonomous vehicles.

  • Many companies and governments are interested in sharing more of their data with each other; however, the process of sharing is currently time-consuming and can pose great risks, since it often involves handing full data sets to another entity.
  • We need intermediaries to design safe environments to facilitate data sharing in the low-trust and politically sensitive context of companies and governments. These safe environments would exist outside the government, be transparent to the public, and use modern technologies and techniques to allow only statistical uses of data through temporary linkages in order to minimize the risk to individuals’ privacy.
  • Governments must lead the way in sharing more data by re-evaluating laws that limit sharing of data, and must embrace new technologies that could allow the private sector to receive at least some value from many sensitive data sets. By decreasing the cost and risks of sharing data, more data will be freed from their silos, and we will move closer to what we deserve—that our data are used for the greatest societal benefit….(More)”.

Scientists can now figure out detailed, accurate neighborhood demographics using Google Street View photos


Christopher Ingraham at the Washington Post: “A team of computer scientists has derived accurate, neighborhood-level estimates of the racial, economic and political characteristics of 200 U.S. cities using an unlikely data source — Google Street View images of people’s cars.

Published this week in the Proceedings of the National Academy of Sciences, the report details how the scientists extracted 50 million photographs of street scenes captured by Google’s Street View cars in 2013 and 2014. They then trained a computer algorithm to identify the make, model and year of 22 million automobiles appearing in neighborhoods in those images, parked outside homes or driving down the street.

The vehicles seen in Street View images are often small or blurry, making precise identification a challenge. So the researchers had human experts identify a small subsample of the vehicles and compared those to the results churned out by their algorithm. They found that the algorithm correctly identified whether a vehicle was U.S.- or foreign-made roughly 88 percent of the time, got the manufacturer right 66 percent of the time and nailed the exact model 52 percent of the time.

While far from perfect, the sheer size of the vehicle database means those numbers are still useful for real-world statistical applications, like drawing connections between vehicle preferences and demographic data. The 22 million vehicles in the database comprise roughly 8 percent of all vehicles in the United States. By comparison, the U.S. Census Bureau’s massive American Community Survey reaches only about 1.6 percent of American households each year, while the typical 1,000-person opinion poll includes just 0.0004 percent of American adults.
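Those coverage figures check out as simple ratios. The totals assumed below (roughly 275 million registered U.S. vehicles, 250 million U.S. adults) are ballpark assumptions consistent with the percentages quoted:

```python
# Back-of-the-envelope check of the coverage figures quoted above.
# Totals are rough assumptions, not figures from the study.
vehicles_in_db = 22_000_000
us_vehicles = 275_000_000           # assumed total registered US vehicles
print(f"{vehicles_in_db / us_vehicles:.1%}")   # ~8.0%

poll_sample = 1_000
us_adults = 250_000_000             # assumed total US adults
print(f"{poll_sample / us_adults:.4%}")        # ~0.0004%
```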

To test what this data set could be capable of, the researchers first paired the Zip code-level vehicle data with numbers on race, income and education from the American Community Survey. They did this for a random 15 percent of the Zip codes in their data set to create a “training set.” They then created another algorithm to go through the training set to see how vehicle characteristics correlated with neighborhood characteristics: What kinds of vehicles are disproportionately likely to appear in white neighborhoods, or black ones? Low-income vs. high-income? Highly-educated areas vs. less-educated ones?

That yielded a number of reliable correlations….(More)”.

Solving Public Problems with Data


Dinorah Cantú-Pedraza and Sam DeJohn at The GovLab: “….To serve the goal of more data-driven and evidence-based governing,  The GovLab at NYU Tandon School of Engineering this week launched “Solving Public Problems with Data,” a new online course developed with support from the Laura and John Arnold Foundation.

This online lecture series helps those working for the public sector, or simply in the public interest, learn to use data to improve decision-making. Through real-world examples and case studies — captured in 10 video lectures from leading experts in the field — the new course outlines the fundamental principles of data science and explores ways practitioners can develop a data analytical mindset. Lectures in the series include:

  1. Introduction to evidence-based decision-making  (Quentin Palfrey, formerly of MIT)
  2. Data analytical thinking and methods, Part I (Julia Lane, NYU)
  3. Machine learning (Gideon Mann, Bloomberg LP)
  4. Discovering and collecting data (Carter Hewgley, Johns Hopkins University)
  5. Platforms and where to store data (Arnaud Sahuguet, Cornell Tech)
  6. Data analytical thinking and methods, Part II (Daniel Goroff, Alfred P. Sloan Foundation)
  7. Barriers to building a data practice (Beth Blauer, Johns Hopkins University and GovEx)
  8. Data collaboratives (Stefaan G. Verhulst, The GovLab)
  9. Strengthening a data analytic culture (Amen Ra Mashariki, ESRI)
  10. Data governance and sharing (Beth Simone Noveck, NYU Tandon/The GovLab)

The goal of the lecture series is to enable participants to define and leverage the value of data to achieve improved outcomes, greater equity, reduced cost and increased efficiency in how public policies and services are created. No prior experience with computer science or statistics is necessary or assumed. In fact, the course is designed precisely to serve public professionals seeking an introduction to data science….(More)”.

SAM, the first A.I. politician on Messenger


From Digital Trends: “It’s said that all politicians are the same, but it seems safe to assume that you’ve never seen a politician quite like this. Meet SAM, heralded as the politician of the future. Unfortunately, you can’t exactly shake this politician’s hand, or have her kiss your baby. Rather, SAM is the world’s first Virtual Politician (and a female presence at that), “driven by the desire to close the gap between what voters want and what politicians promise, and what they actually achieve.”

The artificially intelligent chatbot is currently live on Facebook Messenger, though she is probably most helpful to those in New Zealand. After all, the bot’s website notes, “SAM’s goal is to act as a representative for all New Zealanders, and evolves based on voter input.” Capable of being reached by anyone at just about any time from anywhere, this may just be the single most accessible politician we’ve ever seen. But more importantly, SAM purports to be a true representative, claiming to analyze “everyone’s views [and] opinions, and impact of potential decisions.” This, the bot notes, could make for better policy for everyone….(More)”.

GovEx Launches First International Open Data Standards Directory


GT Magazine: “…A nonprofit gov tech group has created an international open data standards directory, aspiring to give cities a singular resource for guidance on formatting data they release to the public…The nature of municipal data is nuanced and diverse, and the format in which it is released often varies depending on subject matter. In other words, a format that works well for public safety data is not necessarily the same that works for info about building permits, transit or budgets. Not having a coordinated and agreed-upon resource to identify the best standards for these different types of info, Nicklin said, creates problems.

One such problem is that it can be time-consuming and challenging for city government data workers to research and identify ideal formats for data. Another is that the lack of info leads to discord between different jurisdictions, meaning one city might format a data set about economic development in an entirely different way than another, making collaboration and comparisons problematic.

What the directory does is provide a list of standards that are in use within municipal governments, as well as an evaluation based on how frequent that use is, whether the format is machine-readable, and whether users have to pay to license it, among other factors.
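In code terms, an entry in such a directory amounts to a small metadata record plus filters over its fields. The schema and entries below are illustrative assumptions, not the directory's actual format (GTFS is a real, widely adopted transit data standard; the budget example is invented):

```python
# Hypothetical sketch of a standards-directory record, with the evaluation
# criteria mentioned above (frequency of use, machine readability, license
# fees) as fields. Schema and entries are illustrative assumptions.
standards = [
    {"name": "GTFS", "domain": "transit", "machine_readable": True,
     "license_fee": False, "adoption": "widespread"},
    {"name": "ExampleBudgetXML", "domain": "budgets", "machine_readable": True,
     "license_fee": True, "adoption": "rare"},
]

def open_and_readable(entry: dict) -> bool:
    """Filter for standards that are machine-readable and free to license."""
    return entry["machine_readable"] and not entry["license_fee"]

print([s["name"] for s in standards if open_and_readable(s)])
```

A city data worker could query such records by domain instead of researching formats from scratch, which is the time sink the article describes.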

The directory currently contains 60 standards, some of which are in Spanish, and those involved with the project say they hope to expand their efforts to include more languages. There is also a crowdsourcing component to the directory, in that users are encouraged to make additions and updates….(More)”

How Muckrakers Use Crowdsourcing: Case Studies from ProPublica to The Guardian


Toby McIntosh at the Global Investigative Journalism Network: “…Creative use of social media provides new ways for journalists not just to solicit tips, but also to tap readers’ expertise, opinions and personal experiences.

A stronger ethos of reader engagement is resulting in more sophisticated appeals from journalists for assistance with investigations, including:

  • Seeking tips on very defined topics
  • Asking readers to talk about their experiences on broad subjects
  • Inviting comments after publication

Here are examples of what your colleagues are doing:

Hey, Shell Employees!

Jelmer Mommers, a reporter at the Dutch news site De Correspondent, appealed directly to Shell employees for information in a lengthy blog post, as described in this article. The resulting investigation revealed that Shell had detailed knowledge of the dangers of climate change more than a quarter century ago.

Along the way, in what Jelmer calls “the most romantic moment,” came the surprise delivery of a box full of internal documents. De Correspondent’s emphasis on communicating with subscribers is described here.

Call for Childbirth Experiences

Getting reader input in advance was key to a major U.S. story on maternal health to which thousands of people contributed. ProPublica engagement reporter Adriana Gallardo and her colleagues published a questionnaire in February 2017 aimed at women who had experienced life-threatening complications in childbirth.

Using a variety of social media channels, Gallardo, along with ProPublica’s Nina Martin and NPR’s Renee Montagne, received several thousand responses. The personal stories fueled a series and the connections made are still being maintained for follow-up work. Read more in this GIJN article.

Testimonials from Mexico’s Drug War

“Anyone’s Child Mexico” is a documentary about the families affected by Mexico’s drug war. To gather stories, the producers of the documentary publicized a free phone line through local partners and asked people across Mexico to call in and recount their stories.

Callers could also listen to other testimonials. With funding from the University of Bristol’s Brigstow Institute, producers Matthew Brown, Ewan Cass-Kavanagh, Mary Ryder and Jane Slater created a website to bring together audio, photos, video and text and tell harrowing stories of a country ravaged by violence….(More)”.

The Hidden Pitfall of Innovation Prizes


Reto Hofstetter, John Zhang and Andreas Herrmann at Harvard Business Review: “…it is not so easy to get people to submit their ideas to online innovation platforms. Our data from an online panel reveal that 65% of the contributors do not come back more than twice, and that most of the rest quit after a few tries. This kind of user churn is endemic to online social platforms — on Twitter, for example, a majority of users become inactive over time — and crowdsourcing is no exception. In a way, this turnover is even worse than ordinary customer churn: When a customer defects, a firm knows the value of what it’s lost, but there is no telling how valuable the ideas not submitted might have been….

It is surprising, then, that crowdsourcing on popular platforms is typically designed in a way that amplifies churn. Right now, in typical innovation contests, rewards are granted to winners only and the rest get no return on their participation. This design choice is often motivated by the greater effort participants exert when there is a top prize much more valuable than the rest. Often, the structure is something like the Wimbledon Tennis Championship, where the winning player earns twice as much as the runner-up and four times as much as the semifinalists — with the rest eventually leaving empty-handed.

This winner-take-most prize spread increases the incentive to win and thus individual efforts. With only one winner, however, the others are left with nothing to show for their effort, which may significantly reduce their motivation to enter again.

An experiment we recently ran confirmed the way entrants respond to this kind of winner-take-all prize structure. …

In line with the above reasoning, we found that winner-take-all contests yielded significantly better ideas than multiple-prize contests in the first round. Importantly, however, this result flipped when we invited the same cohort of innovators to participate in a second contest. While 50% of entrants in the multiple-prize contest chose to participate again, only 37% did so when the winner had taken all in their first contest. Moreover, innovators who had received no reward in the first contest showed significantly lower effort in the second contest and generated fewer ideas. In the second contest, multiple prizes generated better ideas than the second round of the winner-take-all contest….
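To see what those return rates imply over repeated contests, here is a back-of-the-envelope sketch using the 50% and 37% figures reported above. The cohort size and the assumption of a constant per-contest return rate are simplifications, not part of the experiment:

```python
# Illustrative compounding of the return rates reported above: 50% of
# multiple-prize entrants came back for a second contest vs. 37% under
# winner-take-all. Cohort size and a constant return rate are assumptions.
def entrants_after_rounds(cohort: float, return_rate: float, rounds: int) -> int:
    """Entrants still active after repeated contests at a fixed return rate."""
    for _ in range(rounds):
        cohort *= return_rate
    return round(cohort)

cohort = 1_000
print(entrants_after_rounds(cohort, 0.50, 3))  # 125 under multiple prizes
print(entrants_after_rounds(cohort, 0.37, 3))  # 51 under winner-take-all
```

Even a modest per-contest gap compounds quickly, which is why the authors argue churn, not first-round idea quality, is the hidden cost of winner-take-all designs.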

Other non-monetary positive feedback, such as encouraging comments or ratings, can have similar effects. These techniques are important, because alleviating innovator churn helps companies interested in longer-term success of their crowdsourcing activities….(More)”.

Participatory budgeting: adoption and transformation


Paper by Michael Touchton and Brian Wampler: “Participatory budgeting programmes are spreading rapidly across the world because they offer government officials and citizens the opportunity to engage each other in new ways as they combine democratic practices with the ‘nitty gritty’ of policy-making. The principles and ideas associated with participatory budgeting appeal to a broad spectrum of citizens, civil society activists, government officials and international agencies, which helps explain why it is so popular and has expanded so quickly.

In this research briefing, we focus on adoption and transformation of participatory budgeting in several low- and middle-income countries where international donors are active. We are particularly interested in better understanding how participatory budgeting is transforming in countries where international donors are active, where states struggle to provide public services, and where urban and rural communities are characterised by high levels of poverty… (More)”.

Sharing is Daring: An Experiment on Consent, Chilling Effects and a Salient Privacy Nudge


Paper by Yoan Hermstrüwer and Stephan Dickert in the International Review of Law and Economics: “Privacy law rests on the assumption that government surveillance may increase the general level of conformity and thus generate a chilling effect. In a study that combines elements of a lab and a field experiment, we show that salient and incentivized consent options are sufficient to trigger this behavioral effect. Salient ex ante consent options may lure people into giving up their privacy and increase their compliance with social norms – even when the only immediate risk of sharing information is mere publicity on a Google website. A right to be forgotten (right to deletion), however, seems to reduce neither privacy valuations nor chilling effects. In spite of low deletion costs people tend to stick with a retention default. The study suggests that consent architectures may play out on social conformity rather than on consent choices and privacy valuations. Salient notice and consent options may not merely empower users to make an informed consent decision. Instead, they can trigger the very effects that privacy law intends to curb….(More)”.