As Surveys Falter, Big Data Polling Narrows Our Societal Understanding


Kalev Leetaru at Forbes: “One of the most talked-about stories in the world of polling and survey research in recent years has been the steady decline of survey response rates and the reliability of the insights they produce….

The online world’s perceived anonymity has offered some degree of reprieve, and online polls and surveys have often bested traditional approaches in assessing views on society’s most controversial issues. Yet here, too, growing public wariness about phishing and online safety is increasingly problematic.

The answer has been the rise of “big data” analysis of society’s digital exhaust to fill in the gaps….

Is it truly the same answer though?

Constructing and conducting a well-designed survey means being able to ask the public exactly the questions of interest. Most importantly, it entails being able to ensure representative demographics of respondents.

An online-only poll is unlikely to accurately capture the perspectives of the three quarters of the earth’s population that the digital revolution has left behind. Even within the US, social media platforms are extraordinarily skewed.

The far greater problem is that society’s data exhaust is rarely a perfect match for the questions of greatest interest to policymakers and the public.

Cellphone mobility records can offer an exquisitely detailed look at how the people of a city go about their daily lives, but beneath all that blinding light are the invisible members of society not deemed valuable to advertisers and thus not counted. Even for the urban society members whose phones are their ever-present companions, mobility data only goes so far. It can tell us that occupants of a particular part of the city during the workday spend their evenings in a particular part of the city, allowing us to understand their work/life balance, but it offers few insights into their political leanings.

One of the greatest challenges of today’s “big data” surveying is that it requires us to narrow our gaze to only those questions which can be easily answered from the data at hand.

Much as AI’s crisis of bias comes from the field’s steadfast refusal to pay for quality data, settling for highly biased free data, so too has “big data” surveying limited itself largely to datasets it can freely and easily acquire.

The result is that with traditional survey research, we are free to ask the precise questions we are most interested in. With data exhaust research, we must imperfectly shoehorn our questions into the few available metrics. With sufficient creativity it is typically possible to find some way of proxying the given question, but the resulting proxies may be highly unstable, with little understanding of when and where they may fail.

Much like how the early rise of the cluster computing era caused “big data” researchers to limit the questions they asked of their data to just those they could fit into a set of tiny machines, so too has the era of data exhaust surveying forced us to greatly restrict our understanding of society.

Most dangerously, however, big data surveying implicitly means we are measuring only the portion of society our vast commercial surveillance state cares about.

In short, we are only able to measure those deemed of greatest interest to advertisers and thus the most monetizable.

Putting this all together, the decline of traditional survey research has led to the rise of “big data” analysis of society’s data exhaust. Instead of giving us an unprecedented new view into the heartbeat of daily life, this reliance on the unintended output of our digital lives has forced researchers to greatly narrow the questions they can explore and severely skews them to the most “monetizable” portions of society.

In the end, the shift of societal understanding from precision surveys to the big data revolution has led not to an incredible new understanding of what makes us tick, but rather a far smaller, less precise, and less accurate view than ever before, even as our need to understand ourselves has never been greater….(More)”.

Facebook will open its data up to academics to see how it impacts elections


MIT Technology Review: “More than 60 researchers from 30 institutions will get access to Facebook user data to study its impact on elections and democracy, and how it’s used by advertisers and publishers.

A vast trove: Facebook will let academics see which websites its users linked to from January 2017 to February 2019. Notably, that means they won’t be able to look at the platform’s impact on the US presidential election in 2016, or on the Brexit referendum in the UK in the same year.

Despite this slightly glaring omission, it’s still hard to wrap your head around the scale of the data that will be shared, given that Facebook is used by 1.6 billion people every day. That’s more people than live in all of China, the most populous country on Earth. It will be one of the largest data sets on human behavior online to ever be released.

The process: Facebook didn’t pick the researchers. They were chosen by the Social Science Research Council, a US nonprofit. Facebook has been working on this project for over a year, as it tries to balance research interests against user privacy and confidentiality.

Privacy: In a blog post, Facebook said it will use a number of statistical techniques to make sure the data set can’t be used to identify individuals. Researchers will be able to access it only via a secure portal that uses a VPN and two-factor authentication, and there will be limits on the number of queries they can each run….(More)”.

Nowcasting the Local Economy: Using Yelp Data to Measure Economic Activity


Paper by Edward L. Glaeser, Hyunjin Kim and Michael Luca: “Can new data sources from online platforms help to measure local economic activity? Government datasets from agencies such as the U.S. Census Bureau provide the standard measures of local economic activity. However, these statistics typically appear only after multi-year lags, and the public-facing versions are aggregated to the county or ZIP code level. In contrast, crowdsourced data from online platforms such as Yelp are often contemporaneous and geographically finer than official government statistics. Glaeser, Kim, and Luca present evidence that Yelp data can complement government surveys by measuring economic activity in close to real time, at a granular level, and at almost any geographic scale. Changes in the number of businesses and restaurants reviewed on Yelp can predict changes in the number of overall establishments and restaurants in County Business Patterns. An algorithm using contemporaneous and lagged Yelp data can explain 29.2 percent of the residual variance after accounting for lagged CBP data, in a testing sample not used to generate the algorithm. The algorithm is more accurate for denser, wealthier, and more educated ZIP codes….(More)”.
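The paper's two-stage setup can be illustrated with a toy sketch: first predict establishment growth from lagged CBP counts alone, then ask how much of the leftover (residual) variance contemporaneous Yelp changes can explain. All numbers and variable names below are invented for illustration; this is not the authors' actual model.

```python
def ols_fit(x, y):
    """Simple one-variable least squares: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Share of variance in y explained by a linear fit on x."""
    a, b = ols_fit(x, y)
    pred = [a + b * xi for xi in x]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy data: growth depends on both lagged CBP and Yelp review changes.
lagged_cbp  = [1, 2, 3, 4, 5, 6, 7, 8]
yelp_change = [2, 1, 4, 3, 6, 5, 8, 7]
cbp_growth  = [0.5 * l + 0.3 * y for l, y in zip(lagged_cbp, yelp_change)]

# Stage 1: lagged CBP alone; Stage 2: Yelp changes vs. the residuals.
a, b = ols_fit(lagged_cbp, cbp_growth)
residuals = [g - (a + b * l) for g, l in zip(cbp_growth, lagged_cbp)]
r2_residual = r_squared(yelp_change, residuals)
print(round(r2_residual, 3))  # share of residual variance Yelp explains
```

The 29.2 percent figure in the paper is the real-world analogue of `r2_residual` here: how much of what lagged official data cannot predict the platform data can.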

See all papers presented at the NBER Conference on Big Data for 21st Century Economic Statistics here.

Data Pools: Wi-Fi Geolocation Spoofing


AH Projects: “DataPools is a Wi-Fi geolocation spoofing project that virtually relocates your phone to the latitudes and longitudes of Silicon Valley success. It includes a catalog and a SkyLift device with 12 pre-programmed locations. DataPools was produced for the Tropez summer art event in Berlin and in collaboration with Anastasia Kubrak.

DataPools catalog pool index

Weren’t invited to Jeff Bezos’s summer pool party? No problem. DataPools uses the SkyLift device to mimic the Wi-Fi network infrastructure at the homes of 12 top Silicon Valley CEOs, causing your phone to show up, approximately, at their pools. Because Wi-Fi spoofing affects the core geolocation services of iOS and Android smartphones, all apps on the phone, and the metadata they generate, will be located in the spoofed location…

Data Pools is a metaphor for a store of wealth that is private. The luxurious pools and mansions of Silicon Valley are financed by the mechanisms of economic surveillance and ownership of our personal information. Yet, the geographic locations of these premises are often concealed, hidden, and removed from open source databases. What if we could reverse this logic and plunge into the pools of ludicrous wealth, both virtually and physically? Could we apply the same methods of data extraction to highlight the ridiculous inequalities between CEOs and platform users?

Comparison of wealth distribution among top Silicon Valley CEOs

Data

Technically, DataPools uses a Wi-Fi microcontroller programmed with the BSSIDs and SSIDs of the target locations, all obtained from openly published information in web searches and wigle.net. This data is then programmed onto the firmware of the SkyLift device. One SkyLift device contains all 12 pool locations. However, throughout the installation improvements were made, and the updated firmware now uses one main location with multiple sub-locations to cover a larger area during installations. This method was more effective at spoofing many phones in a large area and is ideal for installations….(More)”.
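The data-preparation step described above can be sketched as a small generator script: take SSID/BSSID pairs of the kind harvested from wigle.net and emit a C-style header for the microcontroller firmware to beacon from. The networks and the header format here are hypothetical, not SkyLift's actual firmware layout.

```python
# Invented example networks -- placeholders, not real target locations.
locations = [
    ("PoolsideGuest", "de:ad:be:ef:00:01"),
    ("BackyardNet",   "de:ad:be:ef:00:02"),
]

def to_header(networks):
    """Render (SSID, BSSID) pairs as a C array the firmware could include."""
    lines = ["// auto-generated spoof targets", "const char *SPOOF_APS[][2] = {"]
    for ssid, bssid in networks:
        lines.append(f'    {{"{ssid}", "{bssid}"}},')
    lines.append("};")
    return "\n".join(lines)

header = to_header(locations)
print(header)
```

Because geolocation services key on BSSIDs observed in the air, broadcasting beacons with these identifiers is enough to shift a nearby phone's estimated position; no connection to the fake networks is required.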

The Blockchain Game: A great new tool for your classroom


IBM Blockchain Blog: “Blockchain technology can be a game-changer for accounting, supply chain, banking, contract law, and many other fields. But it will only be useful if lots and lots of non-technical managers and leaders trust and adopt it. And right now, just understanding what blockchain is can be difficult even for the brightest in these fields. Enter The Blockchain Game, a hands-on exercise that explains blockchain’s core principles and serves as a launching pad for discussion of blockchain’s real-world applications.

In The Blockchain Game students act as nodes and miners on a blockchain network for storing student grades at a university. Participants record the grade and course information, and then “build the block” by calculating a unique identifier (a hash) to secure the grade ledger, and miners get rewarded for their work. As the game is played, the audience learns about hashes, private keys, and what uses are appropriate for a blockchain ledger.

Basics of the Game

  • A hands-on simulation centered on a blockchain for academic scores, including a discussion at the end of the simulation regarding whether storing grades would be a good application for blockchain.
  • No computers. Participants are the computers and calculate the blocks by hand.
  • The game seeks to teach core concepts about a distributed ledger but can be adapted to whichever use case the educator wishes to explore — smart contracts, supply chain applications, and others.
  • Additional elements can be added if instructors want to facilitate the game on a computer….(More)”.
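The ledger the game has participants build by hand can be sketched in a few lines: each block stores a grade record plus the hash of the previous block, so altering any earlier grade breaks every hash that follows. The record format below is invented for illustration.

```python
import hashlib

def block_hash(prev_hash, record):
    """Hash a block: the previous block's hash chained with this record."""
    return hashlib.sha256((prev_hash + record).encode()).hexdigest()

def build_chain(records):
    chain, prev = [], "0" * 64  # genesis block's placeholder hash
    for rec in records:
        h = block_hash(prev, rec)
        chain.append({"record": rec, "prev": prev, "hash": h})
        prev = h
    return chain

def is_valid(chain):
    """Recompute every hash; any tampered record breaks the chain."""
    prev = "0" * 64
    for blk in chain:
        if blk["prev"] != prev or blk["hash"] != block_hash(prev, blk["record"]):
            return False
        prev = blk["hash"]
    return True

grades = ["alice:CS101:A", "bob:CS101:B+", "carol:CS101:A-"]
ledger = build_chain(grades)
print(is_valid(ledger))                 # True: untampered chain verifies

ledger[0]["record"] = "alice:CS101:A+"  # a node tries to change a grade
print(is_valid(ledger))                 # False: later hashes no longer match
```

This is exactly the tamper-evidence the game demonstrates on paper: a student cannot quietly improve a grade without redoing every subsequent block's work.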

A weather tech startup wants to do forecasts based on cell phone signals


Douglas Heaven at MIT Technology Review: “On 14 April, more snow fell on Chicago than had fallen in nearly 40 years. Weather services didn’t see it coming: they forecast one or two inches at worst. But when the late-winter snowstorm came, it caused widespread disruption, dumping enough snow that airlines had to cancel more than 700 flights across all of the city’s airports.

One airline did better than most, however. Instead of relying on the usual weather forecasts, it listened to ClimaCell – a Boston-based “weather tech” start-up that claims it can predict the weather more accurately than anyone else. According to the company, its correct forecast of the severity of the coming snowstorm allowed the airline to better manage its schedules and minimize losses due to delays and diversions. 

Founded in 2015, ClimaCell has spent the last few years developing the technology and business relationships that allow it to tap into millions of signals from cell phones and other wireless devices around the world. It uses the quality of these signals as a proxy for local weather conditions, such as precipitation and air quality. It also analyzes images from street cameras. It is offering a weather forecasting service to subscribers that it claims is 60 percent more accurate than that of existing providers, such as NOAA.

The internet of weather

The approach makes sense, in principle. Other forecasters use proxies, such as radar signals. But by using information from millions of everyday wireless devices, ClimaCell claims it has a far more fine-grained view of most of the globe than other forecasters get from the existing network of weather sensors, which range from ground-based devices to satellites. (ClimaCell taps into these, too.)…(More)”.
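The physics behind using link quality as a rain gauge has a textbook form: specific attenuation on a microwave link follows a power law k = a · R^b in rain rate R (mm/h), as standardized in ITU-R P.838, so a measured signal drop can be inverted into a rain estimate. The sketch below uses invented placeholder coefficients; real values depend on link frequency and polarization, and ClimaCell's actual method is proprietary.

```python
# Hypothetical power-law coefficients (real ones come from ITU-R P.838
# tables and vary with frequency and polarization).
A_COEF, B_COEF = 0.05, 1.1

def rain_rate(attenuation_db_per_km):
    """Invert k = a * R**b to estimate rain rate R from attenuation k."""
    return (attenuation_db_per_km / A_COEF) ** (1.0 / B_COEF)

for k in (0.05, 0.5, 2.0):  # light -> heavy signal loss
    print(f"{k:4.2f} dB/km -> {rain_rate(k):6.1f} mm/h")
```

Aggregating such per-link estimates across millions of devices is what would turn a cellular network into a dense, de facto precipitation sensor grid.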

How Technology Could Revolutionize Refugee Resettlement


Krishnadev Calamur in The Atlantic: “… For nearly 70 years, the process of interviewing, allocating, and accepting refugees has gone largely unchanged. In 1951, 145 countries came together in Geneva, Switzerland, to sign the Refugee Convention, the pact that defines who is a refugee, what refugees’ rights are, and what legal obligations states have to protect them.

This process was born of the idealism of the postwar years—an attempt to make certain that those fleeing war or persecution could find safety so that horrific moments in history, such as the Holocaust, didn’t recur. The pact may have been far from perfect, but in successive years, it was a lifeline to Afghans, Bosnians, Kurds, and others displaced by conflict.

The world is a much different place now, though. The rise of populism has brought with it a concomitant hostility toward immigrants in general and refugees in particular. Last October, a gunman who had previously posted anti-Semitic messages online against HIAS killed 11 worshippers in a Pittsburgh synagogue. Many of the policy arguments over resettlement have shifted focus from humanitarian relief to security threats and cost. The Trump administration has drastically cut the number of refugees the United States accepts, and large parts of Europe are following suit.

If it works, Annie could change that dynamic. Developed at Worcester Polytechnic Institute in Massachusetts, Lund University in Sweden, and the University of Oxford in Britain, the software uses what’s known as a matching algorithm to allocate refugees with no ties to the United States to their new homes. (Refugees with ties to the United States are resettled in places where they have family or community support; software isn’t involved in the process.)

Annie’s algorithm is based on a machine learning model in which a computer is fed huge piles of data from past placements, so that the program can refine its future recommendations. The system examines a series of variables—physical ailments, age, levels of education and languages spoken, for example—related to each refugee case. In other words, the software uses previous outcomes and current constraints to recommend where a refugee is most likely to succeed. Every city where HIAS has an office or an affiliate is given a score for each refugee. The higher the score, the better the match.
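The scoring step described above can be illustrated with a toy sketch: score each refugee case against each resettlement city using weights a model might have learned from past outcomes, then recommend the highest-scoring city. All city names, features, and weights here are invented; Annie's actual model and features are not public in this excerpt.

```python
CITY_WEIGHTS = {
    # city: weight per feature, a stand-in for a trained model's parameters
    "Columbus":  {"speaks_english": 0.7, "vocational_cert": 0.2, "medical_needs": -0.1},
    "San Diego": {"speaks_english": 0.3, "vocational_cert": 0.5, "medical_needs":  0.4},
}

def score(city, case):
    """Weighted sum of a case's features under a city's learned weights."""
    w = CITY_WEIGHTS[city]
    return sum(w[f] * case.get(f, 0) for f in w)

def best_city(case):
    """Recommend the city with the highest match score for this case."""
    return max(CITY_WEIGHTS, key=lambda c: score(c, case))

case = {"speaks_english": 1, "vocational_cert": 1, "medical_needs": 1}
print(best_city(case))  # the city whose weighted score is highest
```

In the real system the scores are predictions of, say, employment success trained on historical placements, and local capacity constraints bind the final assignment; the toy only shows the per-city scoring idea.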

This is a drastic departure from how refugees are typically resettled. Each week, HIAS and the eight other agencies that allocate refugees in the United States make their decisions based largely on local capacity, with limited emphasis on individual characteristics or needs….(More)”.

How to Argue with an Algorithm: Lessons from the COMPAS ProPublica Debate


Paper by Anne L. Washington: “The United States optimizes the efficiency of its growing criminal justice system with algorithms; however, legal scholars have overlooked how to frame courtroom debates about algorithmic predictions. In State v. Loomis, the defense argued that the court’s consideration of risk assessments during sentencing was a violation of due process because the accuracy of the algorithmic prediction could not be verified. The Wisconsin Supreme Court upheld the consideration of predictive risk at sentencing because the assessment was disclosed and the defendant could challenge the prediction by verifying the accuracy of data fed into the algorithm.

Was the court correct about how to argue with an algorithm?

The Loomis court ignored the computational procedures that processed the data within the algorithm. How algorithms calculate data is as important as the quality of the data calculated. The arguments in Loomis revealed a need for new forms of reasoning to justify the logic of evidence-based tools. A “data science reasoning” could provide ways to dispute the integrity of predictive algorithms with arguments grounded in how the technology works.

This article’s contribution is a series of arguments that could support due process claims concerning predictive algorithms, specifically the Correctional Offender Management Profiling for Alternative Sanctions (“COMPAS”) risk assessment. As a comprehensive treatment, this article outlines the due process arguments in Loomis, analyzes arguments in an ongoing academic debate about COMPAS, and proposes alternative arguments based on the algorithm’s organizational context….(More)”

Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System


Press release: “The Partnership on AI (PAI) has today published a report gathering the views of the multidisciplinary artificial intelligence and machine learning research and ethics community which documents the serious shortcomings of algorithmic risk assessment tools in the U.S. criminal justice system. These kinds of AI tools for deciding on whether to detain or release defendants are in widespread use around the United States, and some legislatures have begun to mandate their use. Lessons drawn from the U.S. context have widespread applicability in other jurisdictions, too, as the international policymaking community considers the deployment of similar tools.

While criminal justice risk assessment tools are often simpler than the deep neural networks used in many modern artificial intelligence systems, they are basic forms of AI. As such, they present a paradigmatic example of the high-stakes social and ethical consequences of automated AI decision-making….

Across the report, challenges to using these tools fell broadly into three primary categories:

  1. Concerns about the accuracy, bias, and validity in the tools themselves
    • Although the use of these tools is in part motivated by the desire to mitigate existing human fallibility in the criminal justice system, this report suggests that it is a serious misunderstanding to view tools as objective or neutral simply because they are based on data.
  2. Issues with the interface between the tools and the humans who interact with them
    • In addition to technical concerns, these tools must be held to high standards of interpretability and explainability to ensure that users (including judges, lawyers, and clerks, among others) can understand how the tools’ predictions are reached and make reasonable decisions based on these predictions.
  3. Questions of governance, transparency, and accountability
    • To the extent that such systems are adopted to make life-changing decisions, the tools, and the decision-makers who specify, mandate, and deploy them, must meet high standards of transparency and accountability.

This report highlights some of the key challenges with the use of risk assessment tools for criminal justice applications. It also raises some deep philosophical and procedural issues which may not be easy to resolve. Surfacing and addressing those concerns will require ongoing research and collaboration between policymakers, the AI research community, civil society groups, and affected communities, as well as new types of data collection and transparency. It is PAI’s mission to spur and facilitate these conversations and to produce research to bridge such gaps….(More)”

AI & Global Governance: Robots Will Not Only Wage Future Wars but also Future Peace


Daanish Masood & Martin Waehlisch at the United Nations University: “At the United Nations, we have been exploring completely different scenarios for AI: its potential to be used for the noble purposes of peace and security. This could revolutionize how we prevent and resolve conflicts globally.

Two of the most promising areas are Machine Learning and Natural Language Processing. Machine Learning involves computer algorithms detecting patterns from data to learn how to make predictions and recommendations. Natural Language Processing involves computers learning to understand human languages.

At the UN Secretariat, our chief concern is with how these emerging technologies can be deployed for the good of humanity to de-escalate violence and increase international stability.

This endeavor has admirable precedent. During the Cold War, computer scientists used multilayered simulations to predict the scale and potential outcome of the arms race between the East and the West.

Since then, governments and international agencies have increasingly used computational models and advanced Machine Learning to try to understand recurrent conflict patterns and forecast moments of state fragility.

But two things have transformed the scope for progress in this field.

The first is the sheer volume of data now available from what people say and do online. The second is the game-changing growth in computational capacity that allows us to crunch unprecedented quantities of data with relative speed and ease.

So how can this help the United Nations build peace? Three ways come to mind.

Firstly, overcoming cultural and language barriers. By teaching computers to understand human language and the nuances of dialects, not only can we better link up what people write on social media to local contexts of conflict, we can also more methodically follow what people say on radio and TV. As part of the UN’s early warning efforts, this can help us detect hate speech in a place where the potential for conflict is high. This is crucial because the UN often works in countries where internet coverage is low, and where the spoken languages may not be well understood by many of its international staff.

Natural Language Processing algorithms can help to track and improve understanding of local debates, which might well be blind spots for the international community. If we combine such methods with Machine Learning chatbots, the UN could conduct large-scale digital focus groups with thousands in real-time, enabling different demographic segments in a country to voice their views on, say, a proposed peace deal – instantly testing public support, and indicating the chances of sustainability.

Secondly, anticipating the deeper drivers of conflict. We could combine new imaging techniques – whether satellites or drones – with automation. For instance, many parts of the world are experiencing severe groundwater withdrawal and water aquifer depletion. Water scarcity, in turn, drives conflicts and undermines stability in post-conflict environments, where violence around water access becomes more likely, along with large movements of people leaving newly arid areas.

One of the best predictors of water depletion is land subsidence or sinking, which can be measured by satellite and drone imagery. By combining these imaging techniques with Machine Learning, the UN can work in partnership with governments and local communities to anticipate future water conflicts and begin working proactively to reduce their likelihood.

Thirdly, advancing decision making. In the work of peace and security, it is surprising how many consequential decisions are still made solely on the basis of intuition.

Yet complex decisions often need to navigate conflicting goals and undiscovered options, against a landscape of limited information and political preference. This is where we can use Deep Learning – where a network can absorb huge amounts of public data and test it against the real-world examples on which it is trained, while applying probabilistic modeling. This mathematical approach can help us to generate models of our uncertain, dynamic world with limited data.

With better data, we can eventually make better predictions to guide complex decisions. Future senior peace envoys charged with mediating a conflict would benefit from such advances to stress test elements of a peace agreement. Of course, human decision-making will remain crucial, but would be informed by more evidence-driven robust analytical tools….(More)”.