Book by Scott Edward Bennett: “…explores how public opinion is used to design, monitor and evaluate government programmes in Australia, Canada, New Zealand, and the United Kingdom. Using information collected from the media and from international practitioners in the public opinion field, as well as interviews in each of the 4 countries, the author describes how views of public opinion and governance differ significantly between elites and the general public. Bennett argues that elites generally risk more by allowing the creation of new data, fearing that its analysis may become public and create communications and political problems of various kinds. The book finds evidence that recent conservative governments in several countries are changing their perspective on the use of public opinion, and that conventional public opinion studies are facing challenges from the availability of other kinds of information and new technologies….(More)”
The law and big data
Article by Teppo Felin, Caryn Devins, Stuart Kauffman, and Roger Koppl: “In this article we critically examine the use of Big Data in the legal system. Big Data is driving a trend towards behavioral optimization and “personalized law,” in which legal decisions and rules are optimized for best outcomes and where law is tailored to individual consumers based on analysis of past data. Big Data, however, has serious limitations and dangers when applied in the legal context. Advocates of Big Data make theoretically problematic assumptions about the objectivity of data and scientific observation. Law is always theory-laden. Although Big Data strives to be objective, law and data have multiple possible meanings and uses and thus require theory and interpretation in order to be applied. Further, the meanings and uses of law and data are indefinite and continually evolving in ways that cannot be captured or predicted by Big Data.
Due to these limitations, the use of Big Data will likely generate unintended consequences in the legal system. Large-scale use of Big Data will create distortions that adversely influence legal decision-making, causing irrational herding behaviors in the law. The centralized nature of the collection and application of Big Data also poses serious threats to legal evolution and democratic accountability. Furthermore, its focus on behavioral optimization necessarily restricts and even eliminates the local variation and heterogeneity that makes the legal system adaptive. In all, though Big Data has legitimate uses, this article cautions against using Big Data to replace independent legal judgment….(More)”
We use big data to sentence criminals. But can the algorithms really tell us what we need to know?
In 2013, a man named Eric L. Loomis was sentenced for eluding police and driving a car without the owner’s consent.
When the judge weighed Loomis’ sentence, he considered an array of evidence, including the results of an automated risk assessment tool called COMPAS. Loomis’ COMPAS score indicated he was at a “high risk” of committing new crimes. Considering this prediction, the judge sentenced him to seven years.
Loomis challenged his sentence, arguing it was unfair to use the data-driven score against him. The U.S. Supreme Court now must consider whether to hear his case – and perhaps settle a nationwide debate over whether it’s appropriate for any court to use these tools when sentencing criminals.
Today, judges across the U.S. use risk assessment tools like COMPAS in sentencing decisions. In at least 10 states, these tools are a formal part of the sentencing process. Elsewhere, judges informally refer to them for guidance.
I have studied the legal and scientific bases for risk assessments. The more I investigate the tools, the more my caution about them grows.
The scientific reality is that these risk assessment tools cannot do what advocates claim. The algorithms cannot actually make predictions about future risk for the individual defendants being sentenced….
Algorithms such as COMPAS cannot make predictions about individual defendants, because data-driven risk tools are based on group statistics. This creates an issue that academics sometimes call the “group-to-individual” or G2i problem.
Scientists study groups. But the law sentences the individual. Consider the disconnect between science and the law here.
The algorithms in risk assessment tools commonly assign specific points to different factors. The points are totaled. The total is then often translated to a risk bin, such as low or high risk. Typically, more points means a higher risk of recidivism.
Say a score of 6 points out of 10 on a certain tool is considered “high risk.” In the historical groups studied, perhaps 50 percent of people with a score of 6 points did reoffend.
Thus, one might be inclined to think that a new offender who also scores 6 points is at a 50 percent risk of reoffending. But that would be incorrect.
It may be the case that half of those with a score of 6 in the historical groups studied would later reoffend. However, the tool is unable to select which of the offenders with 6 points will reoffend and which will go on to lead productive lives.
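To make the mechanics concrete, here is a minimal sketch of a point-based tool of this kind (hypothetical factors, weights, and cut-offs, not COMPAS’s proprietary scoring). The output is a bin tied to a historical group rate, not an individual prediction:

```python
# Illustrative only: hypothetical factors and cut-offs, not the COMPAS algorithm.
def risk_score(defendant):
    """Assign points to a handful of made-up factors and total them (0-10)."""
    points = 0
    points += 3 if defendant["prior_arrests"] >= 3 else 0
    points += 2 if defendant["age_at_first_offense"] < 18 else 0
    points += 2 if not defendant["employed"] else 0
    points += 2 if defendant["substance_use"] else 0
    points += 1 if defendant["unstable_housing"] else 0
    return points

def risk_bin(points):
    """Map the total to a bin; each bin carries a historical *group* reoffending rate."""
    if points >= 6:
        return "high"    # e.g., roughly half of past offenders in this bin reoffended
    if points >= 3:
        return "medium"
    return "low"

example = {"prior_arrests": 4, "age_at_first_offense": 17, "employed": False,
           "substance_use": False, "unstable_housing": False}
print(risk_score(example), risk_bin(risk_score(example)))  # 7 -> "high"

# The bin says which group a defendant resembles, not which individuals in that
# group will reoffend: the group-to-individual (G2i) gap described above.
```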
The studies of factors associated with reoffending are not causation studies. They can tell only which factors are correlated with new crimes. Individuals retain some measure of free will to decide to break the law again, or not.
These issues may explain why risk tools often have significant false positive rates. The predictions made by the most popular risk tools for violence and sex offending have been shown to get it wrong for some groups over 50 percent of the time.
A ProPublica investigation found that COMPAS, the tool used in Loomis’ case, is burdened by large error rates. For example, COMPAS failed to predict reoffending in one study at a 37 percent rate. The company that makes COMPAS has disputed the study’s methodology….
There are also a host of thorny issues with risk assessment tools incorporating, either directly or indirectly, sociodemographic variables, such as gender, race and social class. Law professor Anupam Chander has named it the problem of the “racist algorithm.”
Big data may have its allure. But data-driven tools cannot make the individual predictions that sentencing decisions require. The Supreme Court might helpfully opine on these legal and scientific issues by deciding to hear the Loomis case…(More)”.
Mapping the invisible: Street View cars add air pollution sensors
Environment at Google: “There are 1.3 million miles of natural gas distribution pipelines in the U.S. These pipelines exist pretty much everywhere that people do, and when they leak, the escaping methane — the main ingredient in natural gas — is a potent greenhouse gas, with 84 times the short-term warming effect of carbon dioxide. These leaks can be time-consuming to identify and measure using existing technologies. Utilities are required by law to quickly fix any leaks that are deemed a safety threat, but thousands of others can — and often do — go on leaking for months or years.
To help gas utilities, regulators, and others understand the scale of the challenge and help prioritize the most cost-effective solutions, the Environmental Defense Fund (EDF) worked with Joe von Fischer, a scientist at Colorado State University, to develop technology to detect and measure methane concentrations from a moving vehicle. Initial tests were promising, and EDF decided to expand the effort to more locations.
That’s when the organization reached out to Google. The project needed to scale, and we had the infrastructure to make it happen: computing power, secure data storage, and, most important, a fleet of Street View cars. These vehicles, equipped with high-precision GPS, were already driving around pretty much everywhere, capturing 360-degree photos for Google Maps; maybe they could measure methane while they were at it. The hypothesis, says Karin Tuxen-Bettman of Google Earth Outreach, was that “we had the potential to turn our Street View fleet into an environmental sensing platform.”
Street View cars make at least two trips around a given area in order to capture good air quality data. An intake tube on the front bumper collects air samples, which are then processed by a methane analyzer in the trunk. Finally, the data is sent to the Google Cloud for analysis and integration into a map showing the size and location of methane leaks. Since the trial began in 2012, EDF has built methane maps for 11 cities and found more than 5,500 leaks. The results range from one leak for every mile driven (sorry, Bostonians) to one every 200 miles (congrats, Indianapolis, for replacing all those corrosion-prone steel and iron pipes with plastic).
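As a rough sketch of the analysis step (not EDF’s or Google’s actual pipeline; field names and thresholds are assumptions), one way to turn a drive’s readings into leak events is to flag samples well above background methane and group consecutive flagged points, with each event’s peak and extent feeding the map:

```python
# Hypothetical sketch: flag readings above ambient methane and group them into events.
from dataclasses import dataclass

@dataclass
class Reading:
    lat: float
    lon: float
    methane_ppm: float   # assumed field names; real sensor output will differ

BACKGROUND_PPM = 2.0     # ambient methane is roughly 2 ppm
SPIKE_FACTOR = 1.1       # hypothetical threshold: 10% above background

def detect_leak_events(readings):
    """Group consecutive above-background readings into candidate leak events."""
    events, current = [], []
    for r in readings:
        if r.methane_ppm > BACKGROUND_PPM * SPIKE_FACTOR:
            current.append(r)
        elif current:
            events.append(current)   # readings returned to background: close the event
            current = []
    if current:
        events.append(current)
    return events

route = [Reading(42.35, -71.06, ppm) for ppm in (2.0, 2.0, 2.6, 3.1, 2.0, 2.0)]
print(len(detect_leak_events(route)), "leak event(s) detected")
# Each event's peak concentration and extent along the route would inform the
# "size and location" of leaks shown on the published maps.
```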
This promising start inspired the team to take the next step and explore using Street View cars to measure overall air quality. For years, Google has worked on measuring indoor environmental quality across company offices with Aclima, which builds environmental sensor networks. In 2014, we expanded the partnership to the outside world, equipping several more Street View cars with its ‘Environmental Intelligence’ (Ei) mobile platform, including scientific-grade analyzers and arrays of small-scale, low-cost sensors to measure pollutants, including particulate matter, NO2, CO2, black carbon, and more. The new project began with a pilot in Denver, and we’ll finish mapping cities in three regions of California by the end of 2016. And today the system is delivering reliable data that corresponds to the U.S. Environmental Protection Agency’s stationary measurement network….
The project began with a few cars, but Aclima’s mobile platform, which has already produced one of the world’s largest data sets on air quality, could also be expanded via deployment on vehicles like buses and mail trucks, on the way to creating a street-level pollution map. This hyper-local data could help people make more informed choices about things like when to let their kids play outside and which changes to advocate for to make their communities healthier….(More)”.
Europol introduce crowdsourcing to catch child abusers
LeakofNations: “The criminal intelligence branch of the European Union, known as Europol, has started a campaign called #TraceAnObject, which uses social media crowdsourcing to detect potentially-identifying objects in material that depicts child abuse….
Investigative crowdsourcing has gained traction in academic and journalistic circles in recent years, but this represents the first case of government bureaus relying on social media people-power to conduct more effective analysis.
Journalists are increasingly relying on a combination of high-end computing to organise terabytes of data and internet cloud hubs that allow a consortium of journalists from around the world to share their analysis of the material. In the Panama Papers scoop, the Australian software Nuix was used to analyse, extract, and index documents into an encrypted central hub in which thousands of journalists from 80 countries were able to post their workings and assist others in a forum-type setting. This model was remarkably efficient; over 11.5 million documents, dating back to the 1970s, were analysed in less than a year.
The website Zooniverse has achieved huge success in creating public participation on academic projects, producing the pioneering game Foldit, where participants play with digital models of proteins. The Oxford University-based organisation has now engaged over 1 million volunteers, and has had significant successes in astronomy, ecology, cell biology, humanities, and climate science.
The most complex investigations still require thousands of hours of straightforward tasks that cannot be computerised. The citizen science website Planet Four studies conditions on Mars, and needs volunteers to compare photographs and detect blotches on Mars’ surface – enabling anyone to feel like Elon Musk, regardless of their educational background.
Child abuse is something that incites anger in most people. Crowdsourcing is an opportunity to take the donkey-work away from slow bureaucratic offices and allow ordinary citizens, many of whom felt powerless to protect children from these vile crimes, to genuinely progress cases that will make children safer.
Zooniverse proves that the public are hungry for this kind of work; the ICIJ project model of a central cloud forum shows that crowdsourcing across international borders allows data to be interpreted more efficiently. Europol’s latest idea could well be a huge success.
Even the most basic object could potentially provide vital clues to the culprit’s identity. The most significant items released so far include a school uniform complete with ID card and necktie, and a group of snow-covered lodges….(More) (see also #TraceAnObject).
Big data allows India to map its fight against human trafficking
Nita Bhalla for Reuters: “An Indian charity is using big data to pinpoint human trafficking hot spots in a bid to prevent vulnerable women and girls vanishing from high-risk villages into the sex trade.
My Choices Foundation uses specially designed technology to identify those villages that are most at risk of modern slavery, then launches local campaigns to sound the alarm….
The analytics tool – developed by Australian firm Quantium – uses a range of factors to identify the most dangerous villages. It draws on India’s census, education and health data and factors such as drought risk, poverty levels, education and job opportunities to identify vulnerable areas….
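The story does not describe Quantium’s model in detail; a minimal illustrative sketch, assuming a simple weighted index over hypothetical indicators with made-up weights, conveys the basic idea of ranking villages for outreach:

```python
# Illustrative sketch only, not Quantium's model: combine indicators (scaled 0-1)
# into a vulnerability score and rank villages for awareness campaigns.
villages = [
    {"name": "A", "poverty_rate": 0.62, "drought_risk": 0.80, "school_dropout": 0.45, "job_access": 0.20},
    {"name": "B", "poverty_rate": 0.30, "drought_risk": 0.30, "school_dropout": 0.15, "job_access": 0.60},
]

# Hypothetical weights; higher score = more vulnerable. Job access is protective, so it subtracts.
WEIGHTS = {"poverty_rate": 0.35, "drought_risk": 0.25, "school_dropout": 0.25, "job_access": -0.15}

def vulnerability(village):
    """Weighted sum of the indicators above."""
    return sum(WEIGHTS[k] * village[k] for k in WEIGHTS)

for v in sorted(villages, key=vulnerability, reverse=True):
    print(v["name"], round(vulnerability(v), 3))
```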
There are an estimated 46 million people enslaved worldwide, with more than 18 million living in India, according to the 2016 Global Slavery Index. The Index was compiled by the Walk Free Foundation, a global organisation seeking to end modern slavery. Many are villagers lured by traffickers with the promise of a good job and an advance payment, only to find themselves or their children forced to work in fields or brick kilns, enslaved in brothels and sold into sexual slavery.
Almost 20,000 women and children were victims of human trafficking in India in 2016, a rise of nearly 25 percent from the previous year, according to government data. While India has strengthened its anti-trafficking policy in recent years, activists say a lack of public awareness remains one of the biggest impediments…(More)”.
Expanding Training on Data and Technology to Improve Communities
Kathryn Pettit at the National Neighborhood Indicators Partnership (NNIP): “Local government and nonprofit staff need data and technology skills to regularly monitor local conditions and design programs that achieve more effective outcomes. Tailored training is essential to help them gain the knowledge and confidence to leverage these indispensable tools. A recent survey of organizations that provide data and technology training documented current practices and how such training should be expanded. Four recommendations are provided to assist government agencies, elected leaders, nonprofit executives, and local funders in empowering workers with the necessary training to use data and technology to benefit their communities. Specifically, community stakeholders should collectively work to
- expand the training available to government and nonprofit staff;
- foster opportunities for sharing training materials and lessons;
- identify allies who can enhance and support local training efforts;
- and assess the local landscape of data and technology training.
Project Products
- Brief: A summary of the current training landscape and key action steps for various sectors to ensure that local government and nonprofit staff have the data and technology skills needed for their civic missions.
- Guide: A document for organizations interested in providing community data and technology training, including advice on how to assess local needs, develop training content, and fund these efforts.
- Catalog: Example training descriptions and related materials collected from various cities for local adaptation.
- Fact sheet: A summary of results from a survey on current training content and practices….(More)”
How Data Mining Facebook Messages Can Reveal Substance Abusers
Emerging Technology from the arXiv: “…Substance abuse is a serious concern. Around one in 10 Americans are sufferers, which is why it costs the American economy more than $700 billion a year in lost productivity, crime, and health-care costs. So a better way to identify people suffering from the disorder, and those at risk of succumbing to it, would be hugely useful.
Bickel and co say they have developed just such a technique, which allows them to spot sufferers simply by looking at their social media messages such as Facebook posts. The technique even provides new insights into the way abuse of different substances influences people’s social media messages.

The new technique comes from the analysis of data collected between 2007 and 2012 as part of a project that ran on Facebook called myPersonality. Users who signed up were offered various psychometric tests and given feedback on their scores. Many also agreed to allow the data to be used for research purposes.
One of these tests asked over 13,000 users with an average age of 23 about the substances they used. In particular, it asked how often they used tobacco, alcohol, or other drugs, and assessed each participant’s level of use. The users were then divided into groups according to their level of substance abuse.
This data set is important because it acts as a kind of ground truth, recording the exact level of substance use for each person.
The team next gathered two other Facebook-related data sets. The first was 22 million status updates posted by more than 150,000 Facebook users. The other was even larger: the “like” data associated with 11 million Facebook users.
Finally, the team worked out how these data sets overlapped. They found almost 1,000 users who were in all the data sets, just over 1,000 who were in the substance abuse and status update data sets, and 3,500 who were in the substance abuse and likes data sets.
These users with overlapping data sets provide rich pickings for data miners. If people with substance use disorders have certain unique patterns of behavior, it may be possible to spot these in their Facebook status updates or in their patterns of likes.
So Bickel and co got to work first by text mining most of the Facebook status updates and then data mining most of the likes data set. Any patterns they found, they then tested by looking for people with similar patterns in the remaining data and seeing if they also had the same level of substance use.
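The write-up does not give the exact models, but the general approach (text features from status updates feeding a supervised classifier that is then evaluated on held-out users) can be sketched roughly as follows, with toy data and standard scikit-learn calls rather than the paper’s actual pipeline:

```python
# Rough sketch of the general approach, not the paper's pipeline:
# bag-of-words features from status updates -> classifier for substance use.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

statuses = ["out late again with the crew", "studying for finals all week",
            "another long shift, need a smoke", "gym then meal prep sunday"]
labels = [1, 0, 1, 0]   # toy ground-truth labels standing in for the survey data

X_train, X_test, y_train, y_test = train_test_split(
    statuses, labels, test_size=0.5, stratify=labels, random_state=0)

vec = TfidfVectorizer()                       # turn status text into word-frequency features
clf = LogisticRegression()
clf.fit(vec.fit_transform(X_train), y_train)  # learn patterns on one group of users

scores = clf.predict_proba(vec.transform(X_test))[:, 1]
print("AUC on held-out users:", roc_auc_score(y_test, scores))
```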
The results make for interesting reading. The team says its technique was hugely successful. “Our best models achieved 86% for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods,” say Bickel and co…. (More) (Full Paper: arxiv.org/abs/1705.05633: Social Media-based Substance Use Prediction).
Data Journalism: How Not To Be Wrong
Winny de Jong: “At the intersection of data and journalism, lots can go wrong. Merely taking precautions might not be enough….
Half True Is False
But this approach is not totally foolproof.
“In data journalism, we cannot settle for ‘half-true.’ Anything short of true is wrong – and we cannot afford to be wrong.” Unlike fact-checking websites such as Politifact, which invented ‘scales’ for truthfulness, from false to true and everything in between, data journalism should always be true.

No Pants on Fire: Politifact’s Truth-O-Meter.
True but Wrong
But even when your story is true, Gebeloff said you could still be wrong. “You can do the math correctly, but get the context wrong, fail to acknowledge uncertainties or not describe your findings correctly.”
Fancy Math
When working on a story, journalists should consider whether they use “fancy math” – think statistics – or “standard math.” “Using fancy math you can explore complex relationships, but at the same time your story will be harder to explain.”…
Targets as a Source
…To make sure you’re not going to be wrong, you should share your findings. “Don’t just share findings with experts, share them with hostile experts too,” Gebeloff advises. “Use your targets as a source. If there’s a blowback, you want to know before publication – and include the blowback in the publication.”
How Not To Be Wrong Checklist
Here’s why you want to use this checklist, which is based on Gebeloff’s presentation: a half truth is false, and data journalism should always be true. But just being true is not enough. Your story can be mathematically true but wrong in context or explanation. You should want your stories to be true and not wrong.
- Check your data carefully (see the sketch after this list):
  - Pay attention to dates.
  - Check for spelling errors and duplicates.
  - Identify outliers.
- Statistical significance alone is not news.
- Prevent base-year abuse: if something is a trend, it should be true in general, not just if you cherry-pick a base year.
- Make sure your data represents reality.
- As you work, keep a data diary that records what you’ve done and how you’ve done it. You should be able to reproduce your calculations.
- Make sure you explain the methods you used – your audience should be able to understand how you find a story.
- Play offense and defense simultaneously. Go for the maximum possible story, but at all times think of why you might be wrong, or what your target would say in response.
- Use your targets as a source to find blowbacks before publication.
- As part of the proofing process, create a footnotes file. Identify each fact and give it a number. Then, for each fact, list which document it came from, how you know it and the proof. Fix what needs to be fixed.
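Here is a minimal sketch of how some of those data checks might look in pandas (made-up data and column names; Gebeloff’s presentation does not prescribe any particular tooling):

```python
import pandas as pd

# Tiny made-up dataset with the kinds of problems the checklist warns about.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-03-01", "2015-06-12", "2016-01-09",
                            "2016-01-09", "2026-02-30"], errors="coerce"),
    "agency": ["Dept of Health", "Dept. of Health", "Parks", "Parks", "Parks"],
    "amount": [120.0, 135.0, 90.0, 90.0, 9_000.0],
})

# Dates: missing, unparseable, or implausible values show up here.
print(df["date"].min(), df["date"].max(), "unparsed:", df["date"].isna().sum())

# Spelling variants ("Dept of Health" vs "Dept. of Health") and duplicate rows.
print(df["agency"].value_counts())
print("duplicate rows:", df.duplicated().sum())

# Outliers: flag values far outside the interquartile range.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
print(df[(df["amount"] < q1 - 3 * iqr) | (df["amount"] > q3 + 3 * iqr)])

# Base-year abuse: does the "trend" hold regardless of which year you start from?
valid = df.dropna(subset=["date"])
yearly = valid.groupby(valid["date"].dt.year)["amount"].sum()
for base in yearly.index[:-1]:
    print(f"change since {base}: {yearly.iloc[-1] / yearly.loc[base] - 1:+.1%}")
```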
Additional links of interest: the slides of Robert Gebeloff’s “How Not To Be Wrong” presentation, and the methodology notes and data from the “Race Behind Bars” series….(More)”
Routledge Handbook on Information Technology in Government
Book edited by Yu-Che Chen and Michael J. Ahn: “The explosive growth in information technology has ushered in unparalleled new opportunities for advancing public service. Featuring 24 chapters from foremost experts in the field of digital government, this Handbook provides an authoritative survey of key emerging technologies, their current state of development and use in government, and insightful discussions on how they are reshaping and influencing the future of public administration. This Handbook explores:
- Key emerging technologies (i.e., big data, social media, Internet of Things (IoT), GIS, smartphones & mobile technologies) and their impacts on public administration
- The impacts of the new technologies on the relationships between citizens and their governments with the focus on collaborative governance
- Key theories of IT innovations in government on the interplay between technological innovations and public administration
- The relationship between technology and democratic accountability and the various ways of harnessing the new technologies to advance public value
- Key strategies and conditions for fostering success in leveraging technological innovations for public service
This Handbook will prove to be an invaluable guide and resource for students, scholars and practitioners interested in this growing field of technological innovations in government….(More)”.