We use big data to sentence criminals. But can the algorithms really tell us what we need to know?


 at The Conversation: “In 2013, a man named Eric L. Loomis was sentenced for eluding police and driving a car without the owner’s consent.

When the judge weighed Loomis’ sentence, he considered an array of evidence, including the results of an automated risk assessment tool called COMPAS. Loomis’ COMPAS score indicated he was at a “high risk” of committing new crimes. Considering this prediction, the judge sentenced him to seven years.

Loomis challenged his sentence, arguing it was unfair to use the data-driven score against him. The U.S. Supreme Court now must consider whether to hear his case – and perhaps settle a nationwide debate over whether it’s appropriate for any court to use these tools when sentencing criminals.

Today, judges across the U.S. use risk assessment tools like COMPAS in sentencing decisions. In at least 10 states, these tools are a formal part of the sentencing process. Elsewhere, judges informally refer to them for guidance.

I have studied the legal and scientific bases for risk assessments. The more I investigate the tools, the more my caution about them grows.

The scientific reality is that these risk assessment tools cannot do what advocates claim. The algorithms cannot actually make predictions about future risk for the individual defendants being sentenced….

Algorithms such as COMPAS cannot make predictions about individual defendants, because data-driven risk tools are based on group statistics. This creates an issue that academics sometimes call the “group-to-individual” or G2i problem.

Scientists study groups. But the law sentences the individual. Consider the disconnect between science and the law here.

The algorithms in risk assessment tools commonly assign specific points to different factors. The points are totaled. The total is then often translated to a risk bin, such as low or high risk. Typically, more points means a higher risk of recidivism.

Say a score of 6 points out of 10 on a certain tool is considered “high risk.” In the historical groups studied, perhaps 50 percent of people with a score of 6 points did reoffend.

Thus, one might be inclined to think that a new offender who also scores 6 points is at a 50 percent risk of reoffending. But that would be incorrect.

It may be the case that half of those with a score of 6 in the historical groups studied would later reoffend. However, the tool is unable to select which of the offenders with 6 points will reoffend and which will go on to lead productive lives.
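
To make the arithmetic concrete, here is a minimal sketch of the kind of point-and-bin scoring described above; the factors, weights and cut-offs are hypothetical illustrations, not those of COMPAS or any actual instrument.

```python
# Minimal sketch of a point-based risk tool. The factors, weights and
# cut-offs below are hypothetical, not taken from any real instrument.

FACTOR_POINTS = {
    "prior_arrests_over_3": 3,
    "age_under_25": 2,
    "unemployed": 1,
    "prior_violation_of_supervision": 2,
    "unstable_housing": 2,
}

def risk_score(defendant: dict) -> int:
    """Total the points for every factor that applies to this defendant."""
    return sum(points for factor, points in FACTOR_POINTS.items()
               if defendant.get(factor, False))

def risk_bin(score: int) -> str:
    """Translate the total into a bin; 6 or more counts as 'high' here."""
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

defendant = {"prior_arrests_over_3": True, "age_under_25": True,
             "unemployed": True}
score = risk_score(defendant)   # 3 + 2 + 1 = 6
print(score, risk_bin(score))   # 6 high

# The "high" label only says that, in the historical group studied, some
# share of people with 6 or more points reoffended (say, 50 percent). It
# cannot say which half this particular defendant falls into: the G2i problem.
```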

The studies of factors associated with reoffending are not causation studies. They can tell only which factors are correlated with new crimes. Individuals retain some measure of free will to decide to break the law again, or not.

These issues may explain why risk tools often have significant false positive rates. The predictions made by the most popular risk tools for violence and sex offending have been shown to get it wrong for some groups over 50 percent of the time.

A ProPublica investigation found that COMPAS, the tool used in Loomis’ case, is burdened by large error rates. For example, in one study COMPAS failed to predict reoffending 37 percent of the time. The company that makes COMPAS has disputed the study’s methodology….
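
To see what such error rates mean in practice, here is a small worked sketch; the counts are invented for illustration and are not the ProPublica or COMPAS figures.

```python
# Hypothetical outcomes for 1,000 defendants scored by a binary risk tool.
# All counts are invented for illustration.

flagged_and_reoffended = 250        # true positives
flagged_but_did_not = 250           # false positives
not_flagged_but_reoffended = 150    # false negatives
not_flagged_and_did_not = 350       # true negatives

# Share of "high risk" labels that turned out to be wrong:
wrong_flag_share = flagged_but_did_not / (flagged_and_reoffended + flagged_but_did_not)

# Share of people who did reoffend that the tool failed to flag:
missed_share = not_flagged_but_reoffended / (flagged_and_reoffended + not_flagged_but_reoffended)

print(f"{wrong_flag_share:.0%} of 'high risk' labels were wrong")    # 50%
print(f"{missed_share:.0%} of actual reoffenders were not flagged")  # 38%
```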

There are also a host of thorny issues with risk assessment tools incorporating, either directly or indirectly, sociodemographic variables, such as gender, race and social class. Law professor Anupam Chander has named it the problem of the “racist algorithm.”

Big data may have its allure. But, data-driven tools cannot make the individual predictions that sentencing decisions require. The Supreme Court might helpfully opine on these legal and scientific issues by deciding to hear the Loomis case…(More)”.

Why big-data analysis of police activity is inherently biased


 and  in The Conversation: “In early 2017, Chicago Mayor Rahm Emanuel announced a new initiative in the city’s ongoing battle with violent crime. The most common solutions to this sort of problem involve hiring more police officers or working more closely with community members. But Emanuel declared that the Chicago Police Department would expand its use of software, enabling what is called “predictive policing,” particularly in neighborhoods on the city’s south side.

The Chicago police will use data and computer analysis to identify neighborhoods that are more likely to experience violent crime, assigning additional police patrols in those areas. In addition, the software will identify individual people who are expected to become – but have yet to be – victims or perpetrators of violent crimes. Officers may even be assigned to visit those people to warn them against committing a violent crime.

Any attempt to curb the alarming rate of homicides in Chicago is laudable. But the city’s new effort seems to ignore evidence, including recent research from members of our policing study team at the Human Rights Data Analysis Group, that predictive policing tools reinforce, rather than reimagine, existing police practices. Their expanded use could lead to further targeting of communities or people of color.

Working with available data

At its core, any predictive model or algorithm is a combination of data and a statistical process that seeks to identify patterns in the numbers. This can include looking at police data in hopes of learning about crime trends or recidivism. But a useful outcome depends not only on good mathematical analysis: It also needs good data. That’s where predictive policing often falls short.

Machine-learning algorithms learn to make predictions by analyzing patterns in an initial training data set and then look for similar patterns in new data as they come in. If they learn the wrong signals from the data, the subsequent analysis will be lacking.

This happened with a Google initiative called “Flu Trends,” which was launched in 2008 in hopes of using information about people’s online searches to spot disease outbreaks. Google’s systems would monitor users’ searches and identify locations where many people were researching various flu symptoms. In those places, the program would alert public health authorities that more people were about to come down with the flu.

But the project failed to account for the potential for periodic changes in Google’s own search algorithm. In an early 2012 update, Google modified its search tool to suggest a diagnosis when users searched for terms like “cough” or “fever.” On its own, this change increased the number of searches for flu-related terms. But Google Flu Trends interpreted the data as predicting a flu outbreak twice as big as federal public health officials expected and far larger than what actually happened.
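
A toy calculation (all figures invented) shows how an unrelated change in search behaviour can mislead a model calibrated on historical data.

```python
# Toy illustration of the Google Flu Trends failure mode; every number here
# is invented. The model converts search volume into estimated cases using a
# ratio learned from data collected before the 2012 search-suggestion change.

CASES_PER_1000_SEARCHES = 12.0   # hypothetical ratio learned from older data

def estimate_cases(weekly_flu_searches: float) -> float:
    return weekly_flu_searches / 1000 * CASES_PER_1000_SEARCHES

baseline_searches = 500_000               # a typical week before the update
print(estimate_cases(baseline_searches))  # 6000.0

# After the update, searching "cough" or "fever" prompts flu suggestions,
# roughly doubling flu-related searches with no change in actual illness.
inflated_searches = baseline_searches * 2
print(estimate_cases(inflated_searches))  # 12000.0 -- a phantom outbreak
```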

Criminal justice data are biased

The failure of the Google Flu Trends system was a result of one kind of flawed data – information biased by factors other than what was being measured. It’s much harder to identify bias in criminal justice prediction models. In part, this is because police data aren’t collected uniformly, and in part it’s because what data police track reflect longstanding institutional biases along income, race and gender lines….(More)”.

Using big data to understand consumer behaviour on ethical issues


Phani Kumar Chintakayala and C. William Young in the Journal of Consumer Ethics: “The Consumer Data Research Centre (CDRC) was established by the UK Economic and Social Research Council and launched its data services in 2015. The project is led by the University of Leeds and UCL, with partners at the Universities of Liverpool and Oxford. It is working with consumer-related organisations and businesses to open up their data resources to trusted researchers, enabling them to carry out important social and economic research….

Over the last few years there has been much talk about how so-called “big data” is the future and if you are not exploiting it, you are losing your competitive advantage. So what is there in the latest wave of enthusiasm on big data to help organisations, researchers and ethical consumers?…

Examples of the types of research being piloted using data from the food sector by CDRC include the consumption of milk and egg products. The results clearly indicate that not all sustainable products are considered the same by consumers, and consumption behaviour varies across sustainable product categories. A linked data analysis was carried out by combining sales data of organic milk and free range eggs from a retailer with over 300 stores across the UK, green and ethical attitude data from CDRC’s data partner, and socio-demographic and deprivation data from open sources. The analysis revealed that, in general, the consumers with deeper green and ethical attitudes are the most likely consumers of sustainable products. Deprivation has a negative effect on the consumption of sustainable products. Price, as expected, has a negative effect, but the impact varies across products. Convenience stores have a significant negative effect on the consumption of sustainable products. The influences of socio-demographic characteristics such as gender, age and ethnicity seem to vary by product categories….
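
For readers curious how such a linked analysis is typically assembled, here is a minimal sketch assuming made-up records, hypothetical column names and a simple logistic regression; the CDRC analysis itself is more elaborate.

```python
# Sketch of a linked-data analysis: join sales, attitude and deprivation data
# on a common key, then model purchase of a sustainable product. The columns,
# records and modelling choice are illustrative assumptions, not CDRC's data.
import pandas as pd
from sklearn.linear_model import LogisticRegression

sales = pd.DataFrame({
    "household_id": [1, 2, 3, 4, 5, 6],
    "bought_organic": [1, 0, 1, 0, 0, 1],
    "price_paid": [1.45, 1.10, 1.50, 1.05, 1.20, 1.40],
    "convenience_store": [0, 1, 0, 1, 1, 0],
})
attitudes = pd.DataFrame({
    "household_id": [1, 2, 3, 4, 5, 6],
    "green_attitude_score": [8, 3, 9, 2, 4, 7],
})
deprivation = pd.DataFrame({
    "household_id": [1, 2, 3, 4, 5, 6],
    "deprivation_index": [2, 7, 1, 8, 6, 3],
})

df = sales.merge(attitudes, on="household_id").merge(deprivation, on="household_id")

features = ["green_attitude_score", "deprivation_index", "price_paid", "convenience_store"]
model = LogisticRegression(max_iter=1000).fit(df[features], df["bought_organic"])

# The sign of each coefficient indicates the direction of association; with
# real data, negative signs on deprivation, price and convenience_store would
# mirror the effects reported above.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```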

Big data can help organisations, researchers and ethical consumers understand the ethics around consumer behaviour and products. The opportunities to link different types of data are exciting, but such work must be research-question-led to avoid digging for non-existent causal links. The methods and access to data are still a barrier, but open access is key to solving this. Big data will probably only help in filling in the details of our knowledge on ethical consumption and on products, but this can only help our decision making…(More)”.

A How-to Book for Wielding Civic Power


Interview by David Bornstein at the New York Times: “Last year, the RAND Survey Research Group asked 3,037 Americans about their political preferences and found that the factor that best predicted support for Donald Trump wasn’t age, race, gender, income, educational attainment or attitudes toward Muslims or undocumented immigrants. It was whether respondents agreed with the statement “People like me don’t have any say about what the government does.”

A feeling of disenfranchisement, or powerlessness, runs deep in the country — and it’s understandable. For most Americans, wages have been flat for 40 years, while incomes have soared for the superrich. Researchers have found, unsurprisingly, that the preferences of wealthy people have a much bigger influence on policy than those of poor or middle-income people.

“I don’t think people are wrong to feel that the game has been rigged,” says Eric Liu, the author of “You’re More Powerful Than You Think: A Citizen’s Guide to Making Change Happen,” an engaging and extremely timely book published last week. “But we’re in a period where across the political spectrum — from the libertarian Tea Party right to the Occupy and Black Lives Matter left — people are pushing back and recognizing that the only remedy is to convert this feeling of ‘not having a say’ into ‘demanding a say.’ ”

Liu, who founded Citizen University, a nonprofit citizen participation organization in Seattle, teaches citizens to do just that. He has also traveled the country, searching across the partisan divide for places where citizens are making democracy work better. In his new book, he has assembled stories of citizen action and distilled them into powerful insights and strategies….

Can you explain the three “core laws of power” you outline in the book?

Liu: No. 1: Power compounds, as does powerlessness. The rich get richer, and people with clout get more clout.

No. 2: Power justifies itself. In a hundred different ways — propaganda, conventional wisdom, just-so stories — people at the top of the hierarchy tell narratives about why it should be so.

If the world stopped with laws No. 1 and 2, we would be stuck in this doom loop that would tip us toward monopoly and tyranny.

What saves us is law No. 3: Power is infinite. I don’t mean we are all equally powerful. I mean simply and quite literally that we can generate power out of thin air. We do that by organizing….(More)”

Big Data and the Well-Being of Women and Girls: Applications on the Social Scientific Frontier


Report by Bapu Vaitla et al. for Data2X: “Conventional forms of data—household surveys, national economic accounts, institutional records, and so on—struggle to capture detailed information on the lives of women and girls. The many forms of big data, from geospatial information to digital transaction logs to records of internet activity, can help close the global gender data gap. This report profiles several big data projects that quantify the economic, social, and health status of women and girls…

This report illustrates the potential of big data in filling the global gender data gap. The rise of big data, however, does not mean that traditional sources of data will become less important. On the contrary, the successful implementation of big data approaches requires investment in proven methods of social scientific research, especially for validation and bias correction of big datasets. More broadly, the invisibility of women and girls in national and international data systems is a political, not solely a technical, problem. In the best case, the current “data revolution” will be reimagined as a step towards better “data governance”: a process through which novel types of information catalyze the creation of new partnerships to advocate for scientific, policy, and political reforms that include women and girls in all spheres of social and economic life….(More)”.

Did artificial intelligence deny you credit?


 in The Conversation: “People who apply for a loan from a bank or credit card company, and are turned down, are owed an explanation of why that happened. It’s a good idea – because it can help teach people how to repair their damaged credit – and it’s a federal law, the Equal Credit Opportunity Act. Getting an answer wasn’t much of a problem in years past, when humans made those decisions. But today, as artificial intelligence systems increasingly assist or replace people making credit decisions, getting those explanations has become much more difficult.

Traditionally, a loan officer who rejected an application could tell a would-be borrower there was a problem with their income level, or employment history, or whatever the issue was. But computerized systems that use complex machine learning models are difficult to explain, even for experts.

Consumer credit decisions are just one way this problem arises. Similar concerns exist in health care, online marketing and even criminal justice. My own interest in this area began when a research group I was part of discovered gender bias in how online ads were targeted, but could not explain why it happened.

All those industries, and many others, that use machine learning to analyze processes and make decisions have a little over a year to get a lot better at explaining how their systems work. In May 2018, the new European Union General Data Protection Regulation takes effect, including a section giving people a right to get an explanation for automated decisions that affect their lives. What shape should these explanations take, and can we actually provide them?

Identifying key reasons

One way to describe why an automated decision came out the way it did is to identify the factors that were most influential in the decision. How much of a credit denial decision was because the applicant didn’t make enough money, or because he had failed to repay loans in the past?

My research group at Carnegie Mellon University, including PhD student Shayak Sen and then-postdoc Yair Zick, created a way to measure the relative influence of each factor. We call it Quantitative Input Influence.

In addition to giving better understanding of an individual decision, the measurement can also shed light on a group of decisions: Did an algorithm deny credit primarily because of financial concerns, such as how much an applicant already owes on other debts? Or was the applicant’s ZIP code more important – suggesting more basic demographics such as race might have come into play?…(More)”
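
The excerpt does not go into implementation detail, but the general idea of an intervention-based influence measure can be sketched as follows; this is a simplified, hypothetical rendering with a toy credit model, not the authors' exact Quantitative Input Influence procedure.

```python
# Simplified sketch of measuring one input's influence on a model's decision:
# hold the other inputs fixed, randomize that input over values seen in the
# data, and record how often the output flips. Inspired by, not a faithful
# implementation of, Quantitative Input Influence.
import random

def influence(model, applicant: dict, feature: str, dataset: list, trials: int = 1000) -> float:
    """Fraction of random interventions on `feature` that change the decision."""
    baseline = model(applicant)
    flips = 0
    for _ in range(trials):
        intervened = dict(applicant)
        intervened[feature] = random.choice(dataset)[feature]  # draw a replacement value
        if model(intervened) != baseline:
            flips += 1
    return flips / trials

# Toy credit model: deny if income is low or there are past defaults.
def toy_credit_model(a: dict) -> str:
    return "deny" if a["income"] < 30000 or a["past_defaults"] > 0 else "approve"

dataset = [{"income": random.randint(15000, 90000),
            "past_defaults": random.choice([0, 0, 0, 1, 2]),
            "zip_code": random.choice(["10001", "60614", "94110"])}
           for _ in range(500)]

applicant = {"income": 22000, "past_defaults": 0, "zip_code": "60614"}
for feature in ("income", "past_defaults", "zip_code"):
    # For this applicant, only income changes the outcome; zip_code is never
    # read by the toy model, so its influence comes out near zero.
    print(feature, influence(toy_credit_model, applicant, feature, dataset))
```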

Unconscious gender bias in the Google algorithm


Interview in Metode with Londa Schiebinger, director of Gendered Innovations: “We were interested because the methods of sex and gender analysis are not in the university curriculum, yet they are very important. The first thing our group did was to develop those methods, and we present twelve methods on the website. We knew it would be very important to create case studies or concrete examples where sex and gender analysis added something new to the research. One of my favorite examples is machine translation. If you look at Google Translate, which is the main one in the United States – SYSTRAN is the main one in Europe – we found that it defaults to the masculine pronoun. So does SYSTRAN. If I put an article about myself into Google Translate, it defaults to «he said» instead of «she said». So, in an article about one of my visits to Spain, it defaults to «he thinks, he says…» and, occasionally, «it wrote». We wondered why this happened, and we found out: because Google Translate works on an algorithm, the problem is that «he said» appears on the web four times more than «she said», so the machine gets it right if it chooses «he said». The algorithm is just set up for that. But, anyway, we found that there was a huge change in the English language from 1968 to the current time, and the proportion of «he said» and «she said» changed from 4-to-1 to 2-to-1. But, still, the translation does not take this into account. So we went to Google and we said «Hey, what is going on?» and they said «Oh, wow, we didn’t know, we had no idea!». So what we recognized is that there is an unconscious gender bias in the Google algorithm. They did not intend to do this at all, so now there are a lot of people who are trying to fix it….
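
The mechanism Schiebinger describes is, at bottom, a frequency default. A toy sketch with invented corpus counts shows why the more common form keeps winning even as usage shifts.

```python
# Toy sketch of a frequency-based default: when the source sentence leaves
# the pronoun's gender ambiguous, a purely statistical system picks whichever
# phrase is more common in its corpus. Counts are invented for illustration.

corpus_counts = {"he said": 4_000_000, "she said": 1_000_000}

def choose_phrase(candidates, counts):
    """Return the candidate phrase with the highest corpus frequency."""
    return max(candidates, key=lambda phrase: counts.get(phrase, 0))

# Translating a gender-ambiguous source phrase with no other context:
print(choose_phrase(["he said", "she said"], corpus_counts))  # he said

# Even if usage shifts from 4-to-1 to 2-to-1, the default is unchanged:
corpus_counts["she said"] = 2_000_000
print(choose_phrase(["he said", "she said"], corpus_counts))  # he said
```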

How can you fix that?

Oh, well, this is the thing! …I think algorithms in general are a problem because if there is any kind of unconscious bias in the data, the algorithm just returns that to you. So even though Google has policies, company policies, to support gender equality, they had an unconscious bias in their product and they do not mean to. Now that they know about it, they can try to fix it….(More)”

Social Media and the Internet of Things towards Data-Driven Policymaking in the Arab World: Potential, Limits and Concerns


Paper by Fadi Salem: “The influence of social media has continued to grow globally over the past decade. During 2016 social media played a highly influential role in what has been described as a “post-truth” era in policymaking, diplomacy and political communication. For example, social media “bots” arguably played a key role in influencing public opinion globally, whether on the political or public policy levels. Such practices rely heavily on big data analytics, artificial intelligence and machine learning algorithms, not just in gathering and crunching public views and sentiments, but more so in proactively influencing public opinions, decisions and behaviors. Some of these government practices undermined traditional information mediums, triggered foreign policy crises, impacted political communication and disrupted established policy formulation cycles.

On the other hand, the digital revolution has expanded the horizon of possibilities for development, governance and policymaking. A new disruptive transformation is characterized by a fusion of inter-connected technologies where the digital, physical and biological worlds converge. This inter-connectivity is generating — and consuming — an enormous amount of data that is changing the ways policies are conducted, decisions are taken and day-to-day operations are carried out. Within this context, ‘big data’ applications are increasingly becoming critical elements of policymaking. Coupled with the rise of a critical mass of social media users globally, this ubiquitous connectivity and data revolution is promising major transformations in modes of governance, policymaking and citizen-government interaction.

In the Arab region, observations from public sector and decision-making organizations suggest that there is limited understanding of the real potential, the limitations, and the public concerns surrounding these big data sources. This report contextualizes the findings in light of the socio-technical transformations taking place in the Arab region, by exploring the growth of social media and building on past editions in the series. The objective is to explore and assess multiple aspects of the ongoing digital transformation in the Arab world and highlight some of the policy implications on a regional level. More specifically, the report aims to better inform our understanding of the convergence of social media and IoT data as sources of big data and their potential impact on policymaking and governance in the region. Ultimately, in light of the availability of massive amounts of data from physical objects and people, the questions tackled in the research are: What is the potential for data-driven policymaking and governance in the region? What are the limitations? And most importantly, what are the public concerns that need to be addressed by policymakers while they embark on the next phase of the digital governance transformation in the region?

In the Arab region, there are already numerous experiments and applications where data from social media and the “Internet of Things” (IoT) are informing and influencing government practices as sources of big data, effectively changing how societies and governments interact. The report has two main parts. In the first part, we explore the questions discussed in the previous paragraphs through a regional survey spanning the 22 Arab countries. In the second part, we explore growth and usage trends of influential social media platforms across the region, including Facebook, Twitter, LinkedIn and, for the first time, Instagram. The findings highlight important changes — and some stagnation — in the ways social media is infiltrating demographic layers in Arab societies, be it gender, age and language. Together, the findings provide important insights for guiding policymakers, business leaders and development efforts. More specifically, these findings can contribute to shaping directions and informing decisions on the future of governance and development in the Arab region….(More)”

Billboard coughs when it detects cigarette smoke


Springwise: “The World Health Organization reports that tobacco use kills approximately six million people each year. And although Sweden has one of the lowest smoking rates in Europe, the country’s Apotek Hjartat pharmacy is running a quit-smoking campaign to help smokers make good on New Year’s resolutions. Located in Stockholm’s busy Odenplan square, the campaign billboard features a black and white image of a man.

When the integrated smoke detector identifies smoke, the man in the billboard image comes to life, emitting a sharp, hacking cough. So far, reactions from smokers have been mixed, though non-smokers and smokers alike appreciate the novelty and surprise of the billboard.

Apotek Hjartat is not new to Springwise, having been featured last year with its virtual reality pain relief app. Pharmacies appear to be taking their role of providing a positive public service seriously, with one in New York charging a “man tax” to highlight the persistent gender wage gap….(More)”

Can artificial intelligence wipe out unconscious bias from your workplace?


Lydia Dishman at Fast Company: “Unconscious bias is exactly what it sounds like: The associations we make whenever we face a decision are buried so deep (literally—the gland responsible for this, the amygdala, is surrounded by the brain’s gray matter) that we’re as unaware of them as we are of having to breathe.

So it’s not much of a surprise that Ilit Raz, cofounder and CEO of Joonko, a new application that acts as a diversity “coach” powered by artificial intelligence, wasn’t even aware at first of the unconscious bias she was facing as a woman in the course of a normal workday. Raz’s experience coming to grips with that informs the way she and her cofounders designed Joonko to work.

The tool joins a crowded field of AI-driven solutions for the workplace, but most of what’s on the market is meant to root out bias in recruiting and hiring. Joonko, by contrast, is setting its sights on illuminating unconscious bias in the types of workplace experiences where few people even think to look for it….

so far, a lot of these resources have been focused on addressing the hiring process. An integral part of the problem, after all, is getting enough diverse candidates in the recruiting pipeline so they can be considered for jobs. Apps like Blendoor hide a candidate’s name, age, employment history, criminal background, and even their photo so employers can focus on qualifications. Interviewing.io’s platform even masks applicants’ voices. Text.io uses AI to parse communications in order to make job postings more gender-neutral. Unitive’s technology also focuses on hiring, with software designed to detect unconscious bias in Applicant Tracking Systems that read resumes and decide which ones to keep or scrap based on certain keywords.
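
As a rough illustration of the blind-screening idea behind several of these tools, redacting identifying attributes before a reviewer sees a candidate record might look like the sketch below; the field names are hypothetical and this is not any vendor's actual implementation.

```python
# Rough sketch of blind screening: strip fields that could trigger unconscious
# bias before a record reaches a reviewer. Field names are hypothetical.

IDENTIFYING_FIELDS = {"name", "age", "photo_url", "employment_history",
                      "criminal_background"}

def redact(candidate: dict) -> dict:
    """Return a copy of the record with identifying fields removed."""
    return {k: v for k, v in candidate.items() if k not in IDENTIFYING_FIELDS}

candidate = {
    "name": "Jordan Example",
    "age": 29,
    "photo_url": "https://example.com/photo.jpg",
    "employment_history": ["Acme Corp", "Initech"],
    "criminal_background": "none",
    "skills": ["python", "sql"],
    "years_experience": 6,
}

print(redact(candidate))   # {'skills': ['python', 'sql'], 'years_experience': 6}
```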

But as Intel recently discovered, hiring diverse talent doesn’t always mean they’ll stick around. And while one 2014 estimate by Margaret Regan, head of the global diversity consultancy FutureWork Institute, found that 20% of large U.S. employers with diversity programs now provide unconscious-bias training—a number that could reach 50% by next year—that training doesn’t always work as intended. The reasons why vary, from companies putting programs on autopilot and expecting them to run themselves, to the simple fact that many employees who are trained ultimately forget what they learned a few days later.

Joonko doesn’t solve these problems. “We didn’t even start with recruiting,” Raz admits. “We started with task management.” She explains that when a company finally hires a diverse candidate, it needs to understand that the best way to retain them is to make sure they feel included and are given the same opportunities as everyone else. That’s where Joonko sees an opening…(More)”.