Why big-data analysis of police activity is inherently biased

 and  in The Conversation: “In early 2017, Chicago Mayor Rahm Emanuel announced a new initiative in the city’s ongoing battle with violent crime. The most common solutions to this sort of problem involve hiring more police officers or working more closely with community members. But Emanuel declared that the Chicago Police Department would expand its use of software, enabling what is called “predictive policing,” particularly in neighborhoods on the city’s south side.

The Chicago police will use data and computer analysis to identify neighborhoods that are more likely to experience violent crime, assigning additional police patrols in those areas. In addition, the software will identify individual people who are expected to become – but have yet to be – victims or perpetrators of violent crimes. Officers may even be assigned to visit those people to warn them against committing a violent crime.

Any attempt to curb the alarming rate of homicides in Chicago is laudable. But the city’s new effort seems to ignore evidence, including recent research from members of our policing study team at the Human Rights Data Analysis Group, that predictive policing tools reinforce, rather than reimagine, existing police practices. Their expanded use could lead to further targeting of communities or people of color.

Working with available data

At its core, any predictive model or algorithm is a combination of data and a statistical process that seeks to identify patterns in the numbers. This can include looking at police data in hopes of learning about crime trends or recidivism. But a useful outcome depends not only on good mathematical analysis: It also needs good data. That’s where predictive policing often falls short.

Machine-learning algorithms learn to make predictions by analyzing patterns in an initial training data set and then look for similar patterns in new data as they come in. If they learn the wrong signals from the data, the subsequent analysis will be lacking.

This happened with a Google initiative called “Flu Trends,” which was launched in 2008 in hopes of using information about people’s online searches to spot disease outbreaks. Google’s systems would monitor users’ searches and identify locations where many people were researching various flu symptoms. In those places, the program would alert public health authorities that more people were about to come down with the flu.

But the project failed to account for the potential for periodic changes in Google’s own search algorithm. In an early 2012 update, Google modified its search tool to suggest a diagnosis when users searched for terms like “cough” or “fever.” On its own, this change increased the number of searches for flu-related terms. But Google Flu Trends interpreted the data as predicting a flu outbreak twice as big as federal public health officials expected and far larger than what actually happened.

Criminal justice data are biased

The failure of the Google Flu Trends system was a result of one kind of flawed data – information biased by factors other than what was being measured. It’s much harder to identify bias in criminal justice prediction models. In part, this is because police data aren’t collected uniformly, and in part it’s because what data police track reflect longstanding institutional biases along income, race and gender lines….(More)”.