Rose Eveleth at Motherboard: “For centuries judges have had to make guesses about the people in front of them. Will this person commit a crime again? Or is this punishment enough to deter them? Do they have the support they need at home to stay safe and healthy and away from crime? Or will they be thrust back into a situation that drives them to their old ways? Ultimately, judges have to guess.
But recently, judges in states including California and Florida have been given a new piece of information to aid in that guess work: a “risk assessment score” determined by an algorithm. These algorithms take a whole suite of variables into account, and spit out a number (usually between 1 and 10) that estimates the risk that the person in question will wind up back in jail.
If you’ve read this column before, you probably know where this is going. Algorithms aren’t unbiased, and a recent ProPublica investigation suggests what researchers have long been worried about: that these algorithms might contain latent racial prejudice. According to ProPublica’s evaluation of a particular scoring method called the COMPAS system, which was created by a company called Northpointe, people of color are more likely to get higher scores than white people for essentially the same crimes.
Bias against folks of color isn’t a new phenomenon in the judicial system. (This might be the understatement of the year.) There’s a huge body of research that shows that judges, like all humans, are biased. Plenty of studies have shown that for the same crime, judges are more likely to sentence a black person more harshly than a white person. It’s important to question biases of all kinds, both human and algorithmic, but it’s also important to question them in relation to one another. And nobody has done that.
I’ve been doing some research of my own into these recidivism algorithms, and when I read the ProPublica story, I came out with the same question I’ve had since I started looking into this: these algorithms are likely biased against people of color. But so are judges. So how do they compare? How does the bias present in humans stack up against the bias programmed into algorithms?
This shouldn’t be hard to find out: ideally you would divide judges in a single county in half, and give one half access to a scoring system, and have the other half carry on as usual. If you don’t want to A/B test within a county—and there are some questions about whether that’s an ethical thing to do—then simply compare two counties with similar crime rates, in which one county uses rating systems and the other doesn’t. In either case, it’s essential to test whether these algorithmic recidivism scores exacerbate, reduce, or otherwise change existing bias.
I was wrong. As far as I can find, and according to everybody I’ve talked to in the field, nobody has done this work, or anything like it. These scores are being used by judges to help them sentence defendants, and nobody knows whether the scores exacerbate existing racial bias or not….(More)”