What Algorithmic Injustice Looks Like in Real Life


Julia Angwin, Jeff Larson, Surya Mattu & Lauren Kirchner at Pacific Standard: “Courtrooms across the nation are using computer programs to predict who will be a future criminal. The programs help inform decisions on everything from bail to sentencing. They are meant to make the criminal justice system fairer — and to weed out human biases.

ProPublica tested one such program and found that it’s often wrong — and biased against blacks.

We looked at the risk scores the program spit out for more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014. We checked to see how many defendants were charged with new crimes over the next two years — the same benchmark used by the creators of the algorithm. Our analysis showed:

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
  • White defendants were mislabeled as low risk more often than black defendants.

What does that look like in real life? Here are five comparisons of defendants — one black and one white — who were charged with similar offenses but got very different scores.

Two Shoplifting Arrests

James Rivelli, 53: In August 2014, Rivelli allegedly shoplifted seven boxes of Crest Whitestrips from a CVS. An employee called the police. When the cops found Rivelli and pulled him over, they found the Whitestrips as well as heroin and drug paraphernalia in his car. He was charged with two felony counts and four misdemeanors for grand theft, drug possession, and driving with a suspended license and expired tags.

Past offenses: He had been charged with felony aggravated assault for domestic violence in 1996, felony grand theft also in 1996, and a misdemeanor theft in 1998. He also says that he was incarcerated in Massachusetts for felony drug trafficking.

COMPAS score: 3 — low

Subsequent offense: In April 2015, he was charged with two felony counts of grand theft in the 3rd degree for shoplifting about $1,000 worth of tools from a Home Depot.

He says: Rivelli says his crimes were fueled by drug use and he is now sober. “I’m surprised [my risk score] is so low,” Rivelli said in an interview in his mother’s apartment in April. “I spent five years in state prison in Massachusetts.”…(More)
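
The disparity ProPublica describes is, at bottom, a comparison of error rates across groups: among defendants who did not reoffend within the two-year benchmark, how often was each group nonetheless labeled high risk? A minimal sketch of that calculation, assuming a simple table of risk labels and two-year outcomes (the column names and sample values are illustrative, not ProPublica's actual data or schema):

```python
import pandas as pd

# Illustrative records: risk label assigned at arrest and whether the person
# was charged with a new crime within the two-year follow-up window.
df = pd.DataFrame({
    "race":       ["black", "black", "white", "white", "black", "white"],
    "high_risk":  [True,    True,    False,   True,    False,   False],
    "reoffended": [False,   True,    False,   False,   False,   True],
})

# False positive rate: share of non-reoffenders who were labeled high risk.
fpr_by_race = (
    df[~df["reoffended"]]
      .groupby("race")["high_risk"]
      .mean()
)
print(fpr_by_race)
```

The second finding, that white defendants were mislabeled as low risk more often, is the mirror-image calculation: restrict to the people who did reoffend and take the share labeled low risk in each group.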

Moneyballing Criminal Justice


Anne Milgram in the Atlantic: “…One area in which the potential of data analysis is still not adequately realized, however, is criminal justice. This is somewhat surprising given the success of CompStat, a law enforcement management tool that uses data to figure out how police resources can be used to reduce crime and hold law enforcement officials accountable for results. CompStat is widely credited with contributing to New York City’s dramatic reduction in serious crime over the past two decades. Yet data-driven decision-making has not expanded to the whole of the criminal justice system.

But it could. And, in this respect, the front end of the system — the part of the process that runs from arrest through sentencing — is particularly important. At this stage, police, prosecutors, defenders, and courts make key choices about how to deal with offenders — choices that, taken together, have an enormous impact on crime. Yet most jurisdictions do not collect or analyze the data necessary to know whether these decisions are being made in a way that accomplishes the most important goals of the criminal justice system: increased public safety, decreased recidivism, reduced cost, and the fair, efficient administration of justice.

Even in jurisdictions where good data exists, a lack of technology is often an obstacle to using it effectively. Police, jails, courts, district attorneys, and public defenders each keep separate information systems, the data from which is almost never pulled together and analyzed in a way that could answer the questions that matter most: Who is in our criminal justice system? What crimes have been charged? What risks do individual offenders pose? And which option would best protect the public and make the best use of our limited resources?

While debates about prison over-crowding, three strikes laws, and mandatory minimum sentences have captured public attention, the importance of what happens between arrest and sentencing has gone largely unnoticed. Even though I ran the criminal justice system in New Jersey, one of the largest states in the country, I had not realized the magnitude of the pretrial issues until I was tasked by the Laura and John Arnold Foundation with figuring out which aspects of criminal justice had the most need and presented the greatest opportunity for reform….

Technology could help us leverage data to identify offenders who will pose unacceptable risks to society if they are not behind bars and distinguish them from those defendants who will have lower recidivism rates if they are supervised in the community or given alternatives to incarceration before trial. Likewise, it could help us figure out which terms of imprisonment, alternatives to incarceration, and other interventions work best–and for whom. And the list does not end there.

The truth is our criminal justice system already makes these decisions every day. But it makes them without knowing whether they’re the right ones. That needs to change. If data is powerful enough to transform baseball, health care, and education, it can do the same for criminal justice….(More)”


Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights


Latest White House report on Big Data charts pathways for fairness and opportunity but also cautions against re-encoding bias and discrimination into algorithmic systems: “Advertisements tailored to reflect previous purchasing decisions; targeted job postings based on your degree and social networks; reams of data informing predictions around college admissions and financial aid. Need a loan? There’s an app for that.

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

Because technological innovation presents both great opportunity and great risk, the White House has released several reports on “big data” intended to prompt conversation and advance these important issues. The topics of previous reports on data analytics included privacy, prices in the marketplace, and consumer protection laws. Today, we are announcing the latest report on big data, one centered on algorithmic systems, opportunity, and civil rights.

The first big data report warned of “the potential of encoding discrimination in automated decisions”—that is, discrimination may “be the inadvertent outcome of the way big data technologies are structured and used.” A commitment to understanding these risks and harnessing technology for good prompted us to specifically examine the intersection between big data and civil rights.

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research—and action—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Big data is here to stay; the question is how it will be used: to advance civil rights and opportunity, or to undermine them….(More)”

Your Data Footprint Is Affecting Your Life In Ways You Can’t Even Imagine


Jessica Leber at Fast Co-Exist: “Cities have long seen the potential in big data to improve the government and the lives of citizens, and this is now being put into action in ways where governments touch citizens’ lives in very sensitive areas. New York City’s Department of Homelessness Services is mining apartment eviction filings, to see if they can understand who is at risk of becoming homeless and intervene early. And police departments all over the country have adopted predictive policing software that guides where officers should deploy, and at what time, leading to reduced crime in some cities.

In one study in Los Angeles, police officers deployed to certain neighborhoods by predictive policing software prevented 4.3 crimes per week, compared to 2 crimes per week when assigned to patrol a specific area by human crime analysts. Surely, a reduction in crime is a good thing. But community activists in places such as Bellingham, Washington, have grave doubts. They worry that outsiders can’t examine how the algorithms work, since the software is usually proprietary, and so citizens have no way of knowing what data the government is using to target them. They also worry that predictive policing is just exacerbating existing patterns of racial profiling. If the underlying crime data being used is the result of years of over-policing minority communities for minor offenses, then the predictions based on this biased data could create a feedback loop and lead to yet more over-policing.
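
The feedback-loop concern is easy to make concrete with a toy model. A hypothetical sketch (all numbers invented, and the dynamics deliberately oversimplified): if incidents are mostly recorded where officers are deployed, and deployment is allocated in proportion to past recorded incidents, a neighborhood that was historically over-policed keeps generating more records and keeps drawing more patrols, even when underlying offending is identical.

```python
# Two neighborhoods with IDENTICAL underlying offense rates; neighborhood B
# starts with more recorded incidents because it was patrolled more heavily.
true_offenses = [100, 100]      # actual offending per period in A and B
recorded = [40.0, 60.0]         # historical records, skewed toward B

for period in range(5):
    total = sum(recorded)
    patrol_share = [r / total for r in recorded]     # data-driven deployment
    # Recorded incidents track patrol presence, not underlying offending.
    recorded = [t * s for t, s in zip(true_offenses, patrol_share)]
    print(f"period {period}: A = {recorded[0]:.0f}, B = {recorded[1]:.0f}")

# The output stays at A = 40, B = 60 every period: the historical disparity
# reproduces itself indefinitely, even though true offending is the same.
```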

At a smaller and more limited scale is the even more sensitive area of child protection services. Though the data isn’t really as “big” as in other examples, a few agencies are carefully exploring using statistical models to make decisions in several areas, such as which children in the system are most in danger of violence, which children are most in need of a trauma screening, and which are at risk of entering the criminal justice system. 

In Hillsborough County, Florida, where a series of child homicides occurred, a private provider selected to manage the county’s child welfare system in 2012 came in and analyzed the data. Cases with the highest probability of serious injury or death had a few factors in common, they found: a child under the age of three, a “paramour” in the home, a substance abuse or domestic violence history, and a parent previously in the foster care system. They identified nine practices to use in these cases and hired a software provider to create a dashboard that allowed real-time feedback. Their success has led to the program being implemented statewide….

“I think the opportunity is a rich one. At the same time, the ethical considerations need to be guiding us,” says Jesse Russell, chief program officer at the National Council on Crime and Delinquency, who has followed the use of predictive analytics in child protective services. Officials, he says, are treading carefully before using data to make decisions about individuals, especially when the consequences of being wrong—such as taking a child out of his or her home unnecessarily—are huge. And while caseworker decision-making can be flawed or biased, so can the programs that humans design. When you rely too much on data—if the data is flawed or incomplete, as could be the case in predictive policing—you risk further validating bad decisions or existing biases….

On the other hand, big data does have the potential to vastly expand our understanding of who we are and why we do what we do. A decade ago, serious scientists would have laughed someone out of the room who proposed a study of “the human condition.” It is a topic so broad and lacking in measurability. But perhaps the most important manifestation of big data in people’s lives could come from the ability for scientists to study huge, unwieldy questions they couldn’t before.

A massive scientific undertaking to study the human condition is set to launch in January of 2017. The Kavli Human Project, funded by the Kavli Foundation, plans to recruit 10,000 New Yorkers from all walks of life to be measured for 10 years. And by measured, they mean everything: all financial transactions, tax returns, GPS coordinates, genomes, chemical exposure, IQ, bluetooth sensors around the house, who subjects text and call—and that’s just the beginning. In all, the large team of academics expect to collect about a billion data points per person per year at an unprecedented low cost for each data point compared to other large research surveys.

The hope is with so much continuous data, researchers can for the first time start to disentangle the complex, seemingly unanswerable questions that have plagued our society, from what is causing the obesity epidemic to how to disrupt the poverty to prison cycle….(More)

Do Universities, Research Institutions Hold the Key to Open Data’s Next Chapter?


Ben Miller at Government Technology: “Government produces a lot of data — reams of it, roomfuls of it, rivers of it. It comes in from citizen-submitted forms, fleet vehicles, roadway sensors and traffic lights. It comes from utilities, body cameras and smartphones. It fills up servers and spills into the cloud. It’s everywhere.

And often, all that data sits there not doing much. A governing entity might have robust data collection and it might have an open data policy, but that doesn’t mean it has the computing power, expertise or human capital to turn those efforts into value.

The amount of data available to government and the computing public promises to continue to multiply — the growing smart cities trend, for example, installs networks of sensors on everything from utility poles to garbage bins.

As all this happens, a movement — a new spin on an old concept — has begun to take root: partnerships between government and research institutes. These institutes are usually housed within universities and laboratories, and the partnerships aim to match strength with strength. Where government has raw data, professors and researchers have expertise and analytics programs.

Several leaders in such partnerships, spanning some of the most tech-savvy cities in the country, see increasing momentum toward the concept. For instance, the John D. and Catherine T. MacArthur Foundation in September helped launch the MetroLab Network, an organization of more than 20 cities that have partnered with local universities and research institutes for smart-city-oriented projects….

Two recurring themes in projects that universities and research organizations take on in cooperation with government are project evaluation and impact analysis. That’s at least partially driven by the very nature of the open data movement: One reason to open data is to get a better idea of how well the government is operating….

Open data may have been part of the impetus for city-university partnerships, in that the availability of more data lured researchers wanting to work with it and extract value. But those partnerships have, in turn, led to government officials opening more data than ever before for useful applications.

Sort of.

“I think what you’re seeing is not just open data, but kind of shades of open — the desire to make the data open to university researchers, but not necessarily the broader public,” said Beth Noveck, co-founder of New York University’s GovLab.



GOVLAB: DOCKER FOR DATA 

Much of what GovLab does is about opening up access to data, and that is the whole point of Docker for Data. The project aims to simplify and quicken the process of extracting and loading large data sets so they will respond to Structured Query Language commands by moving the computing power of that process to the cloud. The tool can be installed with a single line of code, and its website plays host to already-extracted data sets. Since its inception, the website has grown to include more than 100 gigabytes of data from more than 8,000 data sets. From Baltimore, for example, one can easily find information on public health, water sampling, arrests, senior centers and more.


That’s partially because researchers are a controlled group who can be forced to sign memorandums of understanding and trained to protect privacy and prevent security breaches when government hands over sensitive data. That’s a top concern of agencies that manage data, and it shows in the GovLab’s work.

It was something Noveck found to be very clear when she started working on a project she simply calls “Arnold” because of project support from the Laura and John Arnold Foundation. The project involves building a better understanding of how different criminal justice jurisdictions collect, store and share data. The motivation is to help bridge the gaps between people who manage the data and people who should have easy access to it. When Noveck’s center conducted a survey among criminal justice record-keepers, the researchers found big differences between participants.

“There’s an incredible disparity of practices that range from some jurisdictions that have a very well established, formalized [memorandum of understanding] process for getting access to data, to just — you send an email to a guy and you hope that he responds, and there’s no organized way to gain access to data, not just between [researchers] and government entities, but between government entities,” she said….(More)

UK police force trials virtual crime visits over Skype


Nick Summers at Engadget: In an effort to cut costs and make its officers more efficient, police in Peterborough, England are asking citizens to report their crimes over Skype. So, whereas before a local “bobby” would come round to their house, notepad in hand, to ask questions and take down what happened, the entire process will now be conducted over webcam. Alternatively, victims can do the follow-up on the phone or at the station — handy if Skype is being its usual, unreliable self. The system is being trialled for crimes reported via 101, the police’s non-emergency contact number. The force says it’ll give people more flexibility with appointment times, and also ensure officers spend more hours each day on patrol. We suspect it also has something to do with the major budget cuts facing forces up and down the country….(More)”

As a Start to NYC Prison Reform, Jail Data Will Be Made Public


Brentin Mock at CityLab: “…In New York City, 40 percent of the jailed population are there because they couldn’t afford bail—most of them for nonviolent drug crimes. The city spends $42 million on average annually incarcerating non-felony defendants….

Wednesday, NYC Mayor Bill de Blasio signed into law legislation aimed at helping correct these bail problems, providing inmates a bill of rights for when they’re detained and addressing other problems that lead to overstuffing city jails with poor people of color.

The omnibus package of criminal justice reform bills will require the city to produce better accounting of how many people are in city jails, what their average incarceration time is while waiting for trial, the average bail amounts imposed on defendants, and a whole host of other data points on incarceration. Under the new legislation, the city will have to release reports quarterly and semi-annually to the public—much of it from data now sheltered within the city’s Department of Corrections.

“This is bringing sunshine to information that is already being looked at internally, but is better off being public data,” New York City council member Helen Rosenthal tells CityLab. “We can better understand what policies we need to change if we have the data to understand what’s going on in the system.”…

The city passed a package of transparency bills last month that focused on Rikers, but the legislation passed Wednesday will focus on the city’s courts and jails system as a whole….(More)”

Beyond the Jailhouse Cell: How Data Can Inform Fairer Justice Policies


Alexis Farmer at DataDrivenDetroit: “Government-provided open data is a value-added approach to providing transparency, analytic insights for government efficiency, innovative solutions for products and services, and increased civic participation. Two of the least transparent public institutions are jails and prisons. The majority of the population has limited knowledge about jail and prison operations and the demographics of the jail and prison population, even though the costs of incarceration are substantial. The absence of public knowledge about one of the many establishments public tax dollars support can be resolved with an open data approach to criminal justice. Increasing access to administrative jail information enables communities to collectively and effectively find solutions to the challenges the system faces….

The data analysis that complements open data practices is a part of the formula for creating transformational policies. There are numerous ways that recording and publishing data about jail operations can inform better policies and practices:

1. Better budgeting and allocation of funds. Monitoring the rate at which dollars are expended on a specific function allows administrators to make more accurate estimates of future expenditures.

2. More effective deployment of staff. Knowing the average daily population and annual average bookings can help inform staffing decisions, such as the total number of officers needed, shift responsibilities, and room arrangements. The population information also helps with facility planning, reducing overcrowding, controlling violence within the facility, staffing, determining appropriate programs and services, and policy and procedure development.

3. Program participation and effectiveness. Gauging the number of inmates involved in jail work programs, educational training services, rehabilitation/detox programs, and the like is critical to evaluating methods to improve and expand such services. Quantifying participation and effectiveness of these programs can potentially lead to a shift in jail rehabilitation services.

4. Jail suicides. “The rate of jail suicides is about three times the rate of prison suicides.” Jails are isolating spaces that separate inmates from social support networks, diminish personal control, and often lack mental health resources. Most people in jail face minor charges and spend less time incarcerated due to shorter sentences. Reviewing the previous jail suicide statistics aids in pinpointing suicide risk, identifying high-risk groups, and ultimately, prescribing intervention procedures and best practices to end jail suicides.

5. Gender and race inequities. It is well known that Black men are disproportionately incarcerated, and the number of Black women in jails and prisons has rapidly increased. It is important to view this disparity against the demographics of the total population of an area. Providing data that show trends in particular crimes by race and gender might lead to further analysis and policy changes addressing the root causes of these crimes (poverty, employment, education, housing, etc.).

6. Prior interaction with the juvenile justice system. The school-to-prison pipeline describes the systemic school discipline policies that increase a student’s interaction with the juvenile justice system. Knowing how many incarcerated persons were suspended, expelled, or incarcerated as juveniles can encourage schools to examine their discipline policies and institute more restorative justice programs for students. It would also encourage transitional programs for formerly incarcerated youth in order to decrease the recidivism rate among young people.

7. Sentencing reforms. Evaluating the charges on which a person is arrested, the length of stay, average length of sentences, charges for which sentences are given, and the length of time from the first appearance to arraignment and trial disposition can inform more just and balanced sentencing laws enforced by the judicial branch….(More)”
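
Several of the items above, notably 2 and 7, reduce to simple summary statistics over booking records. A minimal sketch of those two calculations, assuming a hypothetical log of booking and release dates (the field layout is invented for illustration; a real jail management system would be the actual source):

```python
from datetime import date

# Hypothetical booking log: (booking date, release date or None if still held).
bookings = [
    (date(2016, 1, 3),  date(2016, 1, 20)),
    (date(2016, 1, 10), date(2016, 3, 1)),
    (date(2016, 2, 2),  None),
]

def average_length_of_stay(records, as_of):
    """Mean days in custody, counting open bookings up to the as-of date."""
    stays = [((release or as_of) - booked).days for booked, release in records]
    return sum(stays) / len(stays)

def average_daily_population(records, start, end):
    """Mean number of people held per day over the window [start, end)."""
    person_days = 0
    for booked, release in records:
        overlap_start = max(booked, start)
        overlap_end = min(release or end, end)
        person_days += max((overlap_end - overlap_start).days, 0)
    return person_days / (end - start).days

as_of = date(2016, 3, 31)
print("average length of stay (days):", average_length_of_stay(bookings, as_of))
print("average daily population:",
      average_daily_population(bookings, date(2016, 1, 1), date(2016, 4, 1)))
```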

Index: Crime and Criminal Justice Data


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on crime and criminal justice data and was originally published in 2015.

This index provides information about the type of crime and criminal justice data collected, shared and used in the United States. Because it is well known that data related to the criminal justice system is often unreliable, or just plain missing, this index also highlights some of the issues that stand in the way of accessing useful and in-demand statistics.

Data Collections: National Crime Statistics

  • Number of incident-based crime datasets created by the Federal Bureau of Investigation (FBI): 2
  • Number of U.S. Statistical Agencies: 13
    • How many of those are focused on criminal justice: 1, the Bureau of Justice Statistics (BJS)
    • Number of data collections focused on criminal justice the BJS produces: 61
  • Number of federal-level APIs available for crime or criminal justice data: 1, the National Crime Victimization Survey (NCVS)
    • Frequency of the NCVS: annually
  • Number of Statistical Analysis Centers (SACs), organizations that are essentially clearinghouses for crime and criminal justice data for each state, the District of Columbia, Puerto Rico and the Northern Mariana Islands: 53

Open data, data use and the impact of those efforts

  • Number of datasets that are returned when “criminal justice” is searched for on Data.gov: 417, including federal-, state- and city-level datasets
  • Number of datasets that are returned when “crime” is searched for on Data.gov: 281
  • The percentage that public complaints dropped after officers started wearing body cameras, according to a study done in Rialto, Calif.: 88
  • The percentage that reported incidents of officer use of force fell after officers started wearing body cameras, according to a study done in Rialto, Calif.: 5
  • The percent that crime decreased during an experiment in predictive policing in Shreveport, LA: 35  
  • Number of crime data sets made available by the Seattle Police Department – generally seen as a leader in police data innovation – on the Seattle.gov website: 4
    • Major crime stats by category in aggregate
    • Crime trend reports
    • Precinct data by beat
    • State sex offender database
  • Number of datasets mapped by the Seattle Police Department: 2
    • 911 incidents
    • Police reports
  • Number of states where risk assessment tools must be used in pretrial proceedings to help determine whether an offender is released from jail before a trial: at least 11
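
Counts like the Data.gov figures above can be reproduced programmatically, since the federal catalog exposes a standard CKAN search API. A quick sketch, assuming the catalog.data.gov endpoint is reachable and treating a plain keyword search as a rough proxy for the counts cited (the exact numbers drift as datasets are added and removed):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def dataset_count(query: str) -> int:
    """Return the number of Data.gov datasets matching a keyword query."""
    params = urlencode({"q": query, "rows": 0})   # rows=0: only the count is needed
    url = f"https://catalog.data.gov/api/3/action/package_search?{params}"
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["result"]["count"]

for term in ("criminal justice", "crime"):
    print(term, dataset_count(term))
```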

Police Data

  • Number of federally mandated databases that collect information about officer use of force or officer involved shootings, nationwide: 0
  • The year a crime bill was passed that called for data on excessive force to be collected for research and statistical purposes, but has never been funded: 1994
  • Number of police departments that committed to being a part of the White House’s Police Data Initiative: 21
  • Percentage of police departments surveyed in 2013 by the Office of Community Oriented Policing within the Department of Justice that are not using body cameras, therefore not collecting body camera data: 75

The criminal justice system

  • Parts of the criminal justice system where data about an individual can be created or collected: at least 6
    • Entry into the system (arrest)
    • Prosecution and pretrial
    • Sentencing
    • Corrections
    • Probation/parole
    • Recidivism


The New Science of Sentencing


Anna Maria Barry-Jester et al at the Marshall Project: “Criminal sentencing has long been based on the present crime and, sometimes, the defendant’s past criminal record. In Pennsylvania, judges could soon consider a new dimension: the future.

Pennsylvania is on the verge of becoming one of the first states in the country to base criminal sentences not only on what crimes people have been convicted of, but also on whether they are deemed likely to commit additional crimes. As early as next year, judges there could receive statistically derived tools known as risk assessments to help them decide how much prison time — if any — to assign.

Risk assessments have existed in various forms for a century, but over the past two decades, they have spread through the American justice system, driven by advances in social science. The tools try to predict recidivism — repeat offending or breaking the rules of probation or parole — using statistical probabilities based on factors such as age, employment history and prior criminal record. They are now used at some stage of the criminal justice process in nearly every state. Many court systems use the tools to guide decisions about which prisoners to release on parole, for example, and risk assessments are becoming increasingly popular as a way to help set bail for inmates awaiting trial.

But Pennsylvania is about to take a step most states have until now resisted for adult defendants: using risk assessment in sentencing itself. A state commission is putting the finishing touches on a plan that, if implemented as expected, could allow some offenders considered low risk to get shorter prison sentences than they would otherwise or avoid incarceration entirely. Those deemed high risk could spend more time behind bars.

Pennsylvania, which already uses risk assessment in other phases of its criminal justice system, is considering the approach in sentencing because it is struggling with an unwieldy and expensive corrections system. Pennsylvania has roughly 50,000 people in state custody, 2,000 more than it has permanent beds for. Thousands more are in local jails, and hundreds of thousands are on probation or parole. The state spends $2 billion a year on its corrections system — more than 7 percent of the total state budget, up from less than 2 percent 30 years ago. Yet recidivism rates remain high: 1 in 3 inmates is arrested again or reincarcerated within a year of being released.

States across the country are facing similar problems — Pennsylvania’s incarceration rate is almost exactly the national average — and many policymakers see risk assessment as an attractive solution. Moreover, the approach has bipartisan appeal: Among some conservatives, risk assessment appeals to the desire to spend tax dollars on locking up only those criminals who are truly dangerous to society. And some liberals hope a data-driven justice system will be less punitive overall and correct for the personal, often subconscious biases of police, judges and probation officers. In theory, using risk assessment tools could lead to both less incarceration and less crime.

There are more than 60 risk assessment tools in use across the U.S., and they vary widely. But in their simplest form, they are questionnaires — typically filled out by a jail staff member, probation officer or psychologist — that assign points to offenders based on anything from demographic factors to family background to criminal history. The resulting scores are based on statistical probabilities derived from previous offenders’ behavior. A low score designates an offender as “low risk” and could result in lower bail, less prison time or less restrictive probation or parole terms; a high score can lead to tougher sentences or tighter monitoring.
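
In their simplest form, then, these tools are weighted checklists. A minimal sketch of that structure, with entirely hypothetical factors, point values, and cut points (real instruments use validated items and empirically derived thresholds, and many are proprietary):

```python
# Hypothetical point-based risk assessment; factors and weights are invented
# for illustration only.
FACTOR_POINTS = {
    "prior_convictions_3_plus": 3,
    "prior_failure_to_appear":  2,
    "unemployed":               1,
    "age_under_25":             1,
}

CUT_POINTS = [(3, "low"), (6, "medium")]   # scores above 6 fall through to "high"

def risk_score(answers):
    """Sum the points for every factor marked True on the questionnaire."""
    return sum(points for factor, points in FACTOR_POINTS.items()
               if answers.get(factor, False))

def risk_level(score):
    for threshold, label in CUT_POINTS:
        if score <= threshold:
            return label
    return "high"

answers = {"prior_convictions_3_plus": True, "age_under_25": True}
score = risk_score(answers)
print(score, risk_level(score))   # prints: 4 medium
```

Where the cut points are drawn, and what consequences attach to each label, are policy choices layered on top of the statistical model.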

The risk assessment trend is controversial. Critics have raised numerous questions: Is it fair to make decisions in an individual case based on what similar offenders have done in the past? Is it acceptable to use characteristics that might be associated with race or socioeconomic status, such as the criminal record of a person’s parents? And even if states can resolve such philosophical questions, there are also practical ones: What to do about unreliable data? Which of the many available tools — some of them licensed by for-profit companies — should policymakers choose?…(More)”