Big data for government good: using analytics for policymaking


Kent Smetters in The Hill: “Big Data and analytics are driving advancements that touch nearly every part of our lives. From improving disaster relief efforts following a storm, to enhancing patient response to specific medications to criminal justice reform and real-time traffic reporting, Big Data is saving lives, reducing costs and improving productivity across the private and the public sector.

Yet when our elected officials draft policy they lack access to advanced data and analytics that would help them understand the economic implications of proposed legislation. Instead of using Big Data to inform and shape vital policy questions, Members of Congress typically don’t receive a detailed analysis of a bill until after it has been written, and after they have sought support for it. That’s when a policy typically undergoes a detailed budgetary analysis. And even then, these assessments often ignore the broader impact on jobs and the economy.

We must do better. Just as modern marketing firms use deep analytical tools to make smart business decisions, policymakers in Washington should similarly have access to modern tools for analyzing important policy questions.
Will Social Security be solvent for our grandchildren? How will changes to immigration policy influence the number of jobs and the GDP? How will tax reform impact the budget, economic growth and the income distribution? What is the impact of new investments in health care, education and roads? These are big questions that must be answered with reliable data and analysis while legislation is being written, not afterwards. The absence leaves us with ideology-driven partisanship.

Simply put, Washington needs better tools to evaluate these complex factors. Imagine the productive conversations we could have if we applied the kinds of tools that are commonplace in the business world to help Washington make more informed choices.

For example, with the help of a nonpartisan budget model from the Wharton School of the University of Pennsylvania, policymakers and the public can uncover some valuable—and even surprising—information about our choices surrounding Social Security, immigration and other issues.

By analyzing more than 4,000 different Social Security policy options, for example, the model projects that the Social Security Trust Fund will be depleted three years earlier than the Social Security Administration’s projections, barring any changes in current law. The tool’s projected shortfalls are larger than the SSA’s, in fact—because it takes into account how changes over time will affect the outcome. We also learn that many standard policy options fail to significantly move the Trust Fund exhaustion date, as these policies phase in too slowly or are too small. Securing Social Security, we now know, requires a range of policy combinations and potentially larger changes than we may have been considering.

Immigration policy, too, is an area where we could all benefit from greater understanding. The political left argues that legalizing undocumented workers will have a positive impact on jobs and the economy. The political right argues for just the opposite—deportation of undocumented workers—for many of the same reasons. But, it turns out, the numbers don’t offer much support to either side.

On one hand, legalization actually slightly reduces the number of jobs. The reason is simple: legal immigrants have better access to school and college, and they can spend more time looking for the best job match. However, because legal immigrants can gain more skills, the actual impact on GDP from legalization alone is basically a wash.

The other option being discussed, deportation, also reduces jobs, in this case because the number of native-born workers can’t rise enough to absorb the job losses caused by deportation. GDP also declines. Calculations based on 125 different immigration policy combinations show that increasing the total amount of legal immigrants—especially those with higher skills—is the most effective policy for increasing employment rates and GDP….(More)”

Data-Driven Justice Initiative, Disrupting Cycle of Incarceration


The White House: “Every year, more than 11 million people move through America’s 3,100 local jails, many on low-level, non-violent misdemeanors, costing local governments approximately $22 billion a year. In local jails, 64 percent of people suffer from mental illness, 68 percent have a substance abuse disorder, and 44 percent suffer from chronic health problems. Communities across the country have recognized that a relatively small number of these highly vulnerable people cycle repeatedly not just through local jails, but also hospital emergency rooms, shelters, and other public systems, receiving fragmented and uncoordinated care at great cost to American taxpayers, with poor outcomes.

For example, Miami-Dade, Florida found that 97 people with serious mental illness accounted for $13.7 million in services over four years, spending more than 39,000 days in either jail, emergency rooms, state hospitals or psychiatric facilities in their county. In response, the county provided key mental health de-escalation training to their police officers and 911 dispatchers and, over the past five years, Miami-Dade police have responded to nearly 50,000 calls for service for people in mental health crisis, but have made only 109 arrests, diverting more than 10,000 people to services or safely stabilizing situations without arrest. The jail population fell from over 7,000 to just over 4,700 and the county was able to close an entire jail facility, saving nearly $12 million a year.

In addition, on any given day, more than 450,000 people are held in jail before trial, nearly 63 percent of the local jail population, even though they have not been convicted of a crime. A 2014 study of New York’s Rikers Island jail found more than 86 percent of detained individuals were held on a bond of $500 or less. To tackle the challenges of bail, in 2014 Charlotte-Mecklenburg, NC began using a data-based risk assessment tool to identify low risk people in jail and find ways to release them safely. Since they began using the tool, the jail population has gone down 20 percent, significantly more low-risk individuals have been released from jail, and there has been no increase in reported crime.

To break this cycle of incarceration, the Administration has launched the Data-Driven Justice Initiative with a bipartisan coalition of city, county, and state governments who have committed to using data-driven strategies to divert low-level offenders with mental illness out of the criminal system and to change approaches to pre-trial incarceration so that low risk offenders no longer stay in jail simply because they cannot afford a bond. These innovative strategies, which have measurably reduced jail populations in several communities, help stabilize individuals and families, better serve communities, and, often, save money in the process. DDJ communities commit to:

  1. combining data from across criminal justice and health systems to identify the individuals with the highest number of contacts with police, ambulance, emergency departments, and other services, and leveraging existing resources to link them to health, behavioral health, and social services in the community;
  2. equipping law enforcement and first responders to enable more rapid deployment of tools, approaches, and other innovations they need to safely and more effectively respond to people in mental health crisis and divert people with high needs to identified service providers instead of arrest; and
  3. working towards using objective, data-driven, validated risk assessment tools to inform the safe release of low-risk defendants from jails in order to reduce the jail population held pretrial….(More: FactSheet)”
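
The first commitment above, combining records across systems to find the people with the most contacts, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `top_utilizers` helper, and the sample rows are all hypothetical, and a real effort would also need record linkage across agencies and privacy safeguards that the sketch leaves out.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up contact records: (person_id, system).
# Real data would come from separate agency systems and would need
# careful matching of identities before any counting is meaningful.
contacts = [
    ("p1", "jail"), ("p1", "emergency_room"), ("p1", "shelter"),
    ("p2", "jail"),
    ("p3", "emergency_room"), ("p3", "emergency_room"), ("p3", "jail"),
    ("p1", "jail"),
]


def top_utilizers(records, n=2):
    """Return the n people with the most contacts, with per-system counts."""
    totals = Counter(person for person, _ in records)
    by_system = defaultdict(Counter)
    for person, system in records:
        by_system[person][system] += 1
    return [
        (person, total, dict(by_system[person]))
        for person, total in totals.most_common(n)
    ]


result = top_utilizers(contacts)
for person, total, breakdown in result:
    print(person, total, breakdown)
```

With the made-up rows above, `result` lists `p1` first (four contacts) and `p3` second (three). The per-system breakdown is kept alongside the total because the commitment is about linking people to services, which depends on knowing where the contacts happened.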

The Billions We’re Wasting in Our Jails


Stephen Goldsmith and Jane Wiseman in Governing: “By using data analytics to make decisions about pretrial detention, local governments could find substantial savings while making their communities safer….

Few areas of local government spending present better opportunities for dramatic savings than those that surround pretrial detention. Cities and counties are wasting more than $3 billion a year, and often inducing crime and job loss, by holding the wrong people while they await trial. The problem: Only 10 percent of jurisdictions use risk data analytics when deciding which defendants should be detained.

As a result, dangerous people are out in our communities, while many who could be safely in the community are behind bars. Vast numbers of people accused of petty offenses spend their pretrial detention time jailed alongside hardened convicts, learning from them how to be better criminals….

In this era of big data, analytics not only can predict and prevent crime but also can discern who should be diverted from jail to treatment for underlying mental health or substance abuse issues. Avoided costs aggregating in the billions could be better spent on detaining high-risk individuals, more mental health and substance abuse treatment, more police officers and other public safety services.

Jurisdictions that do use data to make pretrial decisions have achieved not only lower costs but also greater fairness and lower crime rates. Washington, D.C., releases 85 percent of defendants awaiting trial. Compared to the national average, those released in D.C. are two and a half times more likely to remain arrest-free and one and a half times as likely to show up for court.

Louisville, Ky., implemented risk-based decision-making using a tool developed by the Laura and John Arnold Foundation and now releases 70 percent of defendants before trial. Those released have turned out to be twice as likely to return to court and to stay arrest-free as those in other jurisdictions. Mesa County, Colo., and Allegheny County, Pa., both have achieved significant savings from reduced jail populations due to data-driven release of low-risk defendants.

Data-driven approaches are beginning to produce benefits not only in the area of pretrial detention but throughout the criminal justice process. Dashboards now in use in a handful of jurisdictions allow not only administrators but also the public to see court waiting times by offender type and to identify and address processing bottlenecks….(More)”

White House Challenges Artificial Intelligence Experts to Reduce Incarceration Rates


Jason Shueh at GovTech: “The U.S. spends $270 billion on incarceration each year, has a prison population of about 2.2 million and an incarceration rate that’s spiked 220 percent since the 1980s. But with the advent of data science, White House officials are asking experts for help.

On Tuesday, June 7, the White House Office of Science and Technology Policy’s Lynn Overmann, who also leads the White House Police Data Initiative, stressed the severity of the nation’s incarceration crisis while asking a crowd of data scientists and artificial intelligence specialists for aid.

“We have built a system that is too large, and too unfair and too costly — in every sense of the word — and we need to start to change it,” Overmann said, speaking at a Computing Community Consortium public workshop.

She argued that the U.S., a country that has the highest number of incarcerated citizens in the world, is in need of systematic reforms, both in the data tools used to process alleged offenders and at the policy level to ensure fair and measured sentences. As a longtime counselor, advisor and analyst for the Justice Department and at the city and state levels, Overmann said she has studied and witnessed an alarming number of issues in terms of bias and unwarranted punishments.

For instance, she said that statistically, while drug use is about equal between African-Americans and Caucasians, African-Americans are more likely to be arrested and convicted. They also receive longer prison sentences compared to Caucasian inmates convicted of the same crimes….

Data and digital tools can help curb such pitfalls by increasing efficiency, transparency and accountability, she said.

“We think these types of data exchanges [between officials and technologists] can actually be hugely impactful if we can figure out how to take this information and operationalize it for the folks who run these systems,” Overmann noted.

The opportunities to apply artificial intelligence and data analytics, she said, might include using it to improve questions on parole screenings, using it to analyze police body camera footage, and applying it to criminal justice data for legislators and policy workers….

If the private sector is any indication, artificial intelligence and machine learning techniques could be used to interpret this new and vast supply of law enforcement data. In an earlier presentation by Eric Horvitz, the managing director at Microsoft Research, Horvitz showcased how the company has applied artificial intelligence to vision and language to interpret live video content for the blind. The app, titled SeeingAI, can translate live video footage, captured from an iPhone or a pair of smart glasses, into instant audio messages for the seeing impaired. Twitter’s live-streaming app Periscope has employed similar technology to guide users to the right content….(More)”

What Algorithmic Injustice Looks Like in Real Life


Julia Angwin, Jeff Larson, Surya Mattu & Lauren Kirchner at Pacific Standard: “Courtrooms across the nation are using computer programs to predict who will be a future criminal. The programs help inform decisions on everything from bail to sentencing. They are meant to make the criminal justice system fairer — and to weed out human biases.

ProPublica tested one such program and found that it’s often wrong — and biased against blacks.

We looked at the risk scores the program spit out for more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014. We checked to see how many defendants were charged with new crimes over the next two years — the same benchmark used by the creators of the algorithm. Our analysis showed:

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
  • White defendants were mislabeled as low risk more often than black defendants.
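
One common way to quantify the two findings above is to compare false positive and false negative rates across groups. The sketch below is not ProPublica's code or data; the `error_rates` helper and the toy records are hypothetical, chosen only to show the arithmetic.

```python
from collections import defaultdict

# Hypothetical toy records, NOT ProPublica's data:
# (group, flagged_high_risk, reoffended_within_two_years)
records = [
    ("A", True, False), ("A", True, True), ("A", False, False),
    ("A", True, False), ("A", False, True),
    ("B", False, False), ("B", True, True), ("B", False, True),
    ("B", False, False), ("B", True, False),
]


def error_rates(rows):
    """Compute false positive and false negative rates for each group."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged, reoffended in rows:
        if flagged and not reoffended:
            counts[group]["fp"] += 1  # flagged, but no new charge
        elif flagged and reoffended:
            counts[group]["tp"] += 1
        elif not flagged and reoffended:
            counts[group]["fn"] += 1  # labeled low risk, but new charge
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        rates[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates


rates = error_rates(records)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data, group A's false positive rate (2 of 3) is twice group B's (1 of 3), while their false negative rates are equal. That shows how a tool can err at similar overall rates yet distribute one kind of error unevenly.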

What does that look like in real life? Here are five comparisons of defendants — one black and one white — who were charged with similar offenses but got very different scores.

Two Shoplifting Arrests

James Rivelli, 53: In August 2014, Rivelli allegedly shoplifted seven boxes of Crest Whitestrips from a CVS. An employee called the police. When the cops found Rivelli and pulled him over, they found the Whitestrips as well as heroin and drug paraphernalia in his car. He was charged with two felony counts and four misdemeanors for grand theft, drug possession, and driving with a suspended license and expired tags.

Past offenses: He had been charged with felony aggravated assault for domestic violence in 1996, felony grand theft also in 1996, and a misdemeanor theft in 1998. He also says that he was incarcerated in Massachusetts for felony drug trafficking.

COMPAS score: 3 — low

Subsequent offense: In April 2015, he was charged with two felony counts of grand theft in the 3rd degree for shoplifting about $1,000 worth of tools from a Home Depot.

He says: Rivelli says his crimes were fueled by drug use and he is now sober. “I’m surprised [my risk score] is so low,” Rivelli said in an interview in his mother’s apartment in April. “I spent five years in state prison in Massachusetts.”…(More)”

Moneyballing Criminal Justice


Anne Milgram in the Atlantic: “…One area in which the potential of data analysis is still not adequately realized, however, is criminal justice. This is somewhat surprising given the success of CompStat, a law enforcement management tool that uses data to figure out how police resources can be used to reduce crime and hold law enforcement officials accountable for results. CompStat is widely credited with contributing to New York City’s dramatic reduction in serious crime over the past two decades. Yet data-driven decision-making has not expanded to the whole of the criminal justice system.

But it could. And, in this respect, the front end of the system — the part of the process that runs from arrest through sentencing — is particularly important. At this stage, police, prosecutors, defenders, and courts make key choices about how to deal with offenders — choices that, taken together, have an enormous impact on crime. Yet most jurisdictions do not collect or analyze the data necessary to know whether these decisions are being made in a way that accomplishes the most important goals of the criminal justice system: increased public safety, decreased recidivism, reduced cost, and the fair, efficient administration of justice.

Even in jurisdictions where good data exists, a lack of technology is often an obstacle to using it effectively. Police, jails, courts, district attorneys, and public defenders each keep separate information systems, the data from which is almost never pulled together and analyzed in a way that could answer the questions that matter most: Who is in our criminal justice system? What crimes have been charged? What risks do individual offenders pose? And which option would best protect the public and make the best use of our limited resources?

While debates about prison over-crowding, three strikes laws, and mandatory minimum sentences have captured public attention, the importance of what happens between arrest and sentencing has gone largely unnoticed. Even though I ran the criminal justice system in New Jersey, one of the largest states in the country, I had not realized the magnitude of the pretrial issues until I was tasked by the Laura and John Arnold Foundation with figuring out which aspects of criminal justice had the most need and presented the greatest opportunity for reform….

Technology could help us leverage data to identify offenders who will pose unacceptable risks to society if they are not behind bars and distinguish them from those defendants who will have lower recidivism rates if they are supervised in the community or given alternatives to incarceration before trial. Likewise, it could help us figure out which terms of imprisonment, alternatives to incarceration, and other interventions work best–and for whom. And the list does not end there.

The truth is our criminal justice system already makes these decisions every day. But it makes them without knowing whether they’re the right ones. That needs to change. If data is powerful enough to transform baseball, health care, and education, it can do the same for criminal justice….(More)”


Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights


Latest White House report on Big Data charts pathways for fairness and opportunity but also cautions against re-encoding bias and discrimination into algorithmic systems: “Advertisements tailored to reflect previous purchasing decisions; targeted job postings based on your degree and social networks; reams of data informing predictions around college admissions and financial aid. Need a loan? There’s an app for that.

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

Because technological innovation presents both great opportunity and great risk, the White House has released several reports on “big data” intended to prompt conversation and advance these important issues. The topics of previous reports on data analytics included privacy, prices in the marketplace, and consumer protection laws. Today, we are announcing the latest report on big data, one centered on algorithmic systems, opportunity, and civil rights.

The first big data report warned of “the potential of encoding discrimination in automated decisions”—that is, discrimination may “be the inadvertent outcome of the way big data technologies are structured and used.” A commitment to understanding these risks and harnessing technology for good prompted us to specifically examine the intersection between big data and civil rights.

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research—and action—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Big data is here to stay; the question is how it will be used: to advance civil rights and opportunity, or to undermine them….(More)”

Your Data Footprint Is Affecting Your Life In Ways You Can’t Even Imagine


Jessica Leber at Fast Co-Exist: “Cities have long seen the potential in big data to improve the government and the lives of citizens, and this is now being put into action in ways where governments touch citizens’ lives in very sensitive areas. New York City’s Department of Homelessness Services is mining apartment eviction filings, to see if they can understand who is at risk of becoming homeless and intervene early. And police departments all over the country have adopted predictive policing software that guides where officers should deploy, and at what time, leading to reduced crime in some cities.

In one study in Los Angeles, police officers deployed to certain neighborhoods by predictive policing software prevented 4.3 crimes per week, compared to 2 crimes per week when assigned to patrol a specific area by human crime analysts. Surely, a reduction in crime is a good thing. But community activists in places such as Bellingham, Washington, have grave doubts. They worry that outsiders can’t examine how the algorithms work, since the software is usually proprietary, and so citizens have no way of knowing what data the government is using to target them. They also worry that predictive policing is just exacerbating existing patterns of racial profiling. If the underlying crime data being used is the result of years of over-policing minority communities for minor offenses, then the predictions based on this biased data could create a feedback loop and lead to yet more over-policing.

At a smaller and more limited scale is the even more sensitive area of child protection services. Though the data isn’t really as “big” as in other examples, a few agencies are carefully exploring using statistical models to make decisions in several areas, such as which children in the system are most in danger of violence, which children are most in need of a trauma screening, and which are at risk of entering the criminal justice system. 

In Hillsborough County, Florida, where a series of child homicides occurred, a private provider selected to manage the county’s child welfare system in 2012 came in and analyzed the data. Cases with the highest probability of serious injury or death had a few factors in common, they found: a child under the age of three, a “paramour” in the home, a substance abuse or domestic violence history, and a parent previously in the foster care system. They identified nine practices to use in these cases and hired a software provider to create a dashboard that allowed real-time feedback and dashboards. Their success has led to the program being implemented statewide….

“I think the opportunity is a rich one. At the same time, the ethical considerations need to be guiding us,” says Jesse Russell, chief program officer at the National Council on Crime and Delinquency, who has followed the use of predictive analytics in child protective services. Officials, he says, are treading carefully before using data to make decisions about individuals, especially when the consequences of being wrong—such as taking a child out of his or her home unnecessarily—are huge. And while caseworker decision-making can be flawed or biased, so can the programs that humans design. When you rely too much on data—if the data is flawed or incomplete, as could be the case in predictive policing—you risk further validating bad decisions or existing biases….

On the other hand, big data does have the potential to vastly expand our understanding of who we are and why we do what we do. A decade ago, serious scientists would have laughed someone out of the room who proposed a study of “the human condition.” It is a topic so broad and lacking in measurability. But perhaps the most important manifestation of big data in people’s lives could come from the ability for scientists to study huge, unwieldy questions they couldn’t before.

A massive scientific undertaking to study the human condition is set to launch in January of 2017. The Kavli Human Project, funded by the Kavli Foundation, plans to recruit 10,000 New Yorkers from all walks of life to be measured for 10 years. And by measured, they mean everything: all financial transactions, tax returns, GPS coordinates, genomes, chemical exposure, IQ, bluetooth sensors around the house, who subjects text and call—and that’s just the beginning. In all, the large team of academics expect to collect about a billion data points per person per year at an unprecedented low cost for each data point compared to other large research surveys.

The hope is that with so much continuous data, researchers can for the first time start to disentangle the complex, seemingly unanswerable questions that have plagued our society, from what is causing the obesity epidemic to how to disrupt the poverty-to-prison cycle….(More)”

Do Universities, Research Institutions Hold the Key to Open Data’s Next Chapter?


Ben Miller at Government Technology: “Government produces a lot of data — reams of it, roomfuls of it, rivers of it. It comes in from citizen-submitted forms, fleet vehicles, roadway sensors and traffic lights. It comes from utilities, body cameras and smartphones. It fills up servers and spills into the cloud. It’s everywhere.

And often, all that data sits there not doing much. A governing entity might have robust data collection and it might have an open data policy, but that doesn’t mean it has the computing power, expertise or human capital to turn those efforts into value.

The amount of data available to government and the computing public promises to continue to multiply — the growing smart cities trend, for example, installs networks of sensors on everything from utility poles to garbage bins.

As all this happens, a movement — a new spin on an old concept — has begun to take root: partnerships between government and research institutes. Usually housed within universities and laboratories, these partnerships aim to match strength with strength. Where government has raw data, professors and researchers have expertise and analytics programs.

Several leaders in such partnerships, spanning some of the most tech-savvy cities in the country, see increasing momentum toward the concept. For instance, the John D. and Catherine T. MacArthur Foundation in September helped launch the MetroLab Network, an organization of more than 20 cities that have partnered with local universities and research institutes for smart-city-oriented projects….

Two recurring themes in projects that universities and research organizations take on in cooperation with government are project evaluation and impact analysis. That’s at least partially driven by the very nature of the open data movement: One reason to open data is to get a better idea of how well the government is operating….

Open data may have been part of the impetus for city-university partnerships, in that the availability of more data lured researchers wanting to work with it and extract value. But those partnerships have, in turn, led to government officials opening more data than ever before for useful applications.

Sort of.

“I think what you’re seeing is not just open data, but kind of shades of open — the desire to make the data open to university researchers, but not necessarily the broader public,” said Beth Noveck, co-founder of New York University’s GovLab.



GOVLAB: DOCKER FOR DATA 

Much of what GovLab does is about opening up access to data, and that is the whole point of Docker for Data. The project aims to simplify and quicken the process of extracting and loading large data sets so they will respond to Structured Query Language commands by moving the computing power of that process to the cloud. The docker can be installed with a single line of code, and its website plays host to already-extracted data sets. Since its inception, the website has grown to include more than 100 gigabytes of data from more than 8,000 data sets. From Baltimore, for example, one can easily find information on public health, water sampling, arrests, senior centers and more.


That’s partially because researchers are a controlled group who can be forced to sign memorandums of understanding and trained to protect privacy and prevent security breaches when government hands over sensitive data. That’s a top concern of agencies that manage data, and it shows in the GovLab’s work.

It was something Noveck found to be very clear when she started working on a project she simply calls “Arnold” because of project support from the Laura and John Arnold Foundation. The project involves building a better understanding of how different criminal justice jurisdictions collect, store and share data. The motivation is to help bridge the gaps between people who manage the data and people who should have easy access to it. When Noveck’s center conducted a survey among criminal justice record-keepers, the researchers found big differences between participants.

“There’s an incredible disparity of practices that range from some jurisdictions that have a very well established, formalized [memorandum of understanding] process for getting access to data, to just — you send an email to a guy and you hope that he responds, and there’s no organized way to gain access to data, not just between [researchers] and government entities, but between government entities,” she said….(More)”

UK police force trials virtual crime visits over Skype


Nick Summers at Engadget: “In an effort to cut costs and make its officers more efficient, police in Peterborough, England are asking citizens to report their crimes over Skype. So, whereas before a local “bobby” would come round to their house, notepad in hand, to ask questions and take down what happened, the entire process will now be conducted over webcam. Alternatively, victims can do the follow-up on the phone or at the station — handy if Skype is being its usual, unreliable self. The system is being trialled for crimes reported via 101, the police’s non-emergency contact number. The force says it’ll give people more flexibility with appointment times, and also ensure officers spend more hours each day on patrol. We suspect it also has something to do with the major budget cuts facing forces up and down the country….(More)”