What statistics can and can’t tell us about ourselves


Hannah Fry at The New Yorker: “Harold Eddleston, a seventy-seven-year-old from Greater Manchester, was still reeling from a cancer diagnosis he had been given that week when, on a Saturday morning in February, 1998, he received the worst possible news. He would have to face the future alone: his beloved wife had died unexpectedly, from a heart attack.

Eddleston’s daughter, concerned for his health, called their family doctor, a well-respected local man named Harold Shipman. He came to the house, sat with her father, held his hand, and spoke to him tenderly. Pushed for a prognosis as he left, Shipman replied portentously, “I wouldn’t buy him any Easter eggs.” By Wednesday, Eddleston was dead; Dr. Shipman had murdered him.

Harold Shipman was one of the most prolific serial killers in history. In a twenty-three-year career as a mild-mannered and well-liked family doctor, he injected at least two hundred and fifteen of his patients with lethal doses of opiates. He was finally arrested in September, 1998, six months after Eddleston’s death.

David Spiegelhalter, the author of an important and comprehensive new book, “The Art of Statistics” (Basic), was one of the statisticians tasked by the ensuing public inquiry to establish whether the mortality rate of Shipman’s patients should have aroused suspicion earlier. Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s excess mortality—the number of his older patients who had died in the course of his career over the number that would be expected of an average doctor’s—was a hundred and seventy-four women and forty-nine men at the time of his arrest. The total closely matched the number of victims confirmed by the inquiry….

In 1825, the French Ministry of Justice ordered the creation of a national collection of crime records. It seems to have been the first of its kind anywhere in the world—the statistics of every arrest and conviction in the country, broken down by region, assembled and ready for analysis. It’s the kind of data set we take for granted now, but at the time it was extraordinarily novel. This was an early instance of Big Data—the first time that mathematical analysis had been applied in earnest to the messy and unpredictable realm of human behavior.

Or maybe not so unpredictable. In the early eighteen-thirties, a Belgian astronomer and mathematician named Adolphe Quetelet analyzed the numbers and discovered a remarkable pattern. The crime records were startlingly consistent. Year after year, irrespective of the actions of courts and prisons, the number of murders, rapes, and robberies reached almost exactly the same total. There is a “terrifying exactitude with which crimes reproduce themselves,” Quetelet said. “We know in advance how many individuals will dirty their hands with the blood of others. How many will be forgers, how many poisoners.”

To Quetelet, the evidence suggested that there was something deeper to discover. He developed the idea of a “Social Physics,” and began to explore the possibility that human lives, like planets, had an underlying mechanistic trajectory. There’s something unsettling in the idea that, amid the vagaries of choice, chance, and circumstance, mathematics can tell us something about what it is to be human. Yet Quetelet’s overarching findings still stand: at some level, human life can be quantified and predicted. We can now forecast, with remarkable accuracy, the number of women in Germany who will choose to have a baby each year, the number of car accidents in Canada, the number of plane crashes across the Southern Hemisphere, even the number of people who will visit a New York City emergency room on a Friday evening….(More)”
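
The excess-mortality figure in the excerpt above is simple arithmetic: deaths observed among a doctor's patients minus the deaths expected for an average doctor. As a minimal sketch, the Python below adds up the two excess counts quoted in the excerpt and compares the total with the quoted lower bound on confirmed victims. The `excess_deaths` helper and its sample inputs are hypothetical illustrations and are not figures from the inquiry.

```python
def excess_deaths(observed: int, expected: float) -> float:
    """Excess mortality: observed deaths minus the number expected for an average doctor."""
    return observed - expected


# Figures quoted in the excerpt (excess deaths among older patients).
EXCESS_WOMEN = 174
EXCESS_MEN = 49
# "At least two hundred and fifteen" victims, so this is a lower bound.
CONFIRMED_VICTIMS_MIN = 215

total_excess = EXCESS_WOMEN + EXCESS_MEN
gap = total_excess - CONFIRMED_VICTIMS_MIN
print(f"Total excess deaths: {total_excess}")
print(f"Excess above the confirmed minimum: {gap}")

# Hypothetical inputs, only to show how the helper is used.
print(excess_deaths(observed=30, expected=12.5))
```

The two totals differ by only a few cases, which is the sense in which the excerpt says the statistical estimate closely matched the inquiry's count.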

Study finds Big Data eliminates confidentiality in court judgements


Swissinfo: “Swiss researchers have found that algorithms that mine large swaths of data can eliminate anonymity in federal court rulings. This could have major ramifications for transparency and privacy protection.

This is the result of a study by the University of Zurich’s Institute of Law, published in the legal journal “Jusletter” and shared by Swiss public television SRF on Monday.

The study relied on a “web scraping technique” or mining of large swaths of data. The researchers created a database of all decisions of the Supreme Court available online from 2000 to 2018 – a total of 122,218 decisions. Additional decisions from the Federal Administrative Court and the Federal Office of Public Health were also added.

Using an algorithm and manual searches for connections between data, the researchers were able to de-anonymise, in other words reveal identities, in 84% of the judgments in less than an hour.

In this specific study, the researchers were able to identify the pharma companies and medicines hidden in the documents of the complaints filed in court.  

Study authors say that this could have far-reaching consequences for transparency and privacy. One of the study’s co-authors, Kerstin Noëlle Vokinger, professor of law at the University of Zurich, explains that “with today’s technological possibilities, anonymisation is no longer guaranteed in certain areas”. The researchers say the technique could be applied to any publicly available database.

Vokinger added that there is a need to balance necessary transparency with safeguarding the personal rights of individuals.

Adrian Lobsiger, the Swiss Federal Data Protection Commissioner, told SRF that this confirms his view that facts may need to be treated as personal data in the age of technology….(More)”.

Investigators Use New Strategy to Combat Opioid Crisis: Data Analytics


Byron Tau and Aruna Viswanatha in the Wall Street Journal: “When federal investigators got a tip in 2015 that a health center in Houston was distributing millions of doses of opioid painkillers, they tried a new approach: look at the numbers.

State and federal prescription and medical billing data showed a pattern of overprescription, giving authorities enough ammunition to send an undercover Drug Enforcement Administration agent. She found a crowded waiting room and armed security guards. After a 91-second appointment with the sole doctor, the agent paid $270 at the cash-only clinic and walked out with 100 10mg pills of the powerful opioid hydrocodone.

The subsequent prosecution of the doctor and the clinic owner, who were sentenced last year to 35 years in prison, laid the groundwork for a new data-driven Justice Department strategy to help target one of the worst public-health crises in the country. Prosecutors expanded the pilot program from Houston to the hard-hit Appalachian region in early 2019. Within months, the effort resulted in the indictments of dozens of doctors, nurses, pharmacists and others. Two-thirds of them had been identified through analyzing the data, a Justice Department official said. A quarter of defendants were expected to plead guilty, according to the Justice Department, and additional indictments through the program are expected in the coming weeks.

“These are doctors behaving like drug dealers,” said Brian Benczkowski, head of the Justice Department’s criminal division, who oversaw the expansion.

“They’ve been operating as though nobody could see them for a long period of time. Now we have the data,” Mr. Benczkowski said.

The Justice Department’s fraud section has been using data analytics in health-care prosecutions for several years—combing through Medicare and Medicaid billing data for evidence of fraud, and deploying the strategy in cities around the country that saw outlier billings. In 2018, the health-care fraud unit charged more than 300 people with fraud totaling more than $2 billion, according to the Justice Department.

But using the data to combat the opioid crisis, which is ravaging communities across the country, is a new development for the department, which has made tackling the epidemic a key priority in the Trump administration….(More)”.

E-Nudging Justice: The Role of Digital Choice Architecture in Online Courts


Paper by Ayelet Sela: “Justice systems around the world are launching online courts and tribunals in order to improve access to justice, especially for self-represented litigants (SRLs). Online courts are designed to handhold SRLs throughout the process and empower them to make procedural and substantive decisions. To that end, they present SRLs with streamlined and simplified procedures and employ a host of user interface design and user experience strategies (UI/UX). Focusing on these features, the article analyzes online courts as digital choice environments that shape SRLs’ decisions, inputs and actions, and considers their implications for access to justice, due process and the impartiality of courts. Accordingly, the article begins to close the knowledge gap regarding choice architecture in online legal proceedings.

Using examples from current online courts, the article considers how mechanisms such as choice overload, display, colorfulness, visual complexity, and personalization influence SRLs’ choices and actions. The analysis builds on research in cognitive psychology and behavioral economics that shows that subtle changes in the context in which decisions are made steer (nudge) people to choose a particular option or course of action. It is also informed by recent studies that capture the effect of digital choice architecture on users’ choices and behaviors in online settings. The discussion clarifies that seemingly naïve UI/UX features can strongly influence users of online courts, in a manner that may be at odds with their institutional commitment to impartiality and due process. Moreover, the article challenges the view that online court interfaces (and those of other online legal services, for that matter) should be designed to maximize navigability, intuitiveness and user-friendliness. It argues that these design attributes involve the risk of nudging SRLs to make uninformed, non-deliberate, and biased decisions, possibly infringing their autonomy and self-determination. Accordingly, the article suggests that choice architecture in online courts should aim to encourage reflective participation and informed decision-making. Specifically, its goal should be to improve SRLs’ ability to identify and consider options, and advance their own — inherently diverse — interests. In order to mitigate the abovementioned risks, the article proposes an initial evaluation framework, measures, and methodologies to support evidence-based and ethical choice architecture in online courts….(More)”.

Law as Data: Computation, Text, and the Future of Legal Analysis


Book edited by Michael A. Livermore and Daniel N. Rockmore: “In recent years, the digitization of legal texts, combined with developments in the fields of statistics, computer science, and data analytics, has opened entirely new approaches to the study of law. This volume explores the new field of computational legal analysis, an approach marked by its use of legal texts as data. The emphasis herein is work that pushes methodological boundaries, either by using new tools to study longstanding questions within legal studies or by identifying new questions in response to developments in data availability and analysis.

By using the text and underlying data of legal documents as the direct objects of quantitative statistical analysis, Law as Data introduces the legal world to the broad range of computational tools already proving themselves relevant to law scholarship and practice, and highlights the early steps in what promises to be an exciting new approach to studying the law….(More)”.

Review into bias in algorithmic decision-making


Interim Report by the Centre for Data Ethics and Innovation (UK): “The use of algorithms has the potential to improve the quality of decision-making by increasing the speed and accuracy with which decisions are made. If designed well, they can reduce human bias in decision-making processes. However, as the volume and variety of data used to inform decisions increases, and the algorithms used to interpret the data become more complex, concerns are growing that without proper oversight, algorithms risk entrenching and potentially worsening bias.

The way in which decisions are made, the potential biases which they are subject to and the impact these decisions have on individuals are highly context dependent. Our Review focuses on exploring bias in four key sectors: policing, financial services, recruitment and local government. These have been selected because they all involve significant decisions being made about individuals, there is evidence of the growing uptake of machine learning algorithms in the sectors and there is evidence of historic bias in decision-making within these sectors. This Review seeks to answer three sets of questions:

  1. Data: Do organisations and regulators have access to the data they require to adequately identify and mitigate bias?
  2. Tools and techniques: What statistical and technical solutions are available now or will be required in future to identify and mitigate bias and which represent best practice?
  3. Governance: Who should be responsible for governing, auditing and assuring these algorithmic decision-making systems?

Our work to date has led to some emerging insights that respond to these three sets of questions and will guide our subsequent work….(More)”.

Studying Crime and Place with the Crime Open Database


M. P. J. Ashby in Research Data Journal for the Humanities and Social Sciences: “The study of spatial and temporal crime patterns is important for both academic understanding of crime-generating processes and for policies aimed at reducing crime. However, studying crime and place is often made more difficult by restrictions on access to appropriate crime data. This means understanding of many spatio-temporal crime patterns is limited to data from a single geographic setting, and there are few attempts at replication. This article introduces the Crime Open Database (CODE), a database of 16 million offenses from 10 of the largest United States cities over 11 years and more than 60 offense types. Open crime data were obtained from each city, having been published in multiple incompatible formats. The data were processed to harmonize geographic co-ordinates, dates and times, offense categories and location types, as well as adding census and other geographic identifiers. The resulting database allows the wider study of spatio-temporal patterns of crime across multiple US cities, allowing greater understanding of variations in the relationships between crime and place across different settings, as well as facilitating replication of research….(More)”.
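
The harmonization step described in the abstract above (bringing dates, co-ordinates and offense categories from incompatible city formats into one schema) might look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the city keys, field names, date formats and category mapping are hypothetical and are not taken from the Crime Open Database itself.

```python
from datetime import datetime

# Hypothetical description of how two cities publish the same information.
CITY_FORMATS = {
    "city_a": {
        "date_field": "occurred",
        "date_format": "%m/%d/%Y %H:%M",
        "lat_field": "Latitude",
        "lon_field": "Longitude",
        "offense_field": "Primary Type",
    },
    "city_b": {
        "date_field": "incident_datetime",
        "date_format": "%Y-%m-%dT%H:%M:%S",
        "lat_field": "y",
        "lon_field": "x",
        "offense_field": "category",
    },
}

# Hypothetical mapping from city-specific labels to shared categories.
OFFENSE_MAP = {
    "THEFT": "theft",
    "LARCENY": "theft",
    "BURGLARY": "burglary",
    "BREAKING AND ENTERING": "burglary",
}


def harmonize(city: str, record: dict) -> dict:
    """Convert one city-specific record into a common schema."""
    fmt = CITY_FORMATS[city]
    when = datetime.strptime(record[fmt["date_field"]], fmt["date_format"])
    label = record[fmt["offense_field"]].strip().upper()
    return {
        "city": city,
        "date_time": when.isoformat(),
        "latitude": float(record[fmt["lat_field"]]),
        "longitude": float(record[fmt["lon_field"]]),
        # Labels missing from the mapping fall back to a catch-all category.
        "offense_category": OFFENSE_MAP.get(label, "other"),
    }


row_a = harmonize("city_a", {
    "occurred": "03/14/2015 21:30",
    "Latitude": "41.88",
    "Longitude": "-87.63",
    "Primary Type": "Theft",
})
row_b = harmonize("city_b", {
    "incident_datetime": "2015-03-14T21:30:00",
    "y": "34.05",
    "x": "-118.24",
    "category": "larceny",
})
print(row_a)
print(row_b)
```

Once every record shares one schema, the same analysis can be run across cities, which is what makes the replication mentioned in the abstract practical.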

Blockchain and Public Record Keeping: Of Temples, Prisons, and the (Re)Configuration of Power


Paper by Victoria L. Lemieux: “This paper discusses blockchain technology as a public record keeping system, linking record keeping to power of authority, veneration (temples), and control (prisons) that configure and reconfigure social, economic, and political relations. It discusses blockchain technology as being constructed as a mechanism to counter institutions and social actors that currently hold power, but who are nowadays often viewed with mistrust. It explores claims for blockchain as a record keeping force of resistance to those powers using an archival theoretic analytic lens. The paper evaluates claims that blockchain technology can support the creation and preservation of trustworthy records able to serve as alternative sources of evidence of rights, entitlements and actions with the potential to unseat the institutional power of the nation-state….(More)”.

Secrecy, Privacy and Accountability: Challenges for Social Research


Book by Mike Sheaff: “Public mistrust of those in authority and failings of public organisations frame disputes over attribution of responsibility between individuals and systems. Illustrated with examples, including the Aberfan disaster, the death of Baby P, and Mid Staffs Hospital, this book explores parallel conflicts over access to information and privacy.

The Freedom of Information Act (FOIA) allows access to information about public organisations but can be in conflict with the Data Protection Act, protecting personal information. Exploring the use of the FOIA as a research tool, Sheaff offers a unique contribution to the development of sociological research methods, and debates connected to privacy and secrecy in the information age. This book will provide sociologists and social scientists with a fresh perspective on contemporary issues of power and control….(More)”.

Supreme Court rules against newspaper seeking access to food stamp data


Josh Gerstein at Politico: “The Supreme Court on Monday handed a victory to businesses seeking to block their information from being disclosed to the public after it winds up in the hands of the federal government.

The justices ruled in favor of retailers seeking to prevent a South Dakota newspaper from obtaining store-level data on the redemption of food stamp benefits, now officially known as the Supplemental Nutrition Assistance Program, or SNAP.

The high court ruling rejected a nearly half-century-old appeals court precedent that allowed the withholding of business records under the Freedom of Information Act only in cases where harm would result either to the business or to the government’s ability to acquire information in the future.

The latest case was set into motion when the U.S. Department of Agriculture refused to disclose the store-level SNAP data in response to a 2011 FOIA request from the Argus Leader, the daily newspaper in Sioux Falls, South Dakota. The newspaper sued, but a federal district court ruled in favor of the USDA.

The Argus Leader appealed, and the U.S. Appeals Court for the 8th Circuit ruled that the exemption the USDA was citing did not apply in this case, sending the issue back to a lower court. The district court was tasked with determining whether the USDA was covered by a separate FOIA exemption governing information that would cause competitive injury if released.

That court ruled in favor of the newspaper, at which point the Food Marketing Institute, a trade group that represents retailers such as grocery stores, filed an appeal in lieu of the USDA….(More)”.