Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration

Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, and David L Rosen at Public Health Ethics: “Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court records and confidential HIV records) to enhance HIV surveillance and improve continuity of care for incarcerated populations. We argue that the well-known framework of Emanuel et al. (2000) provides only partial ethical guidance for the activities we describe, which lie at a complex intersection of public health research and public health practice. We suggest some ethical considerations from the ethics of public health practice to help fill gaps in this relatively unexplored area….(More)”.

Self-interest and data protection drive the adoption and moral acceptability of big data technologies: A conjoint analysis approach

Paper by Rabia I. Kodapanakkal, Mark J. Brandt, Christoph Kogler, and Ilja van Beest: “Big data technologies have both benefits and costs which can influence their adoption and moral acceptability. Prior studies look at people’s evaluations in isolation without pitting costs and benefits against each other. We address this limitation with a conjoint experiment (N = 979), using six domains (criminal investigations, crime prevention, citizen scores, healthcare, banking, and employment), where we simultaneously test the relative influence of four factors: the status quo, outcome favorability, data sharing, and data protection on decisions to adopt and perceptions of moral acceptability of the technologies.

We present two key findings. (1) People adopt technologies more often when data is protected and when outcomes are favorable. They place equal or more importance on data protection in all domains except healthcare where outcome favorability has the strongest influence. (2) Data protection is the strongest driver of moral acceptability in all domains except healthcare, where the strongest driver is outcome favorability. Additionally, sharing data lowers preference for all technologies, but has a relatively smaller influence. People do not show a status quo bias in the adoption of technologies. When evaluating moral acceptability, people show a status quo bias but this is driven by the citizen scores domain. Differences across domains arise from differences in magnitude of the effects but the effects are in the same direction. Taken together, these results highlight that people are not always primarily driven by self-interest and do place importance on potential privacy violations. They also challenge the assumption that people generally prefer the status quo….(More)”.

Smarter government or data-driven disaster: the algorithms helping control local communities

Release by MuckRock: “What is the chance you, or your neighbor, will commit a crime? Should the government change a child’s bus route? Add more police to a neighborhood or take some away?

Everyday government decisions, from bus routes to policing, used to be based on limited information and human judgment. Governments now use the ability to collect and analyze hundreds of data points every day to automate many of their decisions.

Does handing government decisions over to algorithms save time and money? Can algorithms be fairer or less biased than human decision making? Do they make us safer? Automation and artificial intelligence could improve on the notorious inefficiencies of government, but they could also exacerbate existing errors in the data being used to power them.

MuckRock and the Rutgers Institute for Information Policy & Law (RIIPL) have compiled a collection of algorithms used in communities across the country to automate government decision-making.

Go right to the database.

We have also compiled policies and other guiding documents local governments use to make room for the future use of algorithms. You can find those as a project on DocumentCloud.

View policies on smart cities and technologies

These collections are a living resource and attempt to communally collect records and known instances of automated decision making in government….(More)”.

Predictive Policing Theory

Paper by Andrew Guthrie Ferguson: “Predictive policing is changing law enforcement. New place-based predictive analytic technologies allow police to predict where and when a crime might occur. Data-driven insights have been operationalized into concrete decisions about police priorities and resource allocation. In the last few years, place-based predictive policing has spread quickly across the nation, offering police administrators the ability to identify higher crime locations, to restructure patrol routes, and to develop crime suppression strategies based on the new data.

This chapter suggests that the debate about technology is better thought about as a choice of policing theory. In other words, when purchasing a particular predictive technology, police should be doing more than simply choosing the most sophisticated predictive model; instead they must first make a decision about the type of policing response that makes sense in their community. Foundational questions about whether we want police officers to be agents of social control, civic problem-solvers, or community partners lie at the heart of any choice of which predictive technology might work best for any given jurisdiction.

This chapter then examines predictive policing technology as a choice about policing theory and how the purchase of a particular predictive tool becomes – intentionally or unintentionally – a statement about police role. Interestingly, these strategic choices map on to existing policing theories. Three of the traditional policing philosophies – hot spot policing, problem-oriented policing, and community-based policing – have loose parallels with new place-based predictive policing technologies like PredPol, Risk Terrain Modeling (RTM), and HunchLab. This chapter discusses these leading predictive policing technologies as illustrative examples of how police can choose between prioritizing additional police presence, targeting environmental vulnerabilities, and/or establishing a community problem-solving approach as a different means of achieving crime reduction….(More)”.

Machine Learning Technologies and Their Inherent Human Rights Issues in Criminal Justice Contexts

Essay by Jamie Grace: “This essay is an introductory exploration of machine learning technologies and their inherent human rights issues in criminal justice contexts. These inherent human rights issues include privacy concerns, the chilling of freedom of expression, problems around potential for racial discrimination, and the rights of victims of crime to be treated with dignity.

This essay is built around three case studies – with the first on the digital ‘mining’ of rape complainants’ mobile phones for evidence for disclosure to defence counsel. This first case study seeks to show how AI or machine learning tech might hypothetically either ease or inflame some of the tensions involved for human rights in this context. The second case study is concerned with the human rights challenges of facial recognition of suspects by police forces, using automated algorithms (live facial recognition) in public places. The third case study is concerned with the development of useful self-regulation in algorithmic governance practices in UK policing. This essay concludes with an emphasis on the need for the ‘politics of information’ (Lyon, 2007) to catch up with the ‘politics of public protection’ (Nash, 2010)….(More)”.

The Trace

About: “The Trace is an independent, nonpartisan, nonprofit newsroom dedicated to shining a light on America’s gun violence crisis….

Every year in our country, a firearm is used in nearly 500,000 crimes, resulting in the deaths and injuries of more than 110,000 people. Shootings devastate families and communities and drain billions of dollars from local, state, and federal governments. Meanwhile, the problem of gun violence has been compounded by another: the shortage of knowledge about the issue…

Data and records are shielded from public view—or don’t exist. Gun-lobby backed restrictions on federal gun violence research deprive policymakers and public health experts of potentially life-saving facts. Other laws limit the information that law enforcement agencies can share on illegal guns and curb litigation that could allow scrutiny of industry practices….

We make the problem clear. In partnership with Slate, we built an eye-opening, interactive map plotting the locations of nearly 40,000 incidents of gun violence nationwide. The feature received millions of pageviews and generated extensive local coverage and social media conversation. “So many shootings and deaths, so close to my home,” wrote one reader. “And I hadn’t even heard about most of them.”…(More)”.

Using speculative design to explore the future of Open Justice

UK Policy Lab: “Open justice is the principle that ‘justice should not only be done, but should manifestly and undoubtedly be seen to be done’. It is a very well-established principle within our justice system; however, new digital tools and approaches are creating new opportunities and potential challenges which necessitate significant rethinking on how open justice is delivered.

In this context, HM Courts & Tribunals Service (HMCTS) wanted to consider how the principle of open justice should be delivered in the future. As well as seeking input from those who most commonly work with courtrooms, like judges, court staff and legal professionals, they also wanted to explore a range of public views. HMCTS asked us to create a methodology which could spark a wide-ranging conversation about open justice, collecting diverse and divergent perspectives….

We approached this challenge by using speculative design to explore possible and desirable futures with citizens. In this blog we will share what we did (including how you can re-use our materials and approach), what we’ve learned, and what we’ll be experimenting with from here.

What we did

We ran 4 groups of 10 to 12 participants each. We spent the first 30 minutes discussing what participants understood and thought about Open Justice in the present. We spent the next 90 minutes using provocations to immerse them in a range of fictional futures, in which the justice system is accessed through a range of digital platforms.

The provocations were designed to:

  • engage even those with no prior interest, experience or knowledge of Open Justice
  • be reusable
  • not look like ‘finished’ government policy – we wanted to find out more about desirable outcomes
  • as far as possible, provoke discussion without leading

Image: Open Justice ‘provocation cards’ used with focus groups

Using provocations to help participants think about the future allowed us to distill common principles which HMCTS can use when designing specific delivery mechanisms.

We hope the conversation can continue. HMCTS have published the provocations on their website. We encourage people to reuse them, or to use them to create their own….(More)”.

The Extended Corporate Mind: When Corporations Use AI to Break the Law

Paper by Mihailis Diamantis: “Algorithms may soon replace employees as the leading cause of corporate harm. For centuries, the law has defined corporate misconduct — anything from civil discrimination to criminal insider trading — in terms of employee misconduct. Today, however, breakthroughs in artificial intelligence and big data allow automated systems to make many corporate decisions, e.g., who gets a loan or what stocks to buy. These technologies introduce valuable efficiencies, but they do not remove (or even always reduce) the incidence of corporate harm. Unless the law adapts, corporations will become increasingly immune to civil and criminal liability as they transfer responsibility from employees to algorithms.

This Article is the first to tackle the full extent of the growing doctrinal gap left by algorithmic corporate misconduct. To hold corporations accountable, the law must sometimes treat them as if they “know” information stored on their servers and “intend” decisions reached by their automated systems. Cognitive science and the philosophy of mind offer a path forward. The “extended mind thesis” complicates traditional views about the physical boundaries of the mind. The thesis states that the mind encompasses any system that sufficiently assists thought, e.g. by facilitating recall or enhancing decision-making. For natural people, the thesis implies that minds can extend beyond the brain to include external cognitive aids, like rolodexes and calculators. This Article adapts the thesis to corporate law. It motivates and proposes a doctrinal framework for extending the corporate mind to the algorithms that are increasingly integral to corporate thought. The law needs such an innovation if it is to hold future corporations to account for their most serious harms….(More)”.

What statistics can and can’t tell us about ourselves

Hannah Fry at The New Yorker: “Harold Eddleston, a seventy-seven-year-old from Greater Manchester, was still reeling from a cancer diagnosis he had been given that week when, on a Saturday morning in February, 1998, he received the worst possible news. He would have to face the future alone: his beloved wife had died unexpectedly, from a heart attack.

Eddleston’s daughter, concerned for his health, called their family doctor, a well-respected local man named Harold Shipman. He came to the house, sat with her father, held his hand, and spoke to him tenderly. Pushed for a prognosis as he left, Shipman replied portentously, “I wouldn’t buy him any Easter eggs.” By Wednesday, Eddleston was dead; Dr. Shipman had murdered him.

Harold Shipman was one of the most prolific serial killers in history. In a twenty-three-year career as a mild-mannered and well-liked family doctor, he injected at least two hundred and fifteen of his patients with lethal doses of opiates. He was finally arrested in September, 1998, six months after Eddleston’s death.

David Spiegelhalter, the author of an important and comprehensive new book, “The Art of Statistics” (Basic), was one of the statisticians tasked by the ensuing public inquiry to establish whether the mortality rate of Shipman’s patients should have aroused suspicion earlier. Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s excess mortality—the number of his older patients who had died in the course of his career over the number that would be expected of an average doctor’s—was a hundred and seventy-four women and forty-nine men at the time of his arrest. The total closely matched the number of victims confirmed by the inquiry….

In 1825, the French Ministry of Justice ordered the creation of a national collection of crime records. It seems to have been the first of its kind anywhere in the world—the statistics of every arrest and conviction in the country, broken down by region, assembled and ready for analysis. It’s the kind of data set we take for granted now, but at the time it was extraordinarily novel. This was an early instance of Big Data—the first time that mathematical analysis had been applied in earnest to the messy and unpredictable realm of human behavior.

Or maybe not so unpredictable. In the early eighteen-thirties, a Belgian astronomer and mathematician named Adolphe Quetelet analyzed the numbers and discovered a remarkable pattern. The crime records were startlingly consistent. Year after year, irrespective of the actions of courts and prisons, the number of murders, rapes, and robberies reached almost exactly the same total. There is a “terrifying exactitude with which crimes reproduce themselves,” Quetelet said. “We know in advance how many individuals will dirty their hands with the blood of others. How many will be forgers, how many poisoners.”

To Quetelet, the evidence suggested that there was something deeper to discover. He developed the idea of a “Social Physics,” and began to explore the possibility that human lives, like planets, had an underlying mechanistic trajectory. There’s something unsettling in the idea that, amid the vagaries of choice, chance, and circumstance, mathematics can tell us something about what it is to be human. Yet Quetelet’s overarching findings still stand: at some level, human life can be quantified and predicted. We can now forecast, with remarkable accuracy, the number of women in Germany who will choose to have a baby each year, the number of car accidents in Canada, the number of plane crashes across the Southern Hemisphere, even the number of people who will visit a New York City emergency room on a Friday evening….(More)”

Study finds Big Data eliminates confidentiality in court judgements

Swissinfo: “Swiss researchers have found that algorithms that mine large swaths of data can eliminate anonymity in federal court rulings. This could have major ramifications for transparency and privacy protection.

This is the result of a study by the University of Zurich’s Institute of Law, published in the legal journal “Jusletter” and shared by Swiss public television SRF on Monday.

The study relied on a “web scraping technique” or mining of large swaths of data. The researchers created a database of all decisions of the Supreme Court available online from 2000 to 2018 – a total of 122,218 decisions. Additional decisions from the Federal Administrative Court and the Federal Office of Public Health were also added.

Using an algorithm and manual searches for connections between data, the researchers were able to de-anonymise, in other words reveal identities, in 84% of the judgments in less than an hour.

In this specific study, the researchers were able to identify the pharma companies and medicines hidden in the documents of the complaints filed in court.  

Study authors say that this could have far-reaching consequences for transparency and privacy. One of the study’s co-authors, Kerstin Noëlle Vokinger, professor of law at the University of Zurich, explains that “with today’s technological possibilities, anonymisation is no longer guaranteed in certain areas”. The researchers say the technique could be applied to any publicly available database.

Vokinger added there is a need to balance necessary transparency while safeguarding the personal rights of individuals.

Adrian Lobsiger, the Swiss Federal Data Protection Commissioner, told SRF that this confirms his view that facts may need to be treated as personal data in the age of technology….(More)”.