White House Challenges Artificial Intelligence Experts to Reduce Incarceration Rates


Jason Shueh at GovTech: “The U.S. spends $270 billion on incarceration each year, has a prison population of about 2.2 million and an incarceration rate that’s spiked 220 percent since the 1980s. But with the advent of data science, White House officials are asking experts for help.

On Tuesday, June 7, the White House Office of Science and Technology Policy’s Lynn Overmann, who also leads the White House Police Data Initiative, stressed the severity of the nation’s incarceration crisis while asking a crowd of data scientists and artificial intelligence specialists for aid.

“We have built a system that is too large, and too unfair and too costly — in every sense of the word — and we need to start to change it,” Overmann said, speaking at a Computing Community Consortium public workshop.

She argued that the U.S., a country that has the highest amount incarcerated citizens in the world, is in need of systematic reforms with both data tools to process alleged offenders and at the policy level to ensure fair and measured sentences. As a longtime counselor, advisor and analyst for the Justice Department and at the city and state levels, Overman said she has studied and witnessed an alarming number of issues in terms of bias and unwarranted punishments.

For instance, she said that statistically, while drug use is about equal between African-Americans and Caucasians, African-Americans are more likely to be arrested and convicted. They also receive longer prison sentences compared to Caucasian inmates convicted of the same crimes….

Data and digital tools can help curb such pitfalls by increasing efficiency, transparency and accountability, she said.

“We think these types of data exchanges [between officials and technologists] can actually be hugely impactful if we can figure out how to take this information and operationalize it for the folks who run these systems,” Obermann noted.

The opportunities to apply artificial intelligence and data analytics, she said, might include using it to improve questions on parole screenings, using it to analyze police body camera footage, and applying it to criminal justice data for legislators and policy workers….

If the private sector is any indication, artificial intelligence and machine learning techniques could be used to interpret this new and vast supply of law enforcement data. In an earlier presentation by Eric Horvitz, the managing director at Microsoft Research, Horvitz showcased how the company has applied artificial intelligence to vision and language to interpret live video content for the blind. The app, titled SeeingAI, can translate live video footage, captured from an iPhone or a pair of smart glasses, into instant audio messages for the seeing impaired. Twitter’s live-streaming app Periscope has employed similar technology to guide users to the right content….(More)”

Private Data and the Public Good


Gideon Mann‘s remarks on the occasion of the Robert Khan distinguished lecture at The City College of New York on 5/22/16: and opportunities about a specific aspect of this relationship, the broader need for computer science to engage with the real world. Right now, a key aspect of this relationship is being built around the risks and opportunities of the emerging role of data.

Ultimately, I believe that these relationships, between computer science andthe real world, between data science and real problems, hold the promise tovastly increase our public welfare. And today, we, the people in this room,have a unique opportunity to debate and define a more moral dataeconomy….

The hybrid research model proposes something different. The hybrid research model, embeds, as it were, researchers as practitioners.The thought was always that you would be going about your regular run of business,would face a need to innovate to solve a crucial problem, and would do something novel. At that point, you might choose to work some extra time and publish a paper explaining your innovation. In practice, this model rarely works as expected. Tight deadlines mean the innovation that people do in their normal progress of business is incremental..

This model separated research from scientific publication, and shortens thetime-window of research, to what can be realized in a few year time zone.For me, this always felt like a tremendous loss, with respect to the older so-called “ivory tower” research model. It didn’t seem at all clear how this kindof model would produce the sea change of thought engendered byShannon’s work, nor did it seem that Claude Shannon would ever want towork there. This kind of environment would never support the freestanding wonder, like the robot mouse that Shannon worked on. Moreover, I always believed that crucial to research is publication and participation in the scientific community. Without this engagement, it feels like something different — innovation perhaps.

It is clear that the monopolistic environment that enabled AT&T to support this ivory tower research doesn’t exist anymore. .

Now, the hybrid research model was one model of research at Google, butthere is another model as well, the moonshot model as exemplified byGoogle X. Google X brought together focused research teams to driveresearch and development around a particular project — Google Glass and the Self-driving car being two notable examples. Here the focus isn’t research, but building a new product, with research as potentially a crucial blocking issue. Since the goal of Google X is directly to develop a new product, by definition they don’t publish papers along the way, but they’re not as tied to short-term deliverables as the rest of Google is. However, they are again decidedly un-Bell-Labs like — a secretive, tightly focused, non-publishing group. DeepMind is a similarly constituted initiative — working, for example, on a best-in-the-world Go playing algorithm, with publications happening sparingly.

Unfortunately, both of these approaches, the hybrid research model and the moonshot model stack the deck towards a particular kind of research — research that leads to relatively short term products that generate corporate revenue. While this kind of research is good for society, it isn’t the only kind of research that we need. We urgently need research that is longterm, and that is undergone even without a clear financial local impact. Insome sense this is a “tragedy of the commons”, where a shared public good (the commons) is not supported because everyone can benefit from itwithout giving back. Academic research is thus a non-rival, non-excludible good, and thus reasonably will be underfunded. In certain cases, this takes on an ethical dimension — particularly in health care, where the choice ofwhat diseases to study and address has a tremendous potential to affect human life. Should we research heart disease or malaria? This decisionmakes a huge impact on global human health, but is vastly informed by the potential profit from each of these various medicines….

Private Data means research is out of reach

The larger point that I want to make, is that in the absence of places where long-term research can be done in industry, academia has a tremendous potential opportunity. Unfortunately, it is actually quite difficult to do the work that needs to be done in academia, since many of the resources needed to push the state of the art are only found in industry: in particular data.

Of course, academia also lacks machine resources, but this is a simpler problem to fix — it’s a matter of money, resources form the government could go to enabling research groups building their own data centers or acquiring the computational resources from the market, e.g. Amazon. This is aided by the compute philanthropy that Google and Microsoft practice that grant compute cycles to academic organizations.

But the data problem is much harder to address. The data being collected and generated at private companies could enable amazing discoveries and research, but is impossible for academics to access. The lack of access to private data from companies actually is much more significant effects than inhibiting research. In particular, the consumer level data, collected by social networks and internet companies could do much more than ad targeting.

Just for public health — suicide prevention, addiction counseling, mental health monitoring — there is enormous potential in the use of our online behavior to aid the most needy, and academia and non-profits are set-up to enable this work, while companies are not.

To give a one examples, anorexia and eating disorders are vicious killers. 20 million women and 10 million men suffer from a clinically significant eating disorder at some time in their life, and sufferers of eating disorders have the highest mortality rate of any other mental health disorder — with a jaw-dropping estimated mortality rate of 10%, both directly from injuries sustained by the disorder and by suicide resulting from the disorder.

Eating disorders are particular in that sufferers often seek out confirmatory information, blogs, images and pictures that glorify and validate what sufferers see as “lifestyle” choices. Browsing behavior that seeks out images and guidance on how to starve yourself is a key indicator that someone is suffering. Tumblr, pinterest, instagram are places that people host and seek out this information. Tumblr has tried to help address this severe mental health issue by banning blogs that advocate for self-harm and by adding PSA announcements to query term searches for queries for or related to anorexia. But clearly — this is not the be all and end all of work that could be done to detect and assist people at risk of dying from eating disorders. Moreover, this data could also help understand the nature of those disorders themselves…..

There is probably a role for a data ombudsman within private organizations — someone to protect the interests of the public’s data inside of an organization. Like a ‘public editor’ in a newspaper according to how you’ve set it up. There to protect and articulate the interests of the public, which means probably both sides — making sure a company’s data is used for public good where appropriate, and making sure the ‘right’ to privacy of the public is appropriately safeguarded (and probably making sure the public is informed when their data is compromised).

Next, we need a platform to make collaboration around social good between companies and between companies and academics. This platform would enable trusted users to have access to a wide variety of data, and speed process of research.

Finally, I wonder if there is a way that government could support research sabbaticals inside of companies. Clearly, the opportunities for this research far outstrip what is currently being done…(more)”

Data Science Ethical Framework


UK Cabinet Office: “Data science provides huge opportunities for government. Harnessing new forms of data with increasingly powerful computer techniques increases operational efficiency, improves public services and provides insight for better policymaking.

We want people in government to feel confident using data science techniques to innovate. This guidance is intended to bring together relevant laws and best practice, to give teams robust principles to work with.

The publication is a first version that we are asking the public, experts, civil servants and other interested parties to help us perfect and iterate. This will include taking on evidence from a public dialogue on data science ethics. It was published on 19 May by the Minister for Cabinet Office, Matt Hancock. If you would like to help us iterate the framework, find out how to get in touch at the end of this blog. See Data Science Ethical Framework (PDF, 8.28MB, 17 pages). This file may not be suitable for users of assistive technology. Request an accessible format.

Big Data for public policy: the quadruple helix


Julia Lane in the Journal of Policy Analysis and Management: “Data from the federal statistical system, particularly the Census Bureau, have long been a key resource for public policy. Although most of those data have been collected through purposive surveys, there have been enormous strides in the use of administrative records on business (Jarmin & Miranda, 2002), jobs (Abowd, Halti- wanger, & Lane, 2004), and individuals (Wagner & Layne, 2014). Those strides are now becoming institutionalized. The President has allocated $10 million to an Administrative Records Clearing House in his FY2016 budget. Congress is considering a bill to use administrative records, entitled the Evidence-Based Policymaking Commission Act, sponsored by Patty Murray and Paul Ryan. In addition, the Census Bureau has established a Center for “Big Data.” In my view, these steps represent important strides for public policy, but they are only part of the story. Public policy researchers must look beyond the federal statistical system and make use of the vast resources now available for research and evaluation.

All politics is local; “Big Data” now mean that policy analysis can increasingly be local. Modern empirical policy should be grounded in data provided by a network of city/university data centers. Public policy schools should partner with scholars in the emerging field of data science to train the next generation of policy researchers in the thoughtful use of the new types of data; the apparent secular decline in the applications to public policy schools is coincident with the emergence of data science as a field of study in its own right. The role of national statistical agencies should be fundamentally rethought—and reformulated to one of four necessary strands in the data infrastructure; that of providing benchmarks, confidentiality protections, and national statistics….(More)”

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide


Paper by Jacob Metcalf and Kate Crawford: “There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science,critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce….(More)”

Critics allege big data can be discriminatory, but is it really bias?


Pradip Sigdyal at CNBC: “…The often cited case of big data discrimination points to a research conducted few years ago by Latanya Sweeny, who heads the Data Privacy Lab at Harvard University.

The case involves Google ad results when searching for certain kinds of names on the internet. In her research, Sweeney found that distinct sounding names often associated with blacks showed up with a disproportionately higher number of arrest record ads compared to white sounding names by roughly 18 percent of the time. Google has since fixed the issue, although they never publicly stated what they did to correct the problem.

The proliferation of big data in the last few years has seen other allegations of improper use and bias. These allegations run the gamut, from online price discrimination and consequences of geographic targeting to the controversial use of crime predicting technology by law enforcement, and lack of sufficient representative[data] sampleused in some public works decisions.

The benefits of big data need to be balanced with the risks associated with applying modern technologies to address societal issues. Yet data advocates believe that democratization of data has in essence givenpower to the people to affect change by transferring ‘tribal knowledge’ from experts to data-savvy practitioners.

Big data is here to stay

According to some advocates, the problem is not so much that ‘big data discriminates’, but that failures by data professionals risk misinterpreting the findings at the heart of data mining and statistical learning. They add that the benefits far outweigh the concerns.

“In my academic research and industry consulting, I have seen tremendous benefits accruing to firms, organizations and consumers alike from the use of data-driven decision-making, data science, and business analytics,” Anindya Ghose, the director of Center for Business Analytics at New York University’s Stern School of Business, said.

“To be perfectly honest, I do not at all understand these big-data cynics who engage in fear mongering about the implications of data analytics,” Ghose said.

“Here is my message to the cynics and those who keep cautioning us: ‘Deal with it, big data analytics is here to stay forever’.”…(More)”

Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights


Latest White House report on Big Data charts pathways for fairness and opportunity but also cautions against re-encoding bias and discrimination into algorithmic systems: ” Advertisements tailored to reflect previous purchasing decisions; targeted job postings based on your degree and social networks; reams of data informing predictions around college admissions and financial aid. Need a loan? There’s an app for that.

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

Because technological innovation presents both great opportunity and great risk, the White House has released several reports on “big data” intended to prompt conversation and advance these important issues. The topics of previous reports on data analytics included privacy, prices in the marketplace, and consumer protection laws. Today, we are announcing the latest report on big data, one centered on algorithmic systems, opportunity, and civil rights.

The first big data report warned of “the potential of encoding discrimination in automated decisions”—that is, discrimination may “be the inadvertent outcome of the way big data technologies are structured and used.” A commitment to understanding these risks and harnessing technology for good prompted us to specifically examine the intersection between big data and civil rights.

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research—and action—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Big data is here to stay; the question is how it will be used: to advance civil rights and opportunity, or to undermine them….(More)”

Ebola: A Big Data Disaster


Study by Sean Martin McDonald: “…undertaken with support from the Open Society Foundation, Ford Foundation, and Media Democracy Fund, explores the use of Big Data in the form of Call Detail Record (CDR) data in humanitarian crisis.

It discusses the challenges of digital humanitarian coordination in health emergencies like the Ebola outbreak in West Africa, and the marked tension in the debate around experimentation with humanitarian technologies and the impact on privacy. McDonald’s research focuses on the two primary legal and human rights frameworks, privacy and property, to question the impact of unregulated use of CDR’s on human rights. It also highlights how the diffusion of data science to the realm of international development constitutes a genuine opportunity to bring powerful new tools to fight crisis and emergencies.

Analysing the risks of using CDRs to perform migration analysis and contact tracing without user consent, as well as the application of big data to disease surveillance is an important entry point into the debate around use of Big Data for development and humanitarian aid. The paper also raises crucial questions of legal significance about the access to information, the limitation of data sharing, and the concept of proportionality in privacy invasion in the public good. These issues hold great relevance in today’s time where big data and its emerging role for development, involving its actual and potential uses as well as harms is under consideration across the world.

The paper highlights the absence of a dialogue around the significant legal risks posed by the collection, use, and international transfer of personally identifiable data and humanitarian information, and the grey areas around assumptions of public good. The paper calls for a critical discussion around the experimental nature of data modelling in emergency response due to mismanagement of information has been largely emphasized to protect the contours of human rights….

See Sean Martin McDonald – “Ebola: A Big Data Disaster” (PDF).

 

On Researching Data and Communication


Paper by Andrew Schrock: “We are awash in predictions about our data-driven future. Enthusiasts believe it will offer new ways to research behavior. Critics worry it will enable powerful regimes of institutional control. Both visions, although polar opposites, tend to downplay the importance of communication. As a result, the role of communication in human-centered data science has rarely been considered. This article fills this gap by outlining three perspectives on data that foreground communication. First, I briefly review the common social scientific perspective: “communication as data.” Next, I elaborate on two less explored perspectives. A “data as communication” perspective captures how data imperfectly carry meanings and guide action. “Communication around data” describes communication in organizational and institutional data cultures. I conclude that communication offers nuanced perspectives to inform human-centered data science. Researchers should embrace a robust agenda, particularly when researching the relationship between data and power…(More)”

This Is How Visualizing Open Data Can Help Save Lives


Alexander Howard at the Huffington Post: “Cities are increasingly releasing data that they can use to make life better for their residents online — enabling journalists and researchers to better inform the public.

Los Angeles, for example, has analyzed data about injuries and deaths on its streets and published it online. Now people can check its conclusions and understand why LA’s public department prioritizes certain intersections.

The impact from these kinds of investments can lead directly to saving lives and preventing injuries. The work is part of a broader effort around the world to make cities safer.

Like New York City, San Francisco and Portland, Oregon, Los Angeles has adopted Sweden’s “Vision Zero” program as part of its strategy for eliminating traffic deathsCalifornia led the nation in bicycle deaths in 2014.

At visionzero.lacity.org, you can see that the City of Los Angeles is using data visualization to identify the locations of “high injury networks,” or the 6 percent of intersections that account for 65 percent of the severe injuries in the area.

CITY OF LOS ANGELES

The work is the result of LA’s partnership with University of South California graduate students. As a result of these analyses, the Los Angeles Police Department has been cracking down on jaywalking near the University of Southern California.

Abhi Nemani, the former chief data officer for LA, explained why the city needed to “go back to school” for help.

“In resource-constrained environments — the environment most cities find themselves in these days — you often have to beg, borrow, and steal innovation; particularly so, when it comes to in-demand resources such as data science expertise,” he told the Huffington Post.

“That’s why in Los Angeles, we opted to lean on the community for support: both the growing local tech sector and the expansive academic base. The academic community, in particular, was eager to collaborate with the city. In fact, most — if not all — local institutions reached out to me at some point asking to partner on a data science project with their graduate students.”

The City of Los Angeles is now working with another member of its tech sector toeliminate traffic deaths. DataScience, based in Culver City, California, received $22 million dollars in funding in December to make predictive insights for customers.

“The City of Los Angeles is very data-driven,” DataScience CEO Ian Swanson told HuffPost. “I commend Mayor Eric Garcetti and the City of Los Angeles on the openness, transparency, and availability of city data initiatives, like Vision Zero, put the City of Los Angeles‘ data into action and improve life in this great city.”

DataScience created an interactive online map showing the locations of collisions involving bicycles across the city….(More)”