Fairness in Machine Learning

Presentation by Delip Rao: “…The models you create have power to get people arrested or vindicated, get loans approved or rejected, determine what interest rate should be charged for such loans, who should be shown to you in your long list of pursuits on your Tinder, what news do you read, who gets called for a job phone screen or even a college admission… the list goes on.

So what can you do about it?…

I have detailed notes for some of these slides. If you would like to follow those, try going directly to Google Slides.


Innovation and Its Enemies: Why People Resist New Technologies

Book by Calestous Juma: “The rise of artificial intelligence has rekindled a long-standing debate regarding the impact of technology on employment. This is just one of many areas where exponential advances in technology signal both hope and fear, leading to public controversy. This book shows that many debates over new technologies are framed in the context of risks to moral values, human health, and environmental safety. But it argues that behind these legitimate concerns often lie deeper, but unacknowledged, socioeconomic considerations. Technological tensions are often heightened by perceptions that the benefits of new technologies will accrue only to small sections of society while the risks will be more widely distributed. Similarly, innovations that threaten to alter cultural identities tend to generate intense social concern. As such, societies that exhibit great economic and political inequities are likely to experience heightened technological controversies.

Drawing from nearly 600 years of technology history, Innovation and Its Enemies identifies the tension between the need for innovation and the pressure to maintain continuity, social order, and stability as one of today’s biggest policy challenges. It reveals the extent to which modern technological controversies grow out of distrust in public and private institutions. Using detailed case studies of coffee, the printing press, margarine, farm mechanization, electricity, mechanical refrigeration, recorded music, transgenic crops, and transgenic animals, it shows how new technologies emerge, take root, and create new institutional ecologies that favor their establishment in the marketplace. The book uses these lessons from history to contextualize contemporary debates surrounding technologies such as artificial intelligence, online learning, 3D printing, gene editing, robotics, drones, and renewable energy. It ultimately makes the case for shifting greater responsibility to public leaders to work with scientists, engineers, and entrepreneurs to manage technological change, make associated institutional adjustments, and expand public engagement on scientific and technological matters….(More)”

Accountable machines: bureaucratic cybernetics?

Alison Powell at LSE Media Policy Project Blog: “Algorithms are everywhere, or so we are told, and the black boxes of algorithmic decision-making make oversight of processes that regulators and activists argue ought to be transparent more difficult than in the past. But when, and where, and which machines do we wish to make accountable, and for what purpose? In this post I discuss how algorithms discussed by scholars are most commonly those at work on media platforms whose main products are the social networks and attention of individuals. Algorithms, in this case, construct individual identities through patterns of behaviour, and provide the opportunity for finely targeted products and services. While there are serious concerns about, for instance, price discrimination, algorithmic systems for communicating and consuming are, in my view, less inherently problematic than processes that impact on our collective participation and belonging as citizenship. In this second sphere, algorithmic processes – especially machine learning – combine with processes of governance that focus on individual identity performance to profoundly transform how citizenship is understood and undertaken.

Communicating and consuming

In the communications sphere, algorithms are what makes it possible to make money from the web, for example through advertising brokerage platforms that help companies bid for ads on major newspaper websites. IP address monitoring, which tracks clicks and web activity, creates detailed consumer profiles and transforms the everyday experience of communication into a constantly-updated production of consumer information. This process of personal profiling is at the heart of many of the concerns about algorithmic accountability. The consequence of perpetual production of data by individuals and the increasing capacity to analyse it even when it doesn’t appear to relate has certainly revolutionised advertising by allowing more precise targeting, but what has it done for areas of public interest?

John Cheney-Lippold identifies how the categories of identity are now developed algorithmically, since a category like gender is not based on self-disclosure, but instead on patterns of behaviour that fit with expectations set by previous alignment to a norm. In assessing ‘algorithmic identities’, he notes that these produce identity profiles which are narrower and more behaviour-based than the identities that we perform. This is a result of the fact that many of the systems that inspired the design of algorithmic systems were based on using behaviour and other markers to optimise consumption. Algorithmic identity construction has spread from the world of marketing to the broader world of citizenship – as evidenced by the Citizen Ex experiment shown at the Web We Want Festival in 2015.

Individual consumer-citizens

What’s really at stake is that the expansion of algorithmic assessment of commercially derived big data has extended the frame of the individual consumer into all kinds of other areas of experience. In a supposed ‘age of austerity’ when governments believe it’s important to cut costs, this connects with the view of citizens as primarily consumers of services, and furthermore, with the idea that a citizen is an individual subject whose relation to a state can be disintermediated given enough technology. So, with sensors on your garbage bins you don’t need to even remember to take them out. With pothole reporting platforms like FixMyStreet, a city government can be responsive to an aggregate of individual reports. But what aspects of our citizenship are collective? When, in the algorithmic state, can we expect to be together?

Put another way, is there any algorithmic process to value the long term education, inclusion, and sustenance of a whole community for example through library services?…

Seeing algorithms – machine learning in particular – as supporting decision-making for broad collective benefit rather than as part of ever more specific individual targeting and segmentation might make them more accountable. But more importantly, this would help algorithms support society – not just individual consumers….(More)”

It’s not big data that discriminates – it’s the people that use it

In The Conversation: “Data can’t be racist or sexist, but the way it is used can help reinforce discrimination. The internet means more data is collected about us than ever before and it is used to make automatic decisions that can hugely affect our lives, from our credit scores to our employment opportunities.

If that data reflects unfair social biases against sensitive attributes, such as our race or gender, the conclusions drawn from that data might also be based on those biases.

But this era of “big data” doesn’t need to entrench inequality in this way. If we build smarter algorithms to analyse our information and ensure we’re aware of how discrimination and injustice may be at work, we can actually use big data to counter our human prejudices.

This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans, or been more likely to be convicted of a crime, then the model can deem these people more risky. That doesn’t necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money. They may just be disproportionately targeted by police and sub-prime mortgage salesmen.

Excluding sensitive attributes

Data scientist Cathy O’Neil has written about her experience of developing models for homeless services in New York City. The models were used to predict how long homeless clients would be in the system and to match them with appropriate services. She argues that including race in the analysis would have been unethical.

If the data showed white clients were more likely to find a job than black ones, the argument goes, then staff might focus their limited resources on those white clients that would more likely have a positive outcome. While sociological research has unveiled the ways that racial disparities in homelessness and unemployment are the result of unjust discrimination, algorithms can’t tell the difference between just and unjust patterns. And so datasets should exclude characteristics that may be used to reinforce the bias, such as race.

But this simple response isn’t necessarily the answer. For one thing, machine learning algorithms can often infer sensitive attributes from a combination of other, non-sensitive facts. People of a particular race may be more likely to live in a certain area, for example. So excluding those attributes may not be enough to remove the bias….

An enlightened service provider might, upon seeing the results of the analysis, investigate whether and how racism is a barrier to their black clients getting hired. Equipped with this knowledge they could begin to do something about it. For instance, they could ensure that local employers’ hiring practices are fair and provide additional help to those applicants more likely to face discrimination. The moral responsibility lies with those responsible for interpreting and acting on the model, not the model itself.

So the argument that sensitive attributes should be stripped from the datasets we use to train predictive models is too simple. Of course, collecting sensitive data should be carefully regulated because it can easily be misused. But misuse is not inevitable, and in some cases, collecting sensitive attributes could prove absolutely essential in uncovering, predicting, and correcting unjust discrimination. For example, in the case of homeless services discussed above, the city would need to collect data on ethnicity in order to discover potential biases in employment practices….(More)”
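
The excerpt's point that a model can infer a sensitive attribute from non-sensitive facts can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the population is synthetic, the group labels and neighborhood names are invented, and the 85 percent residential split is an arbitrary choice, not a figure from the article. The sketch drops the group column and then checks how often neighborhood alone recovers it.

```python
import random
from collections import Counter, defaultdict

def make_population(n=10_000, seed=0):
    """Synthetic (group, neighborhood) records. All values are invented."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        # Assumption: 85% of each group lives in "its" neighborhood.
        home = "north" if group == "A" else "south"
        other = "south" if group == "A" else "north"
        neighborhood = home if rng.random() < 0.85 else other
        people.append((group, neighborhood))
    return people

def proxy_accuracy(people):
    """Share of people whose group is recovered from neighborhood alone."""
    counts = defaultdict(Counter)
    for group, neighborhood in people:
        counts[neighborhood][group] += 1
    # Guess the majority group for each neighborhood.
    guess = {nb: c.most_common(1)[0][0] for nb, c in counts.items()}
    hits = sum(guess[nb] == group for group, nb in people)
    return hits / len(people)

accuracy = proxy_accuracy(make_population())
print(f"group recovered from neighborhood alone: {accuracy:.0%}")
```

Under these assumptions the dropped attribute is recovered far more often than the 50 percent a coin flip would manage, which is why removing the column does not by itself remove the bias.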

A machine intelligence commission for the UK

Geoff Mulgan at NESTA: “This paper makes the case for creating a Machine Intelligence Commission – a new public institution to help the development of new generations of algorithms, machine learning tools and uses of big data, ensuring that the public interest is protected.

I argue that new institutions of this kind – which can interrogate, inspect and influence technological development – are a precondition for growing informed public trust. That trust will, in turn, be essential if we are to reap the full potential public and economic benefits from new technologies. The proposal draws on lessons from fields such as human fertilisation, biotech and energy, which have shown how trust can be earned, and how new industries can be grown. It also draws on lessons from the mistakes made in fields like GM crops and personal health data, where lack of trust has impeded progress….(More)”

Facebook Is Making a Map of Everyone in the World

Robinson Meyer at The Atlantic: “Americans inhabit an intricately mapped world. Type “Burger King” into an online box, and Google will cough up a dozen nearby options, each keyed to a precise latitude and longitude.

But throughout much of the world, local knowledge stays local. While countries might conduct censuses, the data doesn’t go much deeper than the county or province level.

Take population data, for instance: More than 7.4 billion humans sprawl across this planet of ours. They live in dense urban centers, in small towns linked by farms, and alone on the outskirts of jungles. But no one’s sure where, exactly, many of them live.

Now, Facebook says it has mapped almost 2 billion people better than any previous project. The company’s Connectivity Labs announced this week that it created new, high-resolution population-distribution maps of 20 countries, most of which are developing. It won’t release most of the maps until later this year, but if they’re accurate, they will be the best-quality population maps ever made for most of those places.

The maps will be notable for another reason, too: If they’re accurate, they’ll signal the arrival of a new, AI-aided age of cartography.

In the rich world, reliable population information is taken for granted. But elsewhere, population-distribution maps have dozens of applications in different fields. Urban planners need to estimate city density so they can place and improve roads. Epidemiologists and public-health workers use them to track outbreaks or analyze access to health care. And after a disaster, population maps can be used (along with crisis mapping) to prioritize where emergency aid gets sent….(More)”

Drones better than human rescuers at following mountain pathways

Springwise: “Every year in Switzerland, emergency centers respond to around 1,000 call outs for lost and injured hikers. It can often take hours and significant manpower to locate lost mountaineers, but new software for quadcopter drones is making the hunt quicker and easier, and has the potential to help find human survivors in disaster zones around the world.

The drone uses a computer algorithm called a Deep Neural Network. The program was developed by researchers at the University of Zurich and the Dalle Molle Institute for Artificial Intelligence. The drone uses the algorithm to learn trails and paths through a pair of small cameras, interpreting the images and recognizing man-made pathways. Even when working on a previously unseen trail, it was able to guess the correct direction in 85 percent of the cases. The drones’ speed and accuracy make them more effective than human trackers.

The researchers hope that eventually multiple small drones could be combined with human search and rescue missions, to cover more terrain and find people faster. The drones can cover terrain quickly and check hazardous areas to minimize risk to human workers, and its AI can identify paths and avoid crashing without any human involvement….(More)”

Forecasting Domestic Violence: A Machine Learning Approach to Help Inform Arraignment Decisions

Richard A. Berk, Susan B. Sorenson and Geoffrey Barnes in The Journal of Empirical Legal Studies: “Arguably the most important decision at an arraignment is whether to release an offender until the date of his or her next scheduled court appearance. Under the Bail Reform Act of 1984, threats to public safety can be a key factor in that decision. Implicitly, a forecast of “future dangerousness” is required. In this article, we consider in particular whether usefully accurate forecasts of domestic violence can be obtained. We apply machine learning to data on over 28,000 arraignment cases from a major metropolitan area in which an offender faces domestic violence charges. One of three possible post-arraignment outcomes is forecasted within two years: (1) a domestic violence arrest associated with a physical injury, (2) a domestic violence arrest not associated with a physical injury, and (3) no arrests for domestic violence. We incorporate asymmetric costs for different kinds of forecasting errors so that very strong statistical evidence is required before an offender is forecasted to be a good risk. When an out-of-sample forecast of no post-arraignment domestic violence arrests within two years is made, it is correct about 90 percent of the time. Under current practice within the jurisdiction studied, approximately 20 percent of those released after an arraignment for domestic violence are arrested within two years for a new domestic violence offense. If magistrates used the methods we have developed and released only offenders forecasted not to be arrested for domestic violence within two years after an arraignment, as few as 10 percent might be arrested. The failure rate could be cut nearly in half. Over a typical 24-month period in the jurisdiction studied, well over 2,000 post-arraignment arrests for domestic violence perhaps could be averted….(More)”
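
The abstract's idea of asymmetric costs can be made concrete with a small decision-theoretic sketch. This is not the authors' method: it collapses their three outcomes into a single re-arrest probability, and the function names and the 10-to-1 cost ratio are assumptions chosen only for illustration. It shows how weighting a missed re-arrest more heavily than a wrongly flagged offender raises the evidence needed before someone is forecasted to be a good risk.

```python
def low_risk_threshold(cost_false_negative, cost_false_positive):
    """Re-arrest probability below which forecasting 'low risk' has the
    lower expected cost.

    Expected cost of forecasting low risk:  p * cost_false_negative
    Expected cost of forecasting high risk: (1 - p) * cost_false_positive
    Setting the two equal and solving for p gives the threshold.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def forecast(p_rearrest, cost_false_negative=10.0, cost_false_positive=1.0):
    """Forecast 'low risk' only when p_rearrest is under the threshold.

    The default 10:1 cost ratio is a hypothetical value, not one taken
    from the article.
    """
    threshold = low_risk_threshold(cost_false_negative, cost_false_positive)
    return "low risk" if p_rearrest < threshold else "high risk"

# With equal costs the threshold is 0.5; with a 10:1 ratio it drops to 1/11.
for p in (0.05, 0.20, 0.40):
    print(p, forecast(p, 1.0, 1.0), forecast(p))
```

With equal costs, an offender with an estimated 20 percent chance of re-arrest would be forecasted as low risk; under the hypothetical 10-to-1 ratio the same person is not, because the threshold falls to about 9 percent.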

Digital Decisions: Policy Tools in Automated Decision-Making

Ali Lange at CDT: “Digital technology has empowered new voices, made the world more accessible, and increased the speed of almost every decision we make as businesses, communities, and individuals. Much of this convenience is powered by lines of code that rapidly execute instructions based on rules set by programmers (or, in the case of machine learning, generated from statistical correlations in massive datasets)—otherwise known as algorithms. The technology that drives our automated world is sophisticated and obscure, making it difficult to determine how the decisions made by automated systems might fairly or unfairly, positively or negatively, impact individuals. It is also harder to identify where bias may inadvertently arise. Algorithmically driven outcomes are influenced, but not exclusively determined, by technical and legal limitations. The landscape of algorithmic decision-making is also shaped by policy choices in technology companies and by government agencies. Some automated systems create positive outcomes for individuals, and some threaten a fair society. By looking at a few case studies and drawing out the prevailing policy principle, we can draw conclusions about how to critically approach the existing web of automated decision-making. Before considering these specific examples, we will present a summary of the policy debate around data-driven decisions to give context to the examples raised. Then we will analyze three case studies from diverse industries to determine what policy interventions might be applied more broadly to encourage positive outcomes and prevent the risk of discrimination….(More)”

Political Speech Generation

Valentin Kassarnig at arXiv: “In this report we present a system that can generate political speeches for a desired political party. Furthermore, the system allows to specify whether a speech should hold a supportive or opposing opinion. The system relies on a combination of several state-of-the-art NLP methods which are discussed in this report. These include n-grams, Justeson & Katz POS tag filter, recurrent neural networks, and latent Dirichlet allocation. Sequences of words are generated based on probabilities obtained from two underlying models: A language model takes care of the grammatical correctness while a topic model aims for textual consistency. Both models were trained on the Convote dataset which contains transcripts from US congressional floor debates. Furthermore, we present a manual and an automated approach to evaluate the quality of generated speeches. In an experimental evaluation generated speeches have shown very high quality in terms of grammatical correctness and sentence transitions….(More)”
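
The abstract's description of generating words from two underlying models can be sketched in a few lines. This is a toy under stated assumptions, not the report's system: the corpus is invented rather than drawn from Convote, the language model is a plain bigram count table, and the topic weights are hand-set stand-ins for what a topic model would estimate. Each next word is sampled in proportion to its bigram count multiplied by its topic weight.

```python
import random
from collections import Counter, defaultdict

# Invented toy corpus; the report trains on congressional debate transcripts.
CORPUS = (
    "i support this bill because it helps working families . "
    "i oppose this bill because it raises taxes . "
    "this bill helps small businesses and working families . "
).split()

# Hypothetical topic weights; a real topic model would estimate these.
TOPIC_WEIGHT = {"families": 3.0, "helps": 2.0, "working": 2.0}

def train_bigrams(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
    return counts

def next_word(counts, prev, rng):
    """Sample a successor of prev, weighting bigram counts by topic weight."""
    candidates = counts[prev]
    words = list(candidates)
    # Unnormalized scores are fine: rng.choices normalizes the weights.
    scores = [candidates[w] * TOPIC_WEIGHT.get(w, 1.0) for w in words]
    return rng.choices(words, weights=scores, k=1)[0]

def generate(counts, start="i", max_words=12, seed=0):
    """Generate words until a full stop or max_words is reached."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words and words[-1] != ".":
        words.append(next_word(counts, words[-1], rng))
    return " ".join(words)

print(generate(train_bigrams(CORPUS)))
```

The bigram table keeps adjacent words plausible, while the topic weights tilt the choice toward on-topic vocabulary; the start word must appear in the corpus with at least one successor.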