Algorithms and Bias


Q. and A. With Cynthia Dwork in the New York Times: “Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.

Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, research has shown that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.

Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.

Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?

A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
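The admissions scenario can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the records are made up, the groups "A" and "B" are hypothetical, and the "model" is a deliberately naive one that memorizes historical admission rates. Nothing here comes from the interview or from any real admissions system.

```python
from collections import defaultdict

# Hypothetical historical records: (group, qualified, admitted).
# Every applicant below is qualified, but group "B" was admitted less often.
HISTORY = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", True, False),
]

def fit_admission_rates(records):
    """Learn the historical admission rate for each (group, qualified) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [admitted, total]
    for group, qualified, admitted in records:
        counts[(group, qualified)][1] += 1
        if admitted:
            counts[(group, qualified)][0] += 1
    return {key: admitted / total for key, (admitted, total) in counts.items()}

def predict_admit(rates, group, qualified, threshold=0.5):
    """Recommend admission when the learned rate meets the threshold."""
    return rates.get((group, qualified), 0.0) >= threshold

rates = fit_admission_rates(HISTORY)
print(rates[("A", True)], rates[("B", True)])
print(predict_admit(rates, "A", True), predict_admit(rates, "B", True))
```

Two equally qualified applicants receive different recommendations purely because the historical records treated their groups differently, which is the pattern described in the answer above.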

Q: Are there examples of that happening?

A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.

This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.
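How a match can be "slanted" toward one side is easiest to see in the textbook one-to-one deferred acceptance procedure (Gale-Shapley), sketched below in Python. This is not the algorithm used by the residency match, which also has to handle programs with many positions and other complications. The student and hospital names and their preference lists are hypothetical, and the sketch assumes every participant ranks everyone on the other side.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Textbook one-to-one deferred acceptance.

    Returns {proposer: receiver}. Among stable matchings, the result is the
    one most favorable to the proposing side.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    held = {}  # receiver -> proposer whose offer is currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has exhausted its list and stays unmatched
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = held.get(r)
        if current is None:
            held[r] = p  # r had no offer yet, so it holds p's
        elif rank[r][p] < rank[r][current]:
            held[r] = p  # r prefers p and releases the offer it held
            free.append(current)
        else:
            free.append(p)  # r rejects p, who will try the next choice
    return {p: r for r, p in held.items()}

# Hypothetical preferences in which the two sides disagree.
students = {"s1": ["h1", "h2"], "s2": ["h2", "h1"]}
hospitals = {"h1": ["s2", "s1"], "h2": ["s1", "s2"]}

student_optimal = deferred_acceptance(students, hospitals)
hospital_optimal = deferred_acceptance(hospitals, students)
print(student_optimal)
print(hospital_optimal)
```

With these preferences, both outcomes are stable, yet every student gets a first choice when students propose and every hospital gets a first choice when hospitals propose. Choosing which side proposes is therefore a choice about whose happiness the match favors.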

Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?

A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination….

Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?

A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research. …(More)”

Citizen Science used in studying Seasonal Variation in India


Rohin Daswani at the Commons Lab, Woodrow Wilson International Center for Scholars: “Climate change has started affecting many countries around the world. While every country is susceptible to the risks of global warming, some countries, such as India, are especially vulnerable.

India’s sheer dependence on rainfall to irrigate its vast agricultural lands and to feed its economy makes it highly vulnerable to climate change. A report from the UN Intergovernmental Panel on Climate Change (IPCC) predicts global temperature will increase between 0.3 and 4.8 degrees Celsius and sea levels will rise 82cm (32 in) by the late 21st century. But what effect will the changing rainfall pattern have on the seasonal variation?

One way to study seasonal variation in India is to analyze the changing patterns of flowering and fruiting of common trees like the Mango and Amaltas trees. SeasonWatch, a program that is part of the National Centre for Biological Sciences (NCBS), the biological wing of the Tata Institute of Fundamental Research, does exactly that. It is an India-wide program that studies the changing seasons by monitoring the seasonal cycles of flowering, fruiting and leaf flush of common trees. And how does it do that? It does it by utilizing the idea of Citizen Science. Anybody, be it children or adults, interested in trees and the effects of climate change can participate. All they have to do is register, select a tree near them and monitor it every week. The data is uploaded to a central website and is analyzed for changing patterns of plant life, and the effects of climate change on plant life cycle. The data is also open source so anyone can get access to it if they wish to. With all this information one could answer questions which were previously impossible to answer such as:

  • How does the flowering of Neem change across India?
  • Is fruiting of Tamarind different in different parts of the country depending on rainfall in the previous year?
  • Is year to year variation in flowering and fruiting time of Mango related to Winter temperatures?

Using Citizen Science and crowdsourcing, programs such as SeasonWatch have expanded the scope and work of conservation biology in various ecosystems across India….(More)”

Tools to Innovate: Data Analytics, Risk Management, and Shared Services


New report by The Business of Government Center: “Today, governments have access to a variety of tools to successfully implement agency programs. For example, Data Analytics—especially of financial data—can be used to better inform decision making by ensuring agencies have the information they need at the point of time that it can be most effective. In addition, governments at all levels can more effectively address risks using new Risk Management approaches. And finally, Shared Services can not only save money, but also stimulate innovation, improve decisionmaking, and increase the quality of services expected by citizens.

The IBM Center has published a variety of reports related to these topics and accordingly, we have brought key findings on these topics together in the compilation that follows. We welcome your thoughts on these issues, and look forward to a continued dialogue with government leaders and stakeholders on actions to help agencies achieve their mission effectively and efficiently….(More)”

Designing Successful Governance Groups


The Berkman Center for Internet & Society, together with the Global Network of Internet and Society Research Centers (NoC), is pleased to announce the release of a new publication, “Designing Successful Governance Groups: Lessons for Leaders from Real-World Examples,” authored by Ryan Budish, Sarah Myers West, and Urs Gasser.

Solutions to many of the world’s most pressing governance challenges, ranging from natural resource management to the governance of the Internet, require leaders to engage in multistakeholder processes. Yet, relatively little is known about how to successfully lead such processes. This paper outlines a set of useful, actionable steps for policymakers and other stakeholders charged with creating, convening, and leading governance groups. The tools for success described in this document are distilled from research published earlier this year by Berkman and the NoC, a comprehensive report entitled “Multistakeholder as Governance Groups: Observations From Case Studies,” which closely examines 12 examples of real-world governance structures from around the globe and draws new conclusions about how to successfully form and operate governance groups.

This new publication, “Designing Successful Governance Groups,” focuses on the operational recommendations drawn from the earlier case studies and their accompanying synthesis paper. It provides an actionable starting place for those interested in understanding some of the critical ingredients for successful multistakeholder governance.

At the core of this paper are three steps that have helped conveners of successful governance groups:

  1. Establish clear success criteria

  2. Set the initial framework conditions for the group

  3. Continually adjust steps 1 and 2 based on evolving contextual factors

The paper explores these three steps in greater detail and explains how they help implement one central idea: Governance groups work best when they are flexible and adaptive to new circumstances and needs and have conveners who understand how their decisions will affect the inclusiveness, transparency, accountability, and effectiveness of the group….(More)”

What We’ve Learned About Sharing Our Data Analysis


Jeremy Singer-Vine at Source: “Last Friday morning, Jessica Garrison, Ken Bensinger, and I published a BuzzFeed News investigation highlighting the ease with which American employers have exploited and abused a particular type of foreign worker—those on seasonal H–2 visas. The article drew on seven months’ worth of reporting, scores of interviews, hundreds of documents—and two large datasets maintained by the Department of Labor.

That same morning, we published the corresponding data, methodologies, and analytic code on GitHub. This isn’t the first time we’ve open-sourced our data and analysis; far from it. But the H–2 project represents our most ambitious effort yet. In this post, I’ll describe our current thinking on “reproducible data analyses,” and how the H–2 project reflects those thoughts.

What Is “Reproducible Data Analysis”?

It’s helpful to break down a couple of slightly oversimplified definitions. Let’s call “open-sourcing” the act of publishing the raw code behind a software project. And let’s call “reproducible data analysis” the act of open-sourcing the code and data required to reproduce a set of calculations.

Journalism has seen a mini-boom of reproducible data analysis in the past year or two. (It’s far from a novel concept, of course.) FiveThirtyEight publishes data and re-runnable computer code for many of their stories. You can download the brains and brawn behind Leo, the New York Times’ statistical model for forecasting the outcome of the 2014 midterm Senate elections. And if you want to re-run Barron’s magazine’s analysis of SEC Rule 605 reports, you can do that, too. The list goes on.

….

Why Reproducible Data Analysis?

At BuzzFeed News, our main motivation is simple: transparency. If an article includes our own calculations (and are beyond a grade-schooler’s pen-and-paper calculations), then you should be able to see—and potentially criticize—how we did it…..

There are reasons, of course, not to publish a fully-reproducible analysis. The most obvious and defensible reason: Your data includes Social Security numbers, state secrets, or other sensitive information. Sometimes, you’ll be able to scrub these bits from your data. Other times, you won’t. (A detailed methodology is a good alternative.)

How To Publish Reproducible Data Analysis?

At BuzzFeed News, we’re still figuring out the best way to skin this cat. Other news organizations might arrive at entirely opposite conclusions. That said, here are some tips, based on our experience:

Describe the main data sources, and how you got them. Art appraisers and data-driven reporters agree: Provenance matters. Who collected the data? What universe of things does it quantify? How did you get it?.… (More)”

Open Data and Sub-national Governments: Lessons from Developing Countries


WebFoundation: “Open government data (OGD) as a concept is gaining currency globally due to the strong advocacy of global organisations such as the Open Government Partnership. In recent years, there has been increased commitment on the part of national governments to proactively disclose information. However, much of the discussion on OGD is at the national level, especially in developing countries where commitments of proactive disclosure are conditioned by the commitments of national governments as expressed through the OGP national action plans. However, the local is important in the context of open data. In decentralized contexts, the local is where data is collected and stored, where there is strong feasibility that data will be published, and where data can generate the most impact when used. This synthesis paper aims to refocus the discussion of open government data in sub-national contexts by analysing nine country papers produced through the Open Data in Developing Countries research project.

Using a common research framework that focuses on context, governance setting, and open data initiatives, the study found that there is substantial effort on the part of sub-national governments to proactively disclose data; however, the design delimits citizen participation and, eventually, use. Second, context demands different roles for intermediaries and different types of initiatives to create an enabling environment for open data. Finally, data quality will remain a critical challenge for sub-national governments in developing countries and it will temper the potential impact that open data will be able to generate. Download the full research paper here

Crowdfunding sites aim to make the law accessible to all


Jonathan Ford at the Financial Times: “Using the internet to harness the financial power of crowds is hardly novel. Almost since the first electronic impulse pinged its way across the world wide web, entrepreneurs have been dreaming up sites to facilitate everything from charitable donation to hard-nosed investment.

Peer-to-peer lending is now almost part of the mainstream. JustGiving, the charitable portal, has been going since 2000. But employing the web to raise money for legal actions remains a less well ploughed piece of virtual terrain.

At first glance, you might wonder why this is. There is already a booming offline trade in the commercial funding of litigation, especially in America and Britain, whether through lawyers’ no-win, no-fee arrangements or third party investment. And, indeed, a few pioneering crowdfunding vehicles have recently emerged in the US. One such is Invest4Justice, a site that boldly touts returns of “500 per cent plus in a few months”.

Whether these eye-catching figures are ultimately deliverable is — as lawyers like to say — moot. But there are risks in seeking to share the fruits of a third party’s action that can make it perilous for the crowdfunding investor. One is that when actions fail, those same backers might have to pay not only their own, but the successful party’s, costs….

But not all crowdfunding ventures seek to reward participants in the currency of cold financial return. Crowdjustice, Britain’s first legal crowdfunding website, seeks to scratch quite a different itch in the psyches of its participants….Among the causes it has taken up are a criminal appeal and a planning dispute in Lancashire involving a landfill site. The only real requirement for consideration is that the legal David confronting the corporate or governmental Goliath must have already engaged a lawyer to take on their case….This certainly means the risk of being dragged into proceedings is far lower. But it also raises a question: why would the public want to donate money to lawyers in the first place?

Ms Salasky thinks it ranges from a sense of justice to enlightened self-interest. “Donors can be people who take human rights seriously, but they could also be those who worry that something which is happening to someone else could also happen to them,” she says. It is one reason why perhaps the most potent application is seen to be in the fields of environmental and planning law. …(More)”

 

100 parliaments as open data, ready for you to use


Myfanwy Nixon at mySociety’s blog and OpeningParliament: “If you need data on the people who make up your parliament, another country’s parliament, or indeed all parliaments, you may be in luck.

Every Politician, the latest Poplus project, aims to collect, store and share information about every parliament in the world, past and present—and it already contains 100 of them.

What’s more, it’s all provided as Open Data to anyone who would like to use it to power a civic tech project. We’re thinking parliamentary monitoring organisations, journalists, groups who run access-to-democracy sites like our own WriteToThem, and especially researchers who want to do analysis across multiple countries.

But isn’t that data already available?

Yes and no. There’s no doubt that you can find details of most parliaments online, either on official government websites, on Wikipedia, or on a variety of other places online.

But, as you might expect from data that’s coming from hundreds of different sources, it’s in a multitude of different formats. That makes it very hard to work with in any kind of consistent fashion.

Every Politician standardises all of its data into the Popolo standard and then provides it in two simple downloadable formats:

  • csv, which contains basic data that’s easy to work with on spreadsheets
  • JSON which contains richer data on each person, and is ideal for developers

This standardisation means that it should now be a lot easier to work on projects across multiple countries, or to compare one country’s data with another. It also means that data works well with other Poplus Components….(More)”
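As a rough illustration of how the richer JSON relates to the spreadsheet-friendly CSV, the Python sketch below flattens a tiny, made-up sample into CSV rows. The field names ("persons", "memberships", "person_id", "organization_id") are assumptions loosely modelled on the Popolo standard, the people and parties are hypothetical, and the output columns are invented for this example; the project's real files may be laid out differently.

```python
import csv
import io
import json

# Hypothetical sample, loosely modelled on Popolo-style collections.
SAMPLE = json.loads("""
{
  "persons": [
    {"id": "p1", "name": "Alex Example"},
    {"id": "p2", "name": "Sam Sample"}
  ],
  "memberships": [
    {"person_id": "p1", "organization_id": "party-a"},
    {"person_id": "p2", "organization_id": "party-b"}
  ]
}
""")

def flatten(data):
    """Join each membership to its person, giving one flat row per membership."""
    names = {person["id"]: person["name"] for person in data["persons"]}
    return [
        {"id": m["person_id"], "name": names[m["person_id"]], "group": m["organization_id"]}
        for m in data["memberships"]
    ]

def to_csv(rows):
    """Serialise the flat rows as CSV text, suitable for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "group"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = flatten(SAMPLE)
print(to_csv(rows))
```

Because every country's data would share the same field names, the same two functions could in principle be run over any parliament's file, which is the practical benefit of standardisation described above.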

Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings


Future of Privacy Forum: “In the wake of last year’s news about the Facebook “emotional contagion” study and subsequent public debate about the role of A/B Testing and ethical concerns around the use of Big Data, FPF Senior Fellow Omer Tene participated in a December symposium on corporate consumer research hosted by Silicon Flatirons. This past month, the Colorado Technology Law Journal published a series of papers that emerged out of the symposium, including “Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings.”

“Beyond the Common Rule,” by Jules Polonetsky, Omer Tene, and Joseph Jerome, continues the Future of Privacy Forum’s effort to build on the notion of consumer subject review boards first advocated by Ryan Calo at FPF’s 2013 Big Data symposium. It explores how researchers, increasingly in corporate settings, are analyzing data and testing theories using often sensitive personal information. Many of these new uses of PII are simply natural extensions of current practices, and are either within the expectations of individuals or the bounds of the FIPPs. Yet many of these projects could involve surprising applications or uses of data, exceeding user expectations, and offering notice and obtaining consent may not be feasible.

This article expands on ideas and suggestions put forward around the recent discussion draft of the White House Consumer Privacy Bill of Rights, which espouses “Privacy Review Boards” as a safety valve for noncontextual data uses. It explores how existing institutional review boards within the academy and for human testing research could offer lessons for guiding principles, providing accountability and enhancing consumer trust, and offers suggestions for how companies, and researchers, can pursue both knowledge and data innovation responsibly and ethically….(More)”

Local open data ecosystems – a prototype map


Ed Parkes and Gail Dawes at Nesta: “It is increasingly recognised that some of the most important open data is published by local authorities (LAs) – data which is important to us like bin collection days, planning applications and even where your local public toilet is. Also, given the likely move towards greater decentralisation, firstly through devolution to cities, the publication of local open data could arguably become more important over the next couple of years. In addition, as of 1st April, there is a new transparency code for local government requiring local authorities to publish further information on things from spending to local land assets. To pre-empt this likely renewed focus on local open data we have begun to develop a prototype map to highlight the UK’s local open data ecosystem.

Already there is some great practice in the publication of open data at a local level – such as Leeds Data Mill, London Datastore, and Open Data Sheffield. This regional activity is also characterised not just by high quality data publication, but also by pulling together through hackdays, challenges and meetups a community interested in the power of open data. This creates an ecosystem of publishers and re-users at a local level. Some of the best practice in relation to developing such an ecosystem was recognised by the last government in the announcement of a group of Local Authority Open Data Champions. Some of these were also recipients of the funding for projects from both the Cabinet Office and through the Open Data User Group.

Outside of this best practice it isn’t always easy to understand how developed smaller, less urban open data agendas are. Other than looking at each council’s website or increasingly on the data portals that forward-thinking councils are providing, there is a surprisingly large number of places that local authorities could make their open data available. The most well known of these is the Openly Local project but at the time of writing this now seems to be retired. Perhaps the best catalogue of local authority data is on Data.gov.uk itself. This has 1,449 datasets published by LAs across 200 different organisations. Following that there is the Open Data Communities website which hosts links to LA linked datasets. Using data from the latter, Steve Peters has developed the local data dashboard (which was itself based on the UK Local Government Open Data resource map from Owen Boswarva). In addition, local authorities can also register their open data in the LGA’s Open Data Inventory Service and take it through the ODI’s data certification process.

Prototype map of local open data eco-systems

To try to highlight patterns in local authority open data publication we decided to make a map of activity around the country (although in the first instance we’ve focused on England)….(More)