A cautionary tale about humans creating biased AI models


At TechCrunch: “Most artificial intelligence models are built and trained by humans, and therefore have the potential to learn, perpetuate and massively scale the human trainers’ biases. This is the word of warning put forth in two illuminating articles published earlier this year by Jack Clark at Bloomberg and Kate Crawford at The New York Times.

Tl;dr: The AI field lacks diversity — even more spectacularly than most of our software industry. When an AI practitioner builds a data set on which to train his or her algorithm, it is likely that the data set will only represent one worldview: the practitioner’s. The resulting AI model demonstrates a non-diverse “intelligence” at best, and a biased or even offensive one at worst….

So what happens when you don’t consider carefully who is annotating the data? What happens when you don’t account for the differing preferences, tendencies and biases among varying humans? We ran a fun experiment to find out…. Actually, we didn’t set out to run an experiment. We just wanted to create something fun that we thought our awesome tasking community would enjoy. The idea? Give people the chance to rate puppies’ cuteness in their spare time… There was a clear gender gap — a very consistent pattern of women rating the puppies as cuter than the men did. The gap between women’s and men’s ratings was narrower for the “less-cute” (ouch!) dogs, and wider for the cuter ones. Fascinating.

I won’t even try to unpack the societal implications of these findings, but the lesson here is this: If you’re training an artificial intelligence model — especially one that you want to be able to perform subjective tasks — there are three areas in which you must evaluate and consider demographics and diversity:

  • yourself
  • your data
  • your annotators

This was a simple example: binary gender differences explaining one subjective numeric measure of an image. Yet it was unexpected and significant. As our industry deploys incredibly complex models that push chip sets, algorithms and scientists to their limits, we risk reinforcing subtle biases, powerfully and at a previously unimaginable scale. Even more pernicious, many AIs reinforce their own learning, so we need to carefully consider “supervised” (aka human) re-training over time.
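The lesson generalizes into a simple audit any team can run before training on crowd-sourced labels: compare ratings across annotator demographics. A minimal sketch in Python (the data and field names here are hypothetical, not from the experiment):

```python
import pandas as pd

# Hypothetical annotation log: one row per (annotator, image) rating.
ratings = pd.DataFrame({
    "annotator_gender": ["f", "f", "m", "m", "f", "f", "m", "m"],
    "image_id":         ["a", "b", "a", "b", "a", "b", "a", "b"],
    "cuteness":         [9, 4, 7, 3, 10, 5, 8, 2],
})

# Mean rating per group: a large, consistent gap means the "ground
# truth" partly encodes who the annotators were.
print(ratings.groupby("annotator_gender")["cuteness"].mean())

# Per-image gap, to check whether it widens at the top of the scale
# (as it did for the cuter puppies in the experiment).
pivot = ratings.pivot_table(index="image_id",
                            columns="annotator_gender",
                            values="cuteness")
print((pivot["f"] - pivot["m"]).rename("gap_f_minus_m"))
```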

Artificial intelligence promises to change all of our lives — and it already subtly guides the way we shop, date, navigate, invest and more. But to make sure that it does so for the better, all of us practitioners need to go out of our way to be inclusive. We need to remain keenly aware of what makes us all, well… human. Especially the subtle, hidden stuff….(More)”

The risks of relying on robots for fairer staff recruitment


Sarah O’Connor at the Financial Times: “Robots are not just taking people’s jobs away, they are beginning to hand them out, too. Go to any recruitment industry event and you will find the air is thick with terms like “machine learning”, “big data” and “predictive analytics”.

The argument for using these tools in recruitment is simple. Robo-recruiters can sift through thousands of job candidates far more efficiently than humans. They can also do it more fairly. Since they do not harbour conscious or unconscious human biases, they will recruit a more diverse and meritocratic workforce.

This is a seductive idea but it is also dangerous. Algorithms are not inherently neutral just because they see the world in zeros and ones.

For a start, any machine learning algorithm is only as good as the training data from which it learns. Take the PhD thesis of academic researcher Colin Lee, released to the press this year. He analysed data on the success or failure of 441,769 job applications and built a model that could predict with 70 to 80 per cent accuracy which candidates would be invited to interview. The press release plugged this algorithm as a potential tool to screen a large number of CVs while avoiding “human error and unconscious bias”.

But a model like this would absorb any human biases at work in the original recruitment decisions. For example, the research found that age was the biggest predictor of being invited to interview, with the youngest and the oldest applicants least likely to be successful. You might think it fair enough that inexperienced youngsters do badly, but the routine rejection of older candidates seems like something to investigate rather than codify and perpetuate. Mr Lee acknowledges these problems and suggests it would be better to strip the CVs of attributes such as gender, age and ethnicity before using them….(More)”
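Lee’s suggestion is easy to sketch, and the sketch also makes the article’s caveat concrete: stripping columns does not remove bias baked into the historical labels. A hedged illustration on synthetic data (all names and numbers are invented):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic applications in which past decisions favoured mid-career
# candidates: the age bias lives in the labels, not just the features.
apps = pd.DataFrame({
    "age": rng.integers(18, 66, n),
    "gender": rng.integers(0, 2, n),
    "years_experience": rng.integers(0, 30, n),
    "relevant_degree": rng.integers(0, 2, n),
})
apps["invited"] = (apps["age"].between(25, 40)
                   & (apps["relevant_degree"] == 1)).astype(int)

# Strip protected attributes before training, as Lee suggests.
X = apps.drop(columns=["age", "gender", "invited"])
y = apps["invited"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("held-out accuracy:", round(model.score(X_te, y_te), 2))

# Caveat: remaining features can still proxy for age (graduation year,
# years of experience), and y still encodes past recruiters' decisions.
```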

The Racist Algorithm?


Anupam Chander in the Michigan Law Review (2017, forthcoming): “Are we on the verge of an apartheid by algorithm? Will the age of big data lead to decisions that unfairly favor one race over others, or men over women? At the dawn of the Information Age, legal scholars are sounding warnings about the ubiquity of automated algorithms that increasingly govern our lives. In his new book, The Black Box Society: The Hidden Algorithms Behind Money and Information, Frank Pasquale forcefully argues that human beings are increasingly relying on computerized algorithms that make decisions about what information we receive, how much we can borrow, where we go for dinner, or even whom we date. Pasquale’s central claim is that these algorithms will mask invidious discrimination, undermining democracy and worsening inequality. In this review, I rebut this prominent claim. I argue that any fair assessment of algorithms must be made against their alternative. Algorithms are certainly obscure and mysterious, but often no more so than the committees or individuals they replace. The ultimate black box is the human mind. Relying on contemporary theories of unconscious discrimination, I show that the consciously racist or sexist algorithm is less likely than the consciously or unconsciously racist or sexist human decision-maker it replaces. The principal problem of algorithmic discrimination lies elsewhere, in a process I label viral discrimination: algorithms trained or operated on a world pervaded by discriminatory effects are likely to reproduce that discrimination.

I argue that the solution to this problem lies in a kind of algorithmic affirmative action. This would require training algorithms on data that includes diverse communities and continually assessing the results for disparate impacts. Instead of insisting on race or gender neutrality and blindness, this would require decision-makers to approach algorithmic design and assessment in a race- and gender-conscious manner….(More)”
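Chander’s call for “continually assessing the results for disparate impacts” can be made concrete with a simple selection-rate comparison in the spirit of the four-fifths rule used in US employment law. A sketch with illustrative numbers:

```python
import pandas as pd

# Illustrative algorithmic decisions, one row per applicant.
decisions = pd.DataFrame({
    "group":    ["a", "a", "a", "a", "b", "b", "b", "b"],
    "selected": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = decisions.groupby("group")["selected"].mean()
ratio = rates.min() / rates.max()
print(rates)
print(f"disparate impact ratio: {ratio:.2f}")

# A common heuristic (the "four-fifths rule") flags ratios below 0.8
# for review; it is a screening signal, not a legal finding.
if ratio < 0.8:
    print("flag: selection rates differ enough to warrant review")
```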

The Seductions of Quantification: Measuring Human Rights, Gender Violence, and Sex Trafficking


Book by Sally Engle Merry: “We live in a world where seemingly everything can be measured. We rely on indicators to translate social phenomena into simple, quantified terms, which in turn can be used to guide individuals, organizations, and governments in establishing policy. Yet counting things requires finding a way to make them comparable. And in the process of translating the confusion of social life into neat categories, we inevitably strip it of context and meaning—and risk hiding or distorting as much as we reveal.

With The Seductions of Quantification, leading legal anthropologist Sally Engle Merry investigates the techniques by which information is gathered and analyzed in the production of global indicators on human rights, gender violence, and sex trafficking. Although such numbers convey an aura of objective truth and scientific validity, Merry argues persuasively that measurement systems constitute a form of power by incorporating theories about social change in their design but rarely explicitly acknowledging them. For instance, the US State Department’s Trafficking in Persons Report, which ranks countries in terms of their compliance with antitrafficking activities, assumes that prosecuting traffickers as criminals is an effective corrective strategy—overlooking cultures where women and children are frequently sold by their own families. As Merry shows, indicators are indeed seductive in their promise of providing concrete knowledge about how the world works, but they are implemented most successfully when paired with context-rich qualitative accounts grounded in local knowledge….(More)”.

Transparency reports make AI decision-making accountable


Phys.org: “Machine-learning algorithms increasingly make decisions about credit, medical diagnoses, personalized recommendations, advertising and job opportunities, among other things, but exactly how they do so usually remains a mystery. Now, new measurement methods developed by Carnegie Mellon University researchers could provide important insights into this process.

Was it a person’s age, gender or education level that had the most influence on a decision? Was it a particular combination of factors? CMU’s Quantitative Input Influence (QII) measures can provide the relative weight of each factor in the final decision, said Anupam Datta, associate professor of computer science and electrical and computer engineering.

“Demands for algorithmic transparency are increasing as the use of algorithmic decision-making systems grows and as people realize the potential of these systems to introduce or perpetuate racial or sex discrimination or other social harms,” Datta said.

“Some companies are already beginning to provide transparency reports, but work on the computational foundations for these reports has been limited,” he continued. “Our goal was to develop measures of the degree of influence of each factor considered by a system, which could be used to generate transparency reports.”
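QII’s actual measures rely on causal interventions and cooperative-game aggregation; as a much simplified sketch of the underlying intuition, one can randomize a single input and count how often the decision flips (the model interface below is an assumption, not CMU’s code):

```python
import numpy as np

def input_influence(predict, X, col, n_repeats=20, seed=0):
    """Crude influence score for one feature: permute that column and
    record how often the prediction changes. A simplification of the
    intervention idea behind QII, not the QII measure itself."""
    rng = np.random.default_rng(seed)
    base = predict(X)
    flips = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, col] = rng.permutation(Xp[:, col])
        flips.append(np.mean(predict(Xp) != base))
    return float(np.mean(flips))

# Scores for every feature of a fitted classifier `clf` (hypothetical)
# could then populate a transparency report:
#   scores = {j: input_influence(clf.predict, X_test, j)
#             for j in range(X_test.shape[1])}
```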

These reports might be generated in response to a particular incident—why an individual’s loan application was rejected, why police targeted an individual for scrutiny, or what prompted a particular medical diagnosis or treatment. Or they might be used proactively by an organization to see if an artificial intelligence system is working as desired, or by a regulatory agency to see whether a decision-making system inappropriately discriminated between groups of people….(More)”

City planners tap into wealth of cycling data from Strava tracking app


Peter Walker in The Guardian: “Sheila Lyons recalls the way Oregon used to collect data on how many people rode bikes. “It was very haphazard, two-hour counts done once a year,” said the woman in charge of cycling policy for the state government. “Volunteers, sitting on the street corner because they wanted better bike facilities. Pathetic, really.”

But in 2013 a colleague had an idea. She recorded her own bike rides using an app called Strava, and thought: why not ask the company to share its data? And so was born Strava Metro, both an inadvertent tech business spinoff and a similarly accidental urban planning tool, one that is now quietly helping to reshape streets in more than 70 places around the world and counting.

Using the GPS tracking capability of a smartphone and similar devices, Strava allows people to plot how far and fast they go and compare themselves against other riders. Users create designated route segments, which each have leaderboards ranked by speed.

Originally aimed just at cyclists, Strava soon incorporated running and now has options for more than two dozen pursuits. But cycling remains the most popular, and while the company is coy about overall figures, it says it adds one million new members every two months and has more than six million uploads a week.

For city planners like Lyons, used to very occasional single-street bike counts, this is a near-unimaginable wealth of data. While individual details are anonymised, it still shows how many Strava-using cyclists, plus their age and gender, ride down any street at any time of the day, and the entire route they take.
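Conceptually, the aggregation planners receive might look like the following: GPS points map-matched to street segments and reduced to anonymized counts per segment, hour and demographic band. This data model is our assumption for illustration, not Strava’s published schema:

```python
import pandas as pd

# Hypothetical ride records already map-matched to street segments.
rides = pd.DataFrame({
    "segment_id": ["elm_st", "elm_st", "oak_ave", "elm_st"],
    "timestamp":  pd.to_datetime(["2016-05-09 08:05", "2016-05-09 08:40",
                                  "2016-05-09 08:15", "2016-05-09 17:30"]),
    "age_band":   ["25-34", "35-44", "25-34", "25-34"],
    "gender":     ["f", "m", "f", "f"],
})

# Reduce to counts per segment, hour and demographic band; individual
# identities never leave the aggregation step.
counts = (rides
          .assign(hour=rides["timestamp"].dt.hour)
          .groupby(["segment_id", "hour", "age_band", "gender"])
          .size()
          .reset_index(name="riders"))
print(counts)
```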

The company says it initially had no idea how useful the information could be, and only began visualising data on heatmaps as a fun project for its engineers. “We’re not city planners,” said Michael Horvath, one of two former Harvard University rowers and relatively veteran 40-something tech entrepreneurs who co-founded Strava in 2009.

“One of the things that we learned early on is that these people just don’t have very much data to begin with. Not only is ours a novel dataset, in many cases it’s the only dataset that speaks to the behaviour of cyclists and pedestrians in that city or region.”…(More)”

Crowdsourcing corruption in India’s maternal health services


Joan Okitoi-Heisig at DW Akademie: “…The Mera Swasthya Meri Aawaz (MSMA) project is the first of its kind in India to track illicit maternal fees demanded in government hospitals located in the northern state of Uttar Pradesh.

MSMA (“My Health, My Voice”) is part of SAHAYOG, a non-governmental umbrella organization that helped launch the project. MSMA uses an Ushahidi platform to map and collect data on unofficial fees that plague India’s ostensibly “free” maternal health services. It is one of the many projects showcased in DW Akademie’s recently launched Digital Innovation Library. SAHAYOG works closely with grassroots organizations to promote gender equality and women’s health issues from a human rights perspective…

SAHAYOG sees women’s maternal health as a human rights issue. Key to the MSMA project is exposing government facilities that extort bribes from among the poorest and most vulnerable in society.

Sandhya and her colleagues are convinced that promoting transparency and accountability through the data collected can empower the women. If they’re aware of their entitlements, she says, they can demand their rights and in the process hold leaders accountable.

“Information is power,” Sandhya explains. Without this information, she says, “they aren’t in a position to demand what is rightly theirs.”

Health care providers hold a certain degree of power when entrusted with taking care of expectant mothers. Many women give in to bribes for fear of otherwise being neglected or abused.

With the MSMA project, however, poor rural women have technology that is easy to use and accessible on their mobile phones, and that empowers them to make complaints and report bribes for services that are supposed to be free.

MSMA is an innovative data-driven platform that combines a toll free number, an interactive voice response system (IVRS) and a website that contains accessible reports. In addition to enabling poor women to air their frustrations anonymously, the project aggregates actionable data which can then be used by the NGO as well as the government to work towards improving the situation for mothers in India….(More)”
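As a hedged sketch of the aggregation step (field names and figures are invented; the real MSMA pipeline runs on Ushahidi), anonymous IVRS reports might be summarized per facility like this:

```python
import pandas as pd

# Hypothetical anonymized IVRS reports: one row per call.
reports = pd.DataFrame({
    "facility": ["dist_hosp_1", "dist_hosp_1", "chc_2", "dist_hosp_1"],
    "fee_demanded_inr": [500, 1000, 200, 700],
})

# Facilities ranked by report volume and typical amount demanded: the
# kind of summary an NGO could take to district health officials.
summary = (reports.groupby("facility")["fee_demanded_inr"]
                  .agg(n_reports="count", median_fee="median")
                  .sort_values("n_reports", ascending=False))
print(summary)
```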

Global governance and ICTs: exploring online governance networks around gender and media


Claudia Padovani and Elena Pavan in the journal “Global Networks”: In this article, we address transformations in global governance brought about by information and communication technologies (ICTs). Focusing on the specific domain of ‘gender-oriented communication governance’, we investigate online interactions among different kinds of actors active in promoting gender equity in and through the media. By tracing and analysing online issue networks, we investigate which actors are capable of influencing the framing of issues and of structuring discursive practices. From the analysis, different forms of power emerge, reflecting diverse modes of engaging in online interactions, where actors can operate as network ‘programmers’, ‘mobilizers’, or ‘switchers’. Our case study suggests that, often, old ways of conceiving actors’ interactions accompany the implementation of new communication tools, while the availability of a pervasive networked infrastructure does not automatically translate into meaningful interactions among all relevant actors in a specific domain….(More)”

Opening up census data for research


Economic and Social Research Council (UK): “InFuse, an online search facility for census data, is enabling tailored search and investigation of UK census statistics – opening new opportunities for aggregating and comparing population counts.

Impacts

  • InFuse data were used for the ‘Smarter Travel’ research project studying how ‘smart choices’ for sustainable travel could be implemented and supported in transport planning. The research directly influenced UK climate-change agendas and policy, including:
    • the UK Committee on Climate Change recommendations on cost-effective emission reductions
    • the Scottish Government’s targets and household advice for smarter travel
    • the UK Government’s Local Sustainable Transport Fund supporting 96 projects across England
    • evaluations for numerous Local Authority Transport Plans across the UK.
  • The Integration Hub, a web resource that was launched by Demos in 2015 to provide data about ethnic integration in England and Wales, uses data from InFuse to populate its interactive maps of the UK.
  • Census data downloaded from InFuse informed Welsh Government policies to engage Gypsy and Traveller families in education, showing that over 60 per cent of people aged over 16 from these communities had no qualifications.
  • Executive recruitment firm Sapphire Partners used census data from InFuse in a report on female representation on boards, revealing that 77 per cent of FTSE board members are men, and 70 per cent of new board appointments go to men.
  • A study by the Marie Curie charity into the differing needs of Black, Asian and minority ethnic groups in Scotland for end-of-life care used InFuse to determine that the minority ethnic population in Scotland has doubled since 2001 from 100,000 to 200,000 – highlighting the need for greater and more appropriate provision.
  • A Knowledge Transfer Partnership between homelessness charity Llamau and Cardiff University used InFuse data to show that Welsh young homeless people participating in the study were over twice as likely to have left school with no qualifications compared to UK-wide figures for their age group and gender….(More)”


Website Seeks to Make Government Data Easier to Sift Through


Steve Lohr at the New York Times: “For years, the federal government, states and some cities have enthusiastically made vast troves of data open to the public. Acres of paper records on demographics, public health, traffic patterns, energy consumption, family incomes and many other topics have been digitized and posted on the web.

This abundance of data can be a gold mine for discovery and insights, but finding the nuggets can be arduous, requiring special skills.

A project coming out of the M.I.T. Media Lab on Monday seeks to ease that challenge and to make the value of government data available to a wider audience. The project, called Data USA, bills itself as “the most comprehensive visualization of U.S. public data.” It is free, and its software code is open source, meaning that developers can build custom applications by adding other data.
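Data USA’s code is open source, and the site exposes the data behind its visualizations programmatically; a hedged sketch of pulling a series follows (the endpoint and parameters are assumptions based on the public site, not details given in the article):

```python
import requests

# Assumed public Data USA endpoint; treat as an illustration, not a
# documented contract from the article.
url = "https://datausa.io/api/data"
params = {"drilldowns": "Nation", "measures": "Population"}

resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
for row in resp.json().get("data", []):
    print(row.get("Year"), row.get("Population"))
```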

Cesar A. Hidalgo, an assistant professor of media arts and sciences at the M.I.T. Media Lab who led the development of Data USA, said the website was devised to “transform data into stories.” Those stories are typically presented as graphics, charts and written summaries…. Type “New York” into the Data USA search box, and a drop-down menu presents choices — the city, the metropolitan area, the state and other options. Select the city, and the page displays an aerial shot of Manhattan with three basic statistics: population (8.49 million), median household income ($52,996) and median age (35.8).

Lower on the page are six icons for related subject categories, including economy, demographics and education. If you click on demographics, one of the so-called data stories appears, based largely on data from the American Community Survey of the United States Census Bureau.

Using colorful graphics and short sentences, it shows the median age of foreign-born residents of New York (44.7) and of residents born in the United States (28.6); the most common countries of origin for immigrants (the Dominican Republic, China and Mexico); and the percentage of residents who are American citizens (82.8 percent, compared with a national average of 93 percent).

Data USA features a selection of data results on its home page. They include the gender wage gap in Connecticut; the racial breakdown of poverty in Flint, Mich.; the wages of physicians and surgeons across the United States; and the institutions that award the most computer science degrees….(More)”