Gender Biases in Cyberspace: A Two-Stage Model, the New Arena of Wikipedia and Other Websites


Paper by Shlomit Yanisky-Ravid and Amy Mittelman: “Increasingly, there has been a focus on creating democratic standards and norms in order to best facilitate open exchange of information and communication online―a goal that fits neatly within the feminist aim to democratize content creation and community. Collaborative websites, such as blogs, social networks, and, as focused on in this Article, Wikipedia, represent both a cyberspace community entirely outside the strictures of the traditional (intellectual) proprietary paradigm and one that professes to truly embody the philosophy of a completely open, free, and democratic resource for all. In theory, collaborative websites are the solution for which social activists, intellectual property opponents, and feminist theorists have been waiting. Unfortunately, we are now realizing that this utopian dream does not exist as anticipated: the Internet is neither neutral nor open to everyone. More importantly, these websites are not egalitarian; rather, they facilitate new ways to exclude and subordinate women. This Article innovatively argues that the virtual world excludes women in two stages: first, by controlling websites and filtering out women; and second, by exposing women who survived the first stage to a hostile environment. Wikipedia, as well as other cyber-space environments, demonstrates the execution of the model, which results in the exclusion of women from the virtual sphere with all the implications thereof….(More)”.

How did awful panel discussions become the default format?


 at The Guardian: “With the occasional exception, my mood in conferences usually swings between boredom, despair and rage. The turgid/self-aggrandizing keynotes and coma-inducing panels, followed by people (usually men) asking ‘questions’ that are really comments, and usually not on topic. The chairs who abdicate responsibility and let all the speakers over-run, so that the only genuinely productive bit of the day (networking at coffee breaks and lunch) gets squeezed. I end up dozing off, or furiously scribbling abuse in my notebook as a form of therapy, and hoping my neighbours can’t see what I’m writing. I probably look a bit unhinged…

This matters both because of the lost opportunity that badly run conferences represent, and because they cost money and time. I hope that if it was easy to fix, people would have done so already, but the fact is that the format is tired and unproductive.

For example, how did something as truly awful as panel discussions become the default format? They end up being a parade of people reading out papers, or they include terrible powerpoints crammed with too many words and illegible graphics. Can we try other formats, like speed dating (eg 10 people pitch their work for 2 minutes each, then each goes to a table and the audience hooks up (intellectually, I mean) with the ones they were interested in); world cafes; simulation games; joint tasks (eg come up with an infographic that explains X)? Anything, really. Yes ‘manels’ (male only panels – take the pledge here) are an outrage, but why not go for complete abolition, rather than mere gender balance?

Conferences frequently discuss evidence and results. So where is the evidence and results for the efficacy of conferences? Given the resources being ploughed into research on development (DFID alone spends about £350m a year), surely it would be a worthwhile investment, if it hasn’t already been done, to sponsor a research programme that runs multiple parallel experiments with different event formats, and compares the results in terms of participant feedback, how much people retain a month after the event etc? At the very least, can they find or commission a systematic review on what the existing evidence says?

Feedback systems could really help. A public eBay-type ratings system to rank speakers/conferences would provide nice examples of good practice for people to draw on (and bad practice to avoid). Or why not go real-time and encourage instant audience feedback? OK, maybe Occupy-style thumbs up from the audience if they like the speaker, thumbs down if they don’t would be a bit in-your-face for academe, but why not introduce a twitterwall to encourage the audience to interact with the speaker (perhaps with moderation to stop people testing the limits, as my LSE students did to Owen Barder last term)?

We need to get better at shaping the format to fit the precise purpose of the conference. … if the best you can manage is ‘disseminating new research’ or ‘information sharing’, alarm bells should probably ring….(More)”.

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are


Book by Seth Stephens-Davidowitz: “Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveals about ourselves and our world—provided we ask the right questions.

By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information—unprecedented in history—can tell us a great deal about who we are—the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that less than twenty years ago, seemed unfathomable.

Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?

Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential: revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health, both emotional and physical. All of us are touched by big data every day, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world…(More)”.

We use big data to sentence criminals. But can the algorithms really tell us what we need to know?


 at the Conversation: “In 2013, a man named Eric L. Loomis was sentenced for eluding police and driving a car without the owner’s consent.

When the judge weighed Loomis’ sentence, he considered an array of evidence, including the results of an automated risk assessment tool called COMPAS. Loomis’ COMPAS score indicated he was at a “high risk” of committing new crimes. Considering this prediction, the judge sentenced him to seven years.

Loomis challenged his sentence, arguing it was unfair to use the data-driven score against him. The U.S. Supreme Court now must consider whether to hear his case – and perhaps settle a nationwide debate over whether it’s appropriate for any court to use these tools when sentencing criminals.

Today, judges across the U.S. use risk assessment tools like COMPAS in sentencing decisions. In at least 10 states, these tools are a formal part of the sentencing process. Elsewhere, judges informally refer to them for guidance.

I have studied the legal and scientific bases for risk assessments. The more I investigate the tools, the more my caution about them grows.

The scientific reality is that these risk assessment tools cannot do what advocates claim. The algorithms cannot actually make predictions about future risk for the individual defendants being sentenced….

Algorithms such as COMPAS cannot make predictions about individual defendants, because data-driven risk tools are based on group statistics. This creates an issue that academics sometimes call the “group-to-individual” or G2i problem.

Scientists study groups. But the law sentences the individual. Consider the disconnect between science and the law here.

The algorithms in risk assessment tools commonly assign specific points to different factors. The points are totaled. The total is then often translated to a risk bin, such as low or high risk. Typically, more points means a higher risk of recidivism.

Say a score of 6 points out of 10 on a certain tool is considered “high risk.” In the historical groups studied, perhaps 50 percent of people with a score of 6 points did reoffend.

Thus, one might be inclined to think that a new offender who also scores 6 points is at a 50 percent risk of reoffending. But that would be incorrect.

It may be the case that half of those with a score of 6 in the historical groups studied would later reoffend. However, the tool is unable to select which of the offenders with 6 points will reoffend and which will go on to lead productive lives.
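The gap between a group rate and an individual prediction can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the factor names, the point values, the threshold of 6 and the tiny historical cohort are invented and do not describe COMPAS or any real tool.

```python
# Hypothetical points table: factor names and weights are invented for illustration.
POINTS = {
    "prior_arrests": 3,
    "under_25": 2,
    "unemployed": 1,
    "prior_failure_to_appear": 4,
}
HIGH_RISK_THRESHOLD = 6  # assumed cut-off on a 10-point scale


def score(person):
    """Total the points for every factor that applies to this person."""
    return sum(pts for factor, pts in POINTS.items() if person.get(factor))


def risk_bin(total):
    """Translate a point total into a coarse risk bin."""
    return "high" if total >= HIGH_RISK_THRESHOLD else "low"


def group_reoffense_rate(history, total):
    """Share of historical people with this exact score who reoffended."""
    outcomes = [reoffended for s, reoffended in history if s == total]
    return sum(outcomes) / len(outcomes)


# Made-up historical cohort of (score, reoffended) pairs.
history = [(6, True), (6, False), (6, True), (6, False), (3, False), (3, False)]

new_defendant = {"prior_arrests": True, "under_25": True, "unemployed": True}
total = score(new_defendant)
label = risk_bin(total)
rate = group_reoffense_rate(history, total)
```

In this toy cohort the rate for a score of 6 is a property of the four historical people who shared that score. Nothing in the calculation identifies which of them reoffended, and nothing in it looks at the new defendant beyond the point total.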

The studies of factors associated with reoffending are not causation studies. They can tell only which factors are correlated with new crimes. Individuals retain some measure of free will to decide to break the law again, or not.

These issues may explain why risk tools often have significant false positive rates. The predictions made by the most popular risk tools for violence and sex offending have been shown to get it wrong for some groups over 50 percent of the time.

A ProPublica investigation found that COMPAS, the tool used in Loomis’ case, is burdened by large error rates. For example, COMPAS failed to predict reoffending in one study at a 37 percent rate. The company that makes COMPAS has disputed the study’s methodology….
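The excerpt does not say which definition of error each of these studies used, so the following Python sketch only shows two common ones, the false positive rate and the false negative rate, computed from pairs of predictions and outcomes. The records are invented for illustration and are not data from any study mentioned here.

```python
def error_rates(records):
    """records: list of (predicted_high_risk, reoffended) boolean pairs."""
    fp = sum(1 for pred, actual in records if pred and not actual)
    tn = sum(1 for pred, actual in records if not pred and not actual)
    fn = sum(1 for pred, actual in records if not pred and actual)
    tp = sum(1 for pred, actual in records if pred and actual)
    return {
        # Flagged high risk, among those who did not reoffend.
        "false_positive_rate": fp / (fp + tn),
        # Not flagged, among those who did reoffend.
        "false_negative_rate": fn / (fn + tp),
    }


# Invented records: 3 true positives, 2 false positives,
# 3 true negatives, 2 false negatives.
records = (
    [(True, True)] * 3
    + [(True, False)] * 2
    + [(False, False)] * 3
    + [(False, True)] * 2
)
rates = error_rates(records)
```

Computing these rates separately for each demographic group is one way to check whether a tool's mistakes fall more heavily on some groups than on others.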

There are also a host of thorny issues with risk assessment tools incorporating, either directly or indirectly, sociodemographic variables, such as gender, race and social class. Law professor Anupam Chander has named it the problem of the “racist algorithm.”

Big data may have its allure. But, data-driven tools cannot make the individual predictions that sentencing decisions require. The Supreme Court might helpfully opine on these legal and scientific issues by deciding to hear the Loomis case…(More)”.

Why big-data analysis of police activity is inherently biased


 and  in The Conversation: “In early 2017, Chicago Mayor Rahm Emanuel announced a new initiative in the city’s ongoing battle with violent crime. The most common solutions to this sort of problem involve hiring more police officers or working more closely with community members. But Emanuel declared that the Chicago Police Department would expand its use of software, enabling what is called “predictive policing,” particularly in neighborhoods on the city’s south side.

The Chicago police will use data and computer analysis to identify neighborhoods that are more likely to experience violent crime, assigning additional police patrols in those areas. In addition, the software will identify individual people who are expected to become – but have yet to be – victims or perpetrators of violent crimes. Officers may even be assigned to visit those people to warn them against committing a violent crime.

Any attempt to curb the alarming rate of homicides in Chicago is laudable. But the city’s new effort seems to ignore evidence, including recent research from members of our policing study team at the Human Rights Data Analysis Group, that predictive policing tools reinforce, rather than reimagine, existing police practices. Their expanded use could lead to further targeting of communities or people of color.

Working with available data

At its core, any predictive model or algorithm is a combination of data and a statistical process that seeks to identify patterns in the numbers. This can include looking at police data in hopes of learning about crime trends or recidivism. But a useful outcome depends not only on good mathematical analysis: It also needs good data. That’s where predictive policing often falls short.

Machine-learning algorithms learn to make predictions by analyzing patterns in an initial training data set and then look for similar patterns in new data as they come in. If they learn the wrong signals from the data, the subsequent analysis will be lacking.

This happened with a Google initiative called “Flu Trends,” which was launched in 2008 in hopes of using information about people’s online searches to spot disease outbreaks. Google’s systems would monitor users’ searches and identify locations where many people were researching various flu symptoms. In those places, the program would alert public health authorities that more people were about to come down with the flu.

But the project failed to account for the potential for periodic changes in Google’s own search algorithm. In an early 2012 update, Google modified its search tool to suggest a diagnosis when users searched for terms like “cough” or “fever.” On its own, this change increased the number of searches for flu-related terms. But Google Flu Trends interpreted the data as predicting a flu outbreak twice as big as federal public health officials expected and far larger than what actually happened.

Criminal justice data are biased

The failure of the Google Flu Trends system was a result of one kind of flawed data – information biased by factors other than what was being measured. It’s much harder to identify bias in criminal justice prediction models. In part, this is because police data aren’t collected uniformly, and in part it’s because what data police track reflect longstanding institutional biases along income, race and gender lines….(More)”.

Using big data to understand consumer behaviour on ethical issues


Phani Kumar Chintakayala and C. William Young in the Journal of Consumer Ethics: “The Consumer Data Research Centre (CDRC) was established by the UK Economic and Social Research Council and launched its data services in 2015. The project is led by the University of Leeds and UCL, with partners at the Universities of Liverpool and Oxford. It is working with consumer-related organisations and businesses to open up their data resources to trusted researchers, enabling them to carry out important social and economic research….

Over the last few years there has been much talk about how so-called “big data” is the future and if you are not exploiting it, you are losing your competitive advantage. So what is there in the latest wave of enthusiasm on big data to help organisations, researchers and ethical consumers?…

Examples of the types of research being piloted using data from the food sector by CDRC include the consumption of milk and egg products. The results clearly indicate that not all the sustainable products are considered the same by consumers, and consumption behaviour varies across sustainable product categories. i) A linked data analysis was carried out by combining sales data of organic milk and free range eggs from a retailer with over 300 stores across the UK, green and ethical attitude data from CDRC’s data partner, and socio-demographic and deprivation data from open sources. The analysis revealed that, in general, the consumers with deeper green and ethical attitudes are the most likely consumers of sustainable products. Deprivation has a negative effect on the consumption of sustainable products. Price, as expected, has a negative effect but the impact varies across products. Convenience stores have significant negative effect on the consumption of sustainable products. The influences of socio-demographic characteristics such as gender, age, ethnicity etc. seem to vary by product categories….

Big data can help organisations, researchers and ethical consumers understand the ethics around consumer behaviour and products. The opportunities to link different types of data are exciting but must be research-question-led to avoid digging for non-existent causal links. The methods and access to data are still a barrier but open access is key to solving this. Big data will probably only help in filling in the details of our knowledge on ethical consumption and on products, but this can only help our decision making…(More)”.

A How-to Book for Wielding Civic Power


Interview by David Bornstein at the New York Times: “Last year, the RAND Survey Research Group asked 3,037 Americans about their political preferences and found that the factor that best predicted support for Donald Trump wasn’t age, race, gender, income, educational attainment or attitudes toward Muslims or undocumented immigrants. It was whether respondents agreed with the statement “People like me don’t have any say about what the government does.”

A feeling of disenfranchisement, or powerlessness, runs deep in the country — and it’s understandable. For most Americans, wages have been flat for 40 years, while incomes have soared for the superrich. Researchers have found, unsurprisingly, that the preferences of wealthy people have a much bigger influence on policy than those of poor or middle-income people.

“I don’t think people are wrong to feel that the game has been rigged,” says Eric Liu, the author of “You’re More Powerful Than You Think: A Citizen’s Guide to Making Change Happen,” an engaging and extremely timely book published last week. “But we’re in a period where across the political spectrum — from the libertarian Tea Party right to the Occupy and Black Lives Matter left — people are pushing back and recognizing that the only remedy is to convert this feeling of ‘not having a say’ into ‘demanding a say.’ ”

Liu, who founded Citizen University, a nonprofit citizen participation organization in Seattle, teaches citizens to do just that. He has also traveled the country, searching across the partisan divide for places where citizens are making democracy work better. In his new book, he has assembled stories of citizen action and distilled them into powerful insights and strategies….

Can you explain the three “core laws of power” you outline in the book?

L. No. 1: Power compounds, as does powerlessness. The rich get richer, and people with clout get more clout.

No. 2: Power justifies itself. In a hundred different ways — propaganda, conventional wisdom, just-so stories — people at the top of the hierarchy tell narratives about why it should be so.

If the world stopped with laws No. 1 and 2, we would be stuck in this doom loop that would tip us toward monopoly and tyranny.

What saves us is law No. 3: Power is infinite. I don’t mean we are all equally powerful. I mean simply and quite literally that we can generate power out of thin air. We do that by organizing….(More)”

Big Data and the Well-Being of Women and Girls: Applications on the Social Scientific Frontier


Report by Bapu Vaitla et al for Data2X: “Conventional forms of data—household surveys, national economic accounts, institutional records, and so on—struggle to capture detailed information on the lives of women and girls. The many forms of big data, from geospatial information to digital transaction logs to records of internet activity, can help close the global gender data gap. This report profiles several big data projects that quantify the economic, social, and health status of women and girls…

This report illustrates the potential of big data in filling the global gender data gap. The rise of big data, however, does not mean that traditional sources of data will become less important. On the contrary, the successful implementation of big data approaches requires investment in proven methods of social scientific research, especially for validation and bias correction of big datasets. More broadly, the invisibility of women and girls in national and international data systems is a political, not solely a technical, problem. In the best case, the current “data revolution” will be reimagined as a step towards better “data governance”: a process through which novel types of information catalyze the creation of new partnerships to advocate for scientific, policy, and political reforms that include women and girls in all spheres of social and economic life….(More)”.

Did artificial intelligence deny you credit?


 in The Conversation: “People who apply for a loan from a bank or credit card company, and are turned down, are owed an explanation of why that happened. It’s a good idea – because it can help teach people how to repair their damaged credit – and it’s a federal law, the Equal Credit Opportunity Act. Getting an answer wasn’t much of a problem in years past, when humans made those decisions. But today, as artificial intelligence systems increasingly assist or replace people making credit decisions, getting those explanations has become much more difficult.

Traditionally, a loan officer who rejected an application could tell a would-be borrower there was a problem with their income level, or employment history, or whatever the issue was. But computerized systems that use complex machine learning models are difficult to explain, even for experts.

Consumer credit decisions are just one way this problem arises. Similar concerns exist in health care, online marketing and even criminal justice. My own interest in this area began when a research group I was part of discovered gender bias in how online ads were targeted, but could not explain why it happened.

All those industries, and many others, who use machine learning to analyze processes and make decisions have a little over a year to get a lot better at explaining how their systems work. In May 2018, the new European Union General Data Protection Regulation takes effect, including a section giving people a right to get an explanation for automated decisions that affect their lives. What shape should these explanations take, and can we actually provide them?

Identifying key reasons

One way to describe why an automated decision came out the way it did is to identify the factors that were most influential in the decision. How much of a credit denial decision was because the applicant didn’t make enough money, or because he had failed to repay loans in the past?

My research group at Carnegie Mellon University, including PhD student Shayak Sen and then-postdoc Yair Zick, created a way to measure the relative influence of each factor. We call it the Quantitative Input Influence.

In addition to giving better understanding of an individual decision, the measurement can also shed light on a group of decisions: Did an algorithm deny credit primarily because of financial concerns, such as how much an applicant already owes on other debts? Or was the applicant’s ZIP code more important – suggesting more basic demographics such as race might have come into play?…(More)”
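The idea of measuring each factor's influence can be pictured with a rough Python sketch. It is not the researchers' actual Quantitative Input Influence computation, which the excerpt does not spell out. It assumes a toy decision rule and estimates a factor's influence by swapping in other applicants' values for that factor and counting how often the decision changes. The rule, the applicants and the ZIP codes are all hypothetical.

```python
def approve(applicant):
    """Toy, invented decision rule; not any real lender's model."""
    return applicant["income"] >= 40000 and applicant["missed_payments"] == 0


def influence(model, applicants, feature):
    """Fraction of (applicant, substitute value) pairs in which replacing
    `feature` with another applicant's value changes the decision."""
    flips = 0
    trials = 0
    for person in applicants:
        baseline = model(person)
        for other in applicants:
            # Copy the applicant, overriding only the feature under study.
            altered = dict(person, **{feature: other[feature]})
            flips += model(altered) != baseline
            trials += 1
    return flips / trials


# Hypothetical applicants, invented for illustration.
applicants = [
    {"income": 30000, "missed_payments": 0, "zip_code": "15213"},
    {"income": 55000, "missed_payments": 0, "zip_code": "15217"},
    {"income": 60000, "missed_payments": 2, "zip_code": "15213"},
    {"income": 42000, "missed_payments": 1, "zip_code": "15222"},
]

scores = {
    name: influence(approve, applicants, name)
    for name in ("income", "missed_payments", "zip_code")
}
```

Because the toy rule never reads the ZIP code, its measured influence is zero. A non-zero value for such a factor in a real system would be the kind of signal the excerpt describes as worth investigating.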

Unconscious gender bias in the Google algorithm


Interview in Metode with Londa Schiebinger, director of Gendered Innovations: “We were interested, because the methods of sex and gender analysis are not in the university curriculum, yet it is very important. The first thing our group did was to develop those methods and we present twelve methods on the website. We knew it would be very important to create case studies or concrete examples where sex and gender analysis added something new to the research. One of my favorite examples is machine translation. If you look at Google Translate, which is the main one in the United States – SYSTRAN is the main one in Europe – we found that it defaults to the masculine pronoun. So does SYSTRAN. If I put an article about myself into Google Translate, it defaults to «he said» instead of «she said». So, in an article of one of my visits to Spain, it defaults to «he thinks, he says…» and, occasionally, «it wrote». We wondered why this happened and we found out, because Google Translate works on an algorithm, the problem is that «he said» appears on the web four times more than «she said», so the machine gets it right if it chooses «he said». Because the algorithm is just set up for that. But, anyway, we found that there was a huge change in English language from 1968 to the current time, and the proportion of «he said» and «she said» changed from 4-to-1 to 2-to-1. But, still, the translation does not take this into account. So we went to Google and we said «Hey, what is going on?» and they said «Oh, wow, we didn’t know, we had no idea!». So what we recognized is that there is an unconscious gender bias in the Google algorithm. They did not intend to do this at all, so now there are a lot of people who are trying to fix it….

How can you fix that?

Oh, well, this is the thing! …I think algorithms in general are a problem because if there is any kind of unconscious bias in the data, the algorithm just returns that to you. So even though Google has policies, company policies, to support gender equality, they had an unconscious bias in their product and they do not mean to. Now that they know about it, they can try to fix it….(More)”
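The mechanism described in the interview, an algorithm that returns whatever is most common in its data, can be illustrated with a small Python sketch. This is not how Google Translate or SYSTRAN is actually implemented. The counts are invented to mirror the 4-to-1 and 2-to-1 ratios mentioned above, and `pick_default` is a hypothetical helper.

```python
from collections import Counter


def pick_default(corpus_counts):
    """Return the most frequent phrase, i.e. the choice that is right most often."""
    return corpus_counts.most_common(1)[0][0]


def share(corpus_counts, phrase):
    """Fraction of all occurrences accounted for by `phrase`."""
    return corpus_counts[phrase] / sum(corpus_counts.values())


# Invented counts that mirror the ratios quoted in the interview.
older_corpus = Counter({"he said": 4000, "she said": 1000})  # 4-to-1
newer_corpus = Counter({"he said": 2000, "she said": 1000})  # 2-to-1

older_default = pick_default(older_corpus)
newer_default = pick_default(newer_corpus)
older_she_share = share(older_corpus, "she said")
```

Even though the share of «she said» grows between the two invented corpora, a majority-vote chooser still returns «he said» every time, which is one way a bias in the data passes straight through to the output.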