A 630-Billion-Word Internet Analysis Shows ‘People’ Is Interpreted as ‘Men’


Dana G. Smith at Scientific American: “A massive linguistic analysis of more than half a trillion words concludes that we assign gender to words that, by their very definition, should be gender-neutral.

Psychologists at New York University analyzed text from nearly three billion Web pages and compared how often words for person (“individual,” “people,” and so on) were associated with terms for a man (“male,” “he”) or a woman (“female,” “she”). They found that male-related words overlapped with “person” more frequently than female words did. The cultural concept of a person, from this perspective, is more often a man than a woman, according to the study, which was published on April 1 in Science Advances.

To conduct the study, the researchers turned to an enormous open-source data set of Web pages called the Common Crawl, which pulls text from everything from corporate white papers to Internet discussion forums. For their analysis of the text—a total of more than 630 billion words—the researchers used word embeddings, a computational linguistic technique that assesses how similar two words are by looking for how often they appear together.

“You can take a word like the word ‘person’ and understand what we mean by ‘person,’ how we represent the word ‘person,’ by looking at the other words that we often use around the word ‘person,’” explains April Bailey, a postdoctoral researcher at N.Y.U., who conducted the study. “We found that there was more overlap between the words for people and words for men than words for people and the words for women…, suggesting that there is this male bias in the concept of a person.”
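As a rough illustration of the idea described above (representing a word by the words that appear around it, then comparing those representations), here is a minimal sketch. It is not the researchers' method: the study used word embeddings trained on the Common Crawl, whereas this sketch swaps in raw co-occurrence counts and cosine similarity. The function names, the window size, and the toy sentences are all assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Context = neighbours on both sides, excluding the word itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_similarity(vectors, targets, attributes):
    """Average cosine similarity over all target/attribute pairs present in `vectors`."""
    pairs = [(t, a) for t in targets for a in attributes
             if t in vectors and a in vectors]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[t], vectors[a]) for t, a in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy corpus, invented for this sketch; it carries no real signal.
    corpus = [
        "every person in the room spoke",
        "he said the room was full",
        "she said the hall was full",
    ]
    vecs = cooccurrence_vectors(corpus, window=2)
    print("person ~ he :", mean_similarity(vecs, ["person"], ["he"]))
    print("person ~ she:", mean_similarity(vecs, ["person"], ["she"]))
```

On a real corpus, comparing the two averages for a list of "person" words against lists of male and female words would give a crude analogue of the overlap the researchers describe, though trained embeddings capture far more than raw counts do.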

Scientists have previously studied gender bias in language, such as the idea that women are more closely associated with family and home life and that men are more closely linked with work. “But this is the first to study this really general gender stereotype—the idea that men are sort of the default humans—in this quantitative computational social science way,” says Molly Lewis, a research scientist at the psychology department at Carnegie Mellon University, who was not involved in the study….”