Researcher uncovers inherent biases of big data collected from social media sites


Phys.org: “With every click, Facebook, Twitter and other social media users leave behind digital traces of themselves, information that can be used by businesses, government agencies and other groups that rely on “big data.”

But while the information derived from social network sites can shed light on social behavioral traits, some analyses based on this type of data collection are prone to bias from the get-go, according to new research by Northwestern University professor Eszter Hargittai, who heads the Web Use Project.

Since people don’t randomly join Facebook, Twitter or LinkedIn (they deliberately choose to engage), the data are potentially biased in terms of demographics, socioeconomic background or Internet skills, according to the research. This has implications for businesses, municipalities and other groups who use such data, because it excludes certain segments of the population and could lead to unwarranted or faulty conclusions, Hargittai said.

The study, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” was published last month in the journal The Annals of the American Academy of Political and Social Science and is part of a larger, ongoing study.

The buzzword “big data” refers to automatically generated information about people’s behavior. It’s called “big” because it can easily include millions of observations if not more. In contrast to surveys, which require explicit responses to questions, big data is created when people do things using a service or system.

“The problem is that the only people whose behaviors and opinions are represented are those who decided to join the site in the first place,” said Hargittai, the April McClain-Delaney and John Delaney Professor in the School of Communication. “If people are analyzing big data to answer certain questions, they may be leaving out entire groups of people and their voices.”

For example, a city could use Twitter to collect local opinion regarding how to make the community more “age-friendly” or whether more bike lanes are needed. In those cases, “it’s really important to know that people aren’t on Twitter randomly, and you would only get a certain type of person’s response to the question,” said Hargittai.

“You could be missing half the population, if not more. The same holds true for companies who only use Twitter and Facebook and are looking for feedback about their products,” she said. “It really has implications for every kind of group.”…

More information: “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” The Annals of the American Academy of Political and Social Science, May 2015, 659: 63-76, DOI: 10.1177/0002716215570866