How Data Mining Facebook Messages Can Reveal Substance Abusers


Emerging Technology from the arXiv: “…Substance abuse is a serious concern. Around one in 10 Americans are sufferers, which is why substance abuse costs the American economy more than $700 billion a year in lost productivity, crime, and health-care costs. So a better way to identify people suffering from the disorder, and those at risk of succumbing to it, would be hugely useful.

Bickel and co say they have developed just such a technique, which allows them to spot sufferers simply by looking at their social media messages such as Facebook posts. The technique even provides new insights into the way abuse of different substances influences people’s social media messages.

The new technique comes from the analysis of data collected between 2007 and 2012 as part of a project that ran on Facebook called myPersonality. Users who signed up were offered various psychometric tests and given feedback on their scores. Many also agreed to allow the data to be used for research purposes.

One of these tests asked over 13,000 users with an average age of 23 about the substances they used. In particular, it asked how often they used tobacco, alcohol, or other drugs, and assessed each participant’s level of use. The users were then divided into groups according to their level of substance abuse.

This data set is important because it acts as a kind of ground truth, recording the exact level of substance use for each person.

The team next gathered two other Facebook-related data sets. The first was 22 million status updates posted by more than 150,000 Facebook users. The other was even larger: the “like” data associated with 11 million Facebook users.

Finally, the team worked out how these data sets overlapped. They found almost 1,000 users who were in all the data sets, just over 1,000 who were in the substance abuse and status update data sets, and 3,500 who were in the substance abuse and likes data sets.
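
The overlap step is essentially a set intersection over user identifiers. A minimal sketch in Python, assuming each data set can be reduced to a collection of user IDs; the function name and the toy IDs below are invented for illustration and are not taken from the paper:

```python
def overlap_counts(survey_ids, status_ids, like_ids):
    """Count users shared between the substance-use survey and the other data sets."""
    survey, status, likes = set(survey_ids), set(status_ids), set(like_ids)
    return {
        "all_three": len(survey & status & likes),   # present in every data set
        "survey_and_status": len(survey & status),   # survey plus status updates
        "survey_and_likes": len(survey & likes),     # survey plus likes
    }


# Toy IDs, purely illustrative (not real myPersonality data).
survey_ids = ["u1", "u2", "u3", "u4"]
status_ids = ["u2", "u3", "u5"]
like_ids = ["u3", "u4", "u6"]

counts = overlap_counts(survey_ids, status_ids, like_ids)
print(counts)
```

Only users who appear in the survey can serve as labeled examples, which is why the usable groups are far smaller than the raw status-update and likes collections.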

These users with overlapping data sets provide rich pickings for data miners. If people with substance use disorders have certain unique patterns of behavior, it may be possible to spot these in their Facebook status updates or in their patterns of likes.

So Bickel and co got to work, first text mining most of the Facebook status updates and then data mining most of the likes data set. They then tested any patterns they found by looking for people with similar patterns in the remaining data and checking whether they also had the same level of substance use.
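
The idea of learning patterns on most of the data and checking them on the remainder is a held-out evaluation. The sketch below illustrates that idea only; it is not the authors' method. The word-count scoring rule, the function names, and the toy posts are all assumptions made up for this example:

```python
from collections import Counter


def train_word_scores(train):
    """Learn a score per word from (text, label) pairs; label 1 = user, 0 = non-user."""
    pos, neg = Counter(), Counter()
    for text, label in train:
        (pos if label == 1 else neg).update(set(text.lower().split()))
    # Score = number of "user" posts containing the word minus "non-user" posts.
    return {word: pos[word] - neg[word] for word in set(pos) | set(neg)}


def predict(scores, text):
    """Predict 1 if the words in the post lean toward the 'user' group, else 0."""
    total = sum(scores.get(word, 0) for word in set(text.lower().split()))
    return 1 if total > 0 else 0


def holdout_accuracy(data, train_fraction=0.75):
    """Fit on the first part of the data and report accuracy on the held-out rest."""
    cut = int(len(data) * train_fraction)
    train, test = data[:cut], data[cut:]
    scores = train_word_scores(train)
    correct = sum(predict(scores, text) == label for text, label in test)
    return correct / len(test)


# Invented toy posts, for illustration only.
data = [
    ("need a smoke break", 1),
    ("morning run done", 0),
    ("out for a smoke", 1),
    ("great run today", 0),
    ("smoke after work", 1),
    ("long run tomorrow", 0),
    ("quick smoke outside", 1),
    ("evening run planned", 0),
]

print(holdout_accuracy(data))
```

Keeping the test users out of the fitting step is what makes the reported score a fair estimate: a pattern that only memorizes the training posts would fail on the held-out ones.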

The results make for interesting reading. The team says its technique was hugely successful. “Our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods,” say Bickel and co…. (More) (Full Paper: arxiv.org/abs/1705.05633: Social Media-based Substance Use Prediction).