Using big data to predict suicide risk among Canadian youth

SAS Insights “Suicide is the second leading cause of death among youth in Canada, according to Statistics Canada, accounting for one-fifth of deaths of people under the age of 25 in 2011. The Canadian Mental Health Association states that among 15- to 24-year-olds the number is an even more frightening 24 percent, the third highest in the industrialized world. Yet despite these disturbing statistics, the signals that an individual plans on self-injury or suicide are hard to isolate….

Team members …collected 2.3 million tweets and built a machine learning model to predict age, based on the open source PAN author profiling dataset. Using text mining software, they identified 1.1 million of those tweets as likely to have been authored by 13- to 17-year-olds in Canada. Their analysis made use of natural language processing, predictive modelling, text mining, and data visualization….

However, there were challenges. Ages are not revealed on Twitter, so the team had to figure out how to tease out the data for 13 – 17 year olds in Canada. “We had a text data set, and we created a model to identify if people were in that age group based on how they talked in their tweets,” Soehl said. “From there, we picked some specific buzzwords and created topics around them, and our software mined those tweets to collect the people.”
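The two steps Soehl describes (predict an age group from how someone writes, then mine the selected tweets for topic keywords) can be sketched roughly as follows. This is a minimal illustration, not the SAS team's method: the article does not say which algorithm or software settings were used, so the Naive Bayes classifier, the four toy training sentences (standing in for the PAN author profiling dataset), the `TOPIC_KEYWORDS` lists, and every function name here are assumptions made up for the example.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesText:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical topic keywords; the real buzzwords are not given in the article.
TOPIC_KEYWORDS = {
    "depression": {"depressed", "hopeless", "worthless"},
    "school_stress": {"exam", "grades", "failing"},
}


def match_topics(text):
    """Return the sorted topic names whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(t for t, words in TOPIC_KEYWORDS.items() if tokens & words)


def select_tweets(model, tweets, target="13-17"):
    """Keep tweets the model assigns to the target age group, with their topics."""
    return [(t, match_topics(t)) for t in tweets if model.predict(t) == target]


def topic_share(selected):
    """Fraction of selected tweets that mention at least one topic."""
    if not selected:
        return 0.0
    return sum(1 for _, topics in selected if topics) / len(selected)


# Invented toy training data; a real model would need a large labelled corpus.
TRAIN_TEXTS = [
    "omg homework is due tomorrow lol",
    "my teacher gave us a pop quiz ugh",
    "meeting with the mortgage broker today",
    "quarterly report is due at the office",
]
TRAIN_LABELS = ["13-17", "13-17", "other", "other"]

if __name__ == "__main__":
    model = NaiveBayesText().fit(TRAIN_TEXTS, TRAIN_LABELS)
    sample = [
        "ugh homework again lol i feel so hopeless",
        "office meeting about the quarterly report",
    ]
    for tweet, topics in select_tweets(model, sample):
        print(tweet, topics)
```

With only four training sentences the classifier is a toy, and keyword matching of this kind cannot tell a genuine signal from a joke or a song lyric, so anything it selects would still need human review.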

Another issue was the restrictions Twitter places on pulling data, though Soehl believes that once this analysis becomes an established solution, Twitter may work with researchers to expedite the process. “Now that we’ve shown it’s possible, there are a lot of places we can go with it,” said Soehl. “Once you know your path and figure out what’s going to be valuable, things come together quickly.”

The team looked at the percentage of people in the group who were talking about depression or suicide, and what they were talking about. Horne said that when SAS’ work went in front of a Canadian audience working in health care, they said that it definitely filled a gap in their data, and that was the validation he’d been looking for. The team also won $10,000 for creating the best answer to this question and donated the award money to two mental health charities: Mind Your Mind and Rise Asset Development.

What’s next?

That doesn’t mean the work is done, said Jos Polfliet. “We’re just scraping the surface of what can be done with the information.” Another way to use the results is to look at patterns and trends….(More)”