Mirroring the real world in social media: twitter, geolocation, and sentiment analysis


Paper by E Baucom, A Sanjari, X Liu, and M Chen as part of the proceedings of UnstructureNLP ’13: “In recent years social media has been used to characterize and predict real world events, and in this research we seek to investigate how closely Twitter mirrors the real world. Specifically, we wish to characterize the relationship between the language used on Twitter and the results of the 2011 NBA Playoff games. We hypothesize that the language used by Twitter users will be useful in classifying the users’ locations combined with the current status of which team is in the lead during the game. This is based on the common assumption that ‘fans’ of a team have more positive sentiment and will accordingly use different language when their team is doing well. We investigate this hypothesis by labeling each tweet according to the location of the user along with the team that is in the lead at the time of the tweet. The hypothesized difference in language (as measured by tf-idf) should then have predictive power over the tweet labels. We find that indeed it does, and we experiment further by adding semantic orientation (SO) information as part of the feature set. The SO does not offer much improvement over tf-idf alone. We discuss the relative strengths of the two types of features for our data.”
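
To make the setup in the abstract concrete, here is a minimal Python sketch of the general idea: each tweet gets a combined label (user location plus the team in the lead), is turned into a tf-idf vector, and a new tweet is assigned to the label whose summed vectors it overlaps with most. This is an illustration only, not the authors' pipeline. The abstract does not say which tf-idf variant or classifier they used, so the smoothed idf formula and the dot-product centroid scoring are assumptions, and the toy tweets, cities, and helper names (`make_label`, `fit_idf`, `train`, `predict`) are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (tweet text, user location, team leading at tweet time).
# None of these tweets come from the paper's data set.
TWEETS = [
    ("great shot what a game love this team", "Dallas", "Mavericks"),
    ("amazing defense tonight so proud", "Dallas", "Mavericks"),
    ("terrible refs this is awful", "Miami", "Mavericks"),
    ("awful turnover again so frustrated", "Miami", "Mavericks"),
]

def make_label(location, leader):
    """Combine user location and leading team into one class label."""
    return f"{location}|{leader}"

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real tweets need more care)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """L2-normalized tf-idf vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(value * value for value in vec.values()))
    return {term: value / norm for term, value in vec.items()} if norm else {}

def train(tweets):
    """Sum the tf-idf vectors of all tweets sharing a label."""
    docs = [tokenize(text) for text, _, _ in tweets]
    idf = fit_idf(docs)
    centroids = defaultdict(Counter)
    for doc, (_, location, leader) in zip(docs, tweets):
        for term, value in tfidf(doc, idf).items():
            centroids[make_label(location, leader)][term] += value
    return idf, centroids

def predict(text, idf, centroids):
    """Pick the label whose summed vector has the largest dot product with the tweet."""
    vec = tfidf(tokenize(text), idf)
    def score(centroid):
        return sum(value * centroid.get(term, 0.0) for term, value in vec.items())
    return max(centroids, key=lambda label: score(centroids[label]))

idf, centroids = train(TWEETS)
prediction = predict("so proud of this team", idf, centroids)
print(prediction)
```

A semantic orientation feature of the kind the abstract mentions could be appended to each vector as one extra dimension. The abstract reports that doing so added little over tf-idf alone.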