How Data Mining Facebook Messages Can Reveal Substance Abusers


Emerging Technology from the arXiv: “…Substance abuse is a serious concern. Around one in 10 Americans suffers from it, and it costs the American economy more than $700 billion a year in lost productivity, crime, and health-care costs. So a better way to identify people suffering from the disorder, and those at risk of succumbing to it, would be hugely useful.

Bickel and co say they have developed just such a technique, which allows them to spot sufferers simply by looking at their social media messages such as Facebook posts. The technique even provides new insights into the way abuse of different substances influences people’s social media messages.

The new technique comes from the analysis of data collected between 2007 and 2012 as part of a project that ran on Facebook called myPersonality. Users who signed up were offered various psychometric tests and given feedback on their scores. Many also agreed to allow the data to be used for research purposes.

One of these tests asked over 13,000 users with an average age of 23 about the substances they used. In particular, it asked how often they used tobacco, alcohol, or other drugs, and assessed each participant’s level of use. The users were then divided into groups according to their level of substance abuse.

This data set is important because it acts as a kind of ground truth, recording the exact level of substance use for each person.

The team next gathered two other Facebook-related data sets. The first was 22 million status updates posted by more than 150,000 Facebook users. The other was even larger: the “like” data associated with 11 million Facebook users.

Finally, the team worked out how these data sets overlapped. They found almost 1,000 users who were in all the data sets, just over 1,000 who were in the substance abuse and status update data sets, and 3,500 who were in the substance abuse and likes data sets.
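At its core, the matching step the team describes is a set intersection over user IDs. A minimal sketch in Python (the IDs below are invented; the study matched real accounts across its three data sets):

```python
# Illustrative only: dataset overlap as set intersection over
# (hypothetical) user IDs, analogous to the study's matching step.

substance_users = {"u1", "u2", "u3", "u4", "u5"}   # took the substance-use test
status_users = {"u2", "u3", "u6"}                  # in the status-update data set
likes_users = {"u3", "u4", "u7"}                   # in the likes data set

in_all_three = substance_users & status_users & likes_users
substance_and_status = substance_users & status_users
substance_and_likes = substance_users & likes_users

print(len(in_all_three), len(substance_and_status), len(substance_and_likes))
# → 1 2 2
```

The real work lies in resolving the same person across data sets; once that is done, the overlap groups fall out of intersections like these.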

These users with overlapping data sets provide rich pickings for data miners. If people with substance use disorders have certain unique patterns of behavior, it may be possible to spot these in their Facebook status updates or in their patterns of likes.

So Bickel and co got to work first by text mining most of the Facebook status updates and then data mining most of the likes data set. Any patterns they found, they then tested by looking for people with similar patterns in the remaining data and seeing if they also had the same level of substance use.

The results make for interesting reading. The team says its technique was hugely successful. “Our best models achieved 86% for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods,” say Bickel and co…. (More) (Full Paper: arxiv.org/abs/1705.05633: Social Media-based Substance Use Prediction).
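The excerpt does not describe the team's actual models or features, but the general shape of the approach, learning word patterns from labelled status updates and scoring unseen posts, can be sketched with a toy Naive Bayes classifier. Everything here (posts, labels, vocabulary) is invented for illustration:

```python
import math
from collections import Counter

# Illustrative sketch only: a tiny bag-of-words Naive Bayes classifier of
# the general kind one might train on status updates labelled with a
# ground-truth substance-use level. Not the paper's actual method.

def train(labelled_posts):
    """labelled_posts: list of (text, label). Returns per-label word counts."""
    counts = {}
    for text, label in labelled_posts:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label with the highest smoothed log-likelihood of the words."""
    vocab = {w for c in counts.values() for w in c}
    best_label, best_score = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        score = sum(
            math.log((c[w] + 1) / (total + len(vocab)))  # Laplace smoothing
            for w in text.lower().split()
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

posts = [  # invented training data
    ("out smoking with friends again", "user"),
    ("another pack gone today", "user"),
    ("gym then salad for dinner", "non-user"),
    ("morning run felt great", "non-user"),
]
model = train(posts)
print(classify(model, "smoking after work"))  # → user
```

The held-out testing the article describes corresponds to fitting such a model on one part of the overlap data and checking its predictions against the ground-truth labels of the remainder.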

How Twitter Is Being Gamed to Feed Misinformation


The New York Times: “…the biggest problem with Twitter’s place in the news is its role in the production and dissemination of propaganda and misinformation. It keeps pushing conspiracy theories — and because lots of people in the media, not to mention many news consumers, don’t quite understand how it works, the precise mechanism is worth digging into….Here’s how.

The guts of the news business.

One way to think of today’s disinformation ecosystem is to picture it as a kind of gastrointestinal tract…. Twitter often acts as the small bowel of digital news. It’s where political messaging and disinformation get digested, packaged and widely picked up for mass distribution to cable, Facebook and the rest of the world.

This role for Twitter has seemed to grow more intense during (and since) the 2016 campaign. Twitter now functions as a clubhouse for much of the news. It’s where journalists pick up stories, meet sources, promote their work, criticize competitors’ work and workshop takes. In a more subtle way, Twitter has become a place where many journalists unconsciously build and gut-check a worldview — where they develop a sense of what’s important and merits coverage, and what doesn’t.

This makes Twitter a prime target for manipulators: If you can get something big on Twitter, you’re almost guaranteed coverage everywhere….

Twitter is clogged with fake people.

For determined media manipulators, getting something big on Twitter isn’t all that difficult. Unlike Facebook, which requires people to use their real names, Twitter offers users essentially full anonymity, and it makes many of its functions accessible to outside programmers, allowing people to automate their actions on the service.

As a result, numerous cheap and easy-to-use online tools let people quickly create thousands of Twitter bots — accounts that look real, but that are controlled by a puppet master.

Twitter’s design also promotes a slavish devotion to metrics: Every tweet comes with a counter of Likes and Retweets, and users come to internalize these metrics as proxies for real-world popularity….

They may ruin democracy.

…. the more I spoke to experts, the more convinced I became that propaganda bots on Twitter might be a growing and terrifying scourge on democracy. Research suggests that bots are ubiquitous on Twitter. Emilio Ferrara and Alessandro Bessi, researchers at the University of Southern California, found that about a fifth of the election-related conversation on Twitter last year was generated by bots. Most users were blind to them; they treated the bots the same way they treated other users….

In a more pernicious way, bots give us an easy way to doubt everything we see online. In the same way that the rise of “fake news” gives the president cover to label everything “fake news,” the rise of bots might soon allow us to dismiss any online enthusiasm as driven by automation. Anyone you don’t like could be a bot; any highly retweeted post could be puffed up by bots….(More)”.

Routledge Handbook on Information Technology in Government


Book edited by Yu-Che Chen and Michael J. Ahn: “The explosive growth in information technology has ushered in unparalleled new opportunities for advancing public service. Featuring 24 chapters from foremost experts in the field of digital government, this Handbook provides an authoritative survey of key emerging technologies, their current state of development and use in government, and insightful discussions on how they are reshaping and influencing the future of public administration. This Handbook explores:

  • Key emerging technologies (i.e., big data, social media, Internet of Things (IoT), GIS, smartphones & mobile technologies) and their impacts on public administration
  • The impacts of the new technologies on the relationships between citizens and their governments with the focus on collaborative governance
  • Key theories of IT innovations in government on the interplay between technological innovations and public administration
  • The relationship between technology and democratic accountability and the various ways of harnessing the new technologies to advance public value
  • Key strategies and conditions for fostering success in leveraging technological innovations for public service

This Handbook will prove to be an invaluable guide and resource for students, scholars and practitioners interested in this growing field of technological innovations in government….(More)”.

Could Big Data Help End Hunger in Africa?


Lenny Ruvaga at VOA News: “Computer algorithms power much of modern life, from our Facebook feeds to international stock exchanges. Could they help end malnutrition and hunger in Africa? The International Center for Tropical Agriculture thinks so.

The International Center for Tropical Agriculture has spent the past four years developing the Nutrition Early Warning System, or NEWS.

The goal is to catch the subtle signs of a hunger crisis brewing in Africa as much as a year in advance.

CIAT says the system uses machine learning. As more information is fed into the system, the algorithms will get better at identifying patterns and trends. The system will get smarter.

Information Technology expert Andy Jarvis leads the project.

“The cutting edge side of this is really about bringing in streams of information from multiple sources and making sense of it. … But it is a huge volume of information and what it does, the novelty then, is making sense of that using things like artificial intelligence, machine learning, and condensing it into simple messages,” he said.

Other nutrition surveillance systems exist, like FEWSnet, the Famine Early Warning System Network which was created in the mid-1980s.

But CIAT says NEWS will be able to draw insights from a massive amount of diverse data enabling it to identify hunger risks faster than traditional methods.

“What is different about NEWS is that it pays attention to malnutrition, not just drought or famine, but the nutrition outcome that really matters, malnutrition especially in women and children. For the first time, we are saying these are the options way ahead of time. That gives policy makers an opportunity to really do what they intend to do which is make the lives of women and children better in Africa,” said Dr. Mercy Lung’aho, a CIAT nutrition expert.

While food emergencies like famine and drought grab headlines, the International Center for Tropical Agriculture says chronic malnutrition affects one in four people in Africa, taking a serious toll on economic growth and leaving them especially vulnerable in times of crisis….(More)”.

The Way Ahead


Transcript of lecture delivered by Stephen Fry on 28 May 2017 • Hay Festival, Hay-on-Wye: “Peter Florence, the supremo of this great literary festival, asked me some months ago if I might, as part of Hay’s celebration of the five hundredth anniversary of Martin Luther’s kickstarting of the Reformation, suggest a reform of the internet…

You will be relieved to know, that unlike Martin Luther, I do not have a full 95 theses to nail to the door, or in Hay’s case, to the tent flap. It might be worth reminding ourselves perhaps, however, of the great excitements of the early 16th century. I do not think it is a coincidence that Luther grew up as one of the very first generation to have access to printed books, much as some of you may have children who were the first to grow up with access to e-books, to iPads and to the internet….

The next big step for AI is the inevitable achievement of Artificial General Intelligence, or AGI, sometimes called ‘full artificial intelligence’, the point at which machines really do think like humans. In 2013, hundreds of experts were asked when they thought AGI might arise and the median prediction was the year 2040. After that the probable, most would say certain, next step is artificial super-intelligence and the possibility of reaching what is called the Technological Singularity – what computer pioneer John von Neumann described as the point “…beyond which human affairs, as we know them, could not continue.” I don’t think I have to worry about that. Plenty of you in this tent have cause to, and your children beyond question will certainly know all about it. Unless of course the climate causes such havoc that we reach a Meteorological Singularity. Or the nuclear codes are penetrated by a self-teaching algorithm whose only purpose is to find a way to launch…

It’s clear that, while it is hard to calculate the cascade upon cascade of new developments and their positive effects, we already know the dire consequences and frightening scenarios that threaten to engulf us. We know them because science fiction writers and dystopians in all media have got there before us and laid the nightmare visions out. Their imaginations have seen it all coming. So whether you believe Ray Bradbury, George Orwell, Aldous Huxley, Isaac Asimov, Margaret Atwood, Ridley Scott, Anthony Burgess, H. G. Wells, Stanley Kubrick, Kazuo Ishiguro, Philip K. Dick, William Gibson, John Wyndham, James Cameron, the Wachowskis or the scores and scores of other authors and film-makers who have painted scenarios of chaos and doom, you can certainly believe that a great transformation of human society is under way, greater than Gutenberg’s revolution – greater I would submit than the Industrial Revolution (though clearly dependent on it) – the greatest change to our ways of living since we moved from hunting and gathering to settling down in farms, villages and seaports and started to trade and form civilisations. Whether it will alter the behaviour, cognition and identity of the individual in the same way it is certain to alter the behaviour, cognition and identity of the group, well that is a hard question to answer.

But believe me when I say that it is happening. To be frank it has happened. The unimaginably colossal sums of money that have flowed to the first two generations of Silicon Valley pioneers have filled their coffers, their war chests, and they are all investing in autonomous cars, biotech, the IoT, robotics, artificial intelligence and their convergence. None more so than the outlier, the front-runner Mr Elon Musk, whose Neuralink system is well worth your reading about online on the great waitbutwhy.com website. Its author Tim Urban is a paid consultant of Elon Musk’s, so he has the advantage of knowing what he is writing about but the potential disadvantage of being parti pris and lacking in objectivity. Elon Musk made enough money from his part in the founding and running of PayPal to fund his manifold exploits. The Neuralink project joins his Tesla automobile company and subsidiary battery and solar power businesses, his SpaceX reusable spacecraft group, his OpenAI initiative and Hyperloop transport system. The 1950s and 60s Space Race was funded by sovereign governments; this race is funded by private equity, by the original investors in Google, Apple, Facebook and so on. Nation states and their agencies are not major players in this game, least of all poor old Britain. Even if our politicians were across this issue, and they absolutely are not, our votes would still be an irrelevance….

So one thesis I would have to nail up to the tent is to clamour for government to bring all this deeper into schools and colleges. The subject of the next technological wave, I mean, not pornography and prostitution. Get people working at the leading edge of AI and robotics to come into the classrooms. But more importantly listen to them – even if what they say is unpalatable, our masters must have the intellectual courage and honesty to say if they don’t understand and ask for repetition and clarification. This time, in other words, we mustn’t let the wave engulf us, we must ride its crest. It’s not quite too late to re-gear governmental and educational planning and thinking….

The witlessness of our leaders and of ourselves is indeed a problem. The real danger surely is not technology but technophobic Canute-ism, a belief that we can control, change or stem the technological tide instead of understanding that we need to learn how to harness it. Driving cars is dangerous, but we developed driving lesson requirements, traffic controls, seat-belts, maintenance protocols, proximity sensors, emission standards – all kinds of ways of mitigating the danger so as not to deny ourselves the life-changing benefits of motoring.

We understand why angry Ned Ludd destroyed the weaving machines that were threatening his occupation (the Luddites were prophetic in their way: it was weaving machines that first used the punched cards on which computers relied right up to the 1970s). We understand too why French workers took their clogs, their sabots as they were called, and threw them into the machinery to jam it up, giving us the word sabotage. But we know that they were in the end, if you’ll pardon the phrase, pissing into the wind. No technology has ever been stopped.

So what is the thesis I am nailing up? Well, there is no authority for me to protest to, no equivalent of Pope Leo X for it to be delivered to, and I am certainly no Martin Luther. The only thesis I can think worth nailing up is absurdly simple. It is a cry as much from the heart as from the head and it is just one word – Prepare. We have an advantage over our hunter gatherer and farming ancestors, for whether it is Winter that is coming, or a new Spring, is entirely in our hands, so long as we prepare….(More)”.

ControCurator: Understanding Controversy Using Collective Intelligence


Paper by Benjamin Timmermans et al: “There are many issues in the world that people do not agree on, such as Global Warming [Cook et al. 2013], Anti-Vaccination [Kata 2010] and Gun Control [Spitzer 2015]. Having opposing opinions on such topics can lead to heated discussions, making them appear controversial. Such opinions are often expressed through news articles and social media. There are increasing calls for methods to detect and monitor these online discussions on different topics. Existing methods focus on using sentiment analysis and Wikipedia for identifying controversy [Dori-Hacohen and Allan 2015]. The problem with this is that it relies on a well-structured and existing debate, which may not always be the case. Take for instance news reporting during large disasters, in which case the structure of a discussion is not yet clear and may change rapidly. Adding to this is that there is currently no agreed-upon definition as to what exactly constitutes controversy. It is only agreed that controversy arises when there is a large debate by people with opposing viewpoints, but we do not yet understand what its characteristic aspects are or how they can be measured. In this paper we use the collective intelligence of the crowd in order to gain a better understanding of controversy by evaluating the aspects that have impact on it….(More)”

See also http://crowdtruth.org/

 

How can we study disguised propaganda on social media? Some methodological reflections


Jannick Schou and Johan Farkas at DataDrivenJournalism: “‘Fake news’ has recently become a seemingly ubiquitous concept among journalists, researchers, and citizens alike. With the rise of platforms such as Facebook and Twitter, it has become possible to spread deliberate forms of misinformation in hitherto unforeseen ways. This has also spilled over into the political domain, where new forms of (disguised) propaganda and false information have recently begun to emerge. These new forms of propaganda have very real effects: they serve to obstruct political decision-making processes, instil false narratives within the general public, and add fuel to already heated sites of political conflict. They represent a genuine democratic problem.

Yet, so far, both critical researchers and journalists have faced a number of issues and challenges when attempting to understand these new forms of political propaganda. Simply put: when it comes to disguised propaganda and social media, we know very little about the actual mechanisms through which such content is produced, disseminated, and negotiated. One of the key explanations for this might be that fake profiles and disguised political agendas are incredibly difficult to study. They present a serious methodological challenge. This is not only due to their highly ephemeral nature, with Facebook pages being able to vanish after only a few days or hours, but also because of the anonymity of its producers. Often, we simply do not know who is disseminating what and with what purpose. This makes it difficult for us to understand and research exactly what is going on.

This post takes its point of departure from a new article published in the international academic journal New Media & Society. Based on the research done for this article, we want to offer some methodological reflections as to how disguised propaganda might be investigated. How can we research fake and disguised political agendas? And what methodological tools do we have at our disposal?…

Two main pieces of methodological advice spring to mind. First of all: collect as much data as you can in as many ways as possible. Make screenshots, take detailed written observations, use data scraping, and (if possible) participate in citizen groups. One of the most valuable resources we had at our disposal was the set of heterogeneous data we collected from each page. Using this allowed us to carefully dissect and retrace the complex set of practices involved in each page long after it was gone. While we certainly tried to be as systematic in our data collection as possible, we also had to use every tool at our disposal. And we had to constantly be on our toes. As soon as a page emerged, we were there: ready to write down notes and collect data.
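The "collect everything, immediately" advice can be made concrete with a small archiving routine. A hedged sketch: the function name, fields, and page names below are hypothetical, and in practice the HTML would come from a scraper or a manually saved copy of the page:

```python
import hashlib
import json
import time
from pathlib import Path

# Illustrative only: a minimal timestamped, checksummed archiver for
# ephemeral pages. All names and fields here are invented for the sketch.

def archive_snapshot(archive_dir, page_name, html, notes=""):
    """Save a timestamped snapshot of a page before it vanishes."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "page": page_name,
        "captured_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "notes": notes,       # the "detailed written observations"
        "html": html,         # raw copy, so the page survives its deletion
    }
    path = archive_dir / f"{page_name}-{int(time.time())}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path

snapshot = archive_snapshot("/tmp/page_archive", "suspicious-page",
                            "<html>...</html>", notes="appeared overnight")
print(snapshot.exists())  # → True
```

The checksum makes later claims about what a deleted page contained verifiable, which matters when the producers are anonymous and the evidence is contested.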

Second: be willing to participate and collaborate. Our research showcases the immense potential in researchers (and journalists) actively collaborating with citizen groups and grassroots movements. Using the collective insights and attention of this group allowed us to quickly find and track down pages. It gave us renewed methodological strength. Collaborating across otherwise closed boundaries between research and journalism opens up new avenues for deeper and more detailed insights….(More)”

Data Collaboratives: exchanging data to create public value across Latin America and the Caribbean


Stefaan Verhulst, Andrew Young and Prianka Srinivasan at IADB’s Abierto al Publico: “Data is playing an ever-increasing role in bolstering businesses across Latin America – and the rest of the world. In Brazil, Mexico and Colombia alone, the revenue from Big Data is calculated at more than US$603.7 million, a market that is only set to increase as more companies across Latin America and the Caribbean embrace data-driven strategies to enhance their bottom line. Brazilian banking giant Itau plans to create six data centers across the country, and already uses data collected from consumers online to improve cross-selling techniques and streamline their investments. Data from web-clicks, social media profiles, and telecommunication services is fueling a new generation of entrepreneurs keen to make big dollars from big data.

What if this same data could be used not just to improve business, but to improve the collective well-being of our communities, public spaces, and cities? Analysis of social media data can offer powerful insights to city officials into public trends and movements to better plan infrastructure and policies. Public health officials and humanitarian workers can use mobile phone data to, for instance, map human mobility and better target their interventions. By repurposing the data collected by companies for their business interests, governments, international organizations and NGOs can leverage big data insights for the greater public good.

The key question is thus: how can we unlock useful data collected by corporations in a responsible manner and ensure that its vast potential does not go to waste?

“Data Collaboratives” are emerging as a possible answer. Data collaboratives are a new type of public-private partnership aimed at creating public value by exchanging data across sectors.

Research conducted by the GovLab finds that Data Collaboratives offer several potential benefits across a number of sectors, including humanitarian and anti-poverty efforts, urban planning, natural resource stewardship, health, and disaster management. As a greater number of companies in Latin America look to data to spur business interests, our research suggests that some companies are also sharing and collaborating around data to confront some of society’s most pressing problems.

Consider the following Data Collaboratives that seek to enhance…(More)”

Twitter as a data source: An overview of tools for journalists


Wasim Ahmed at Data Driven Journalism: “Journalists may wish to use data from social media platforms in order to provide greater insight and context to a news story. For example, journalists may wish to examine the contagion of hashtags and whether they are capable of achieving political or social change. Moreover, newsrooms may also wish to tap into social media posts during unfolding crisis events. For example, to find out who tweeted about a crisis event first, and to empirically examine the impact of social media.

Furthermore, Twitter users and accounts such as WikiLeaks may operate outside the constraints of traditional journalism, and therefore it becomes important to have tools and mechanisms in place in order to examine these kinds of influential users. For example, it was found that those who were backing Marine Le Pen on Twitter could have been users who had an affinity for Donald Trump.

There remain a number of different methods for analysing social media data. Take text analytics, for example, which can include using sentiment analysis to place bulk social media posts into categories of a particular feeling, such as positive, negative, or neutral. Or machine learning, which can automatically assign social media posts to a number of different topics.
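As a rough illustration of the lexicon-based end of sentiment analysis (production systems rely on trained models and much larger lexicons; the word lists here are invented):

```python
# Illustrative only: a minimal lexicon-based sentiment categoriser.
# Real tools use trained models and far larger, weighted lexicons.

POSITIVE = {"great", "love", "good", "win", "happy"}
NEGATIVE = {"bad", "hate", "awful", "lose", "sad"}

def sentiment(post):
    """Net count of positive minus negative words, mapped to a category."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("love this great news"))  # → positive
print(sentiment("awful day hate it"))     # → negative
```

Applied in bulk over a hashtag's posts, even a crude categoriser like this gives the positive/negative/neutral breakdown described above.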

There are other methods such as social network analysis, which examines online communities and the relationships between them. A number of qualitative methodologies also exist, such as content analysis and thematic analysis, which can be used to manually label social media posts. From a journalistic perspective, network analysis may be of importance initially via tools such as NodeXL. This is because it can quickly provide an overview of influential Twitter users alongside a topic overview.
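The ranking step that tools like NodeXL automate can be shown in miniature as counting incoming edges in a retweet/mention graph. The usernames and edges below are made up:

```python
from collections import Counter

# Illustrative only: finding influential users by in-degree in a tiny
# invented retweet/mention graph, the core idea behind tools like NodeXL.

edges = [  # (source, target) pairs, e.g. "alice retweeted bob"
    ("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
    ("bob", "carol"), ("alice", "carol"),
]

in_degree = Counter(target for _, target in edges)
for user, degree in in_degree.most_common(2):
    print(user, degree)
# → bob 3
# → carol 2
```

At scale the same idea, run over a topic's tweets, surfaces the influential accounts and community structure a journalist would want to see first.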

From an industry standpoint, there has been much focus on gaining insight into users’ personalities, through services such as IBM Watson’s Personality Insights service. This uses linguistic analytics to derive intrinsic personality insights, such as emotions like anxiety, self-consciousness, and depression. This information can then be used by marketers to target certain products; for example, anti-anxiety medication to users who are more anxious…(An overview of tools for 2017).”

UK government watchdog examining political use of data analytics


“Given the big data revolution, it is understandable that political campaigns are exploring the potential of advanced data analysis tools to help win votes,” Elizabeth Denham, the information commissioner, writes on the ICO’s blog. However, “the public have the right to expect” that this takes place in accordance with existing data protection laws, she adds.

Political parties are able to use Facebook to target voters with different messages, tailoring the advert to recipients based on their demographic. In the 2015 UK general election, the Conservative party spent £1.2 million on Facebook campaigns and the Labour party £16,000. It is expected that Labour will vastly increase that spend for the general election on 8 June….

Political parties and third-party companies are allowed to collect data from sites like Facebook and Twitter that lets them tailor these ads to broadly target different demographics. However, if those ads target identifiable individuals, they run afoul of the law….(More)”