Ten simple rules for responsible big data research


Matthew Zook et al in PLOS Computational Biology: “The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets has grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.

The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between an individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data create a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.

Nevertheless, the need for direction in responsible big data research is evident, and this article provides a set of “ten simple rules” for addressing the complex ethical issues that will inevitably arise. Modeled on PLOS Computational Biology’s ongoing collection of rules, the recommendations we outline involve more nuance than the words “simple” and “rules” suggest. This nuance is inevitably tied to our paper’s starting premise: all big data research on social, medical, psychological, and economic phenomena engages with human subjects, and researchers have the ethical responsibility to minimize potential harm….

  1. Acknowledge that data are people and can do harm
  2. Recognize that privacy is more than a binary value
  3. Guard against the reidentification of your data
  4. Practice ethical data sharing
  5. Consider the strengths and limitations of your data; big does not automatically mean better
  6. Debate the tough, ethical choices
  7. Develop a code of conduct for your organization, research community, or industry
  8. Design your data and systems for auditability
  9. Engage with the broader consequences of data and analysis practices
  10. Know when to break these rules…(More)”

What Algorithms Want


Book by Ed Finn: “We depend on—we believe in—algorithms to help us get a ride, choose which book to buy, execute a mathematical proof. It’s as if we think of code as a magic spell, an incantation to reveal what we need to know and even what we want. Humans have always believed that certain invocations—the marriage vow, the shaman’s curse—do not merely describe the world but make it. Computation casts a cultural shadow that is shaped by this long tradition of magical thinking. In this book, Ed Finn considers how the algorithm—in practical terms, “a method for solving a problem”—has its roots not only in mathematical logic but also in cybernetics, philosophy, and magical thinking.

Finn argues that the algorithm deploys concepts from the idealized space of computation in a messy reality, with unpredictable and sometimes fascinating results. Drawing on sources that range from Neal Stephenson’s Snow Crash to Diderot’s Encyclopédie, from Adam Smith to the Star Trek computer, Finn explores the gap between theoretical ideas and pragmatic instructions. He examines the development of intelligent assistants like Siri, the rise of algorithmic aesthetics at Netflix, Ian Bogost’s satiric Facebook game Cow Clicker, and the revolutionary economics of Bitcoin. He describes Google’s goal of anticipating our questions, Uber’s cartoon maps and black box accounting, and what Facebook tells us about programmable value, among other things.

If we want to understand the gap between abstraction and messy reality, Finn argues, we need to build a model of “algorithmic reading” and scholarship that attends to process, spearheading a new experimental humanities….(More)”

Will Computer Science become a Social Science?


Paper by Ingo Scholtes, Markus Strohmaier and Frank Schweitzer: “When Tay – a Twitter chatbot developed by Microsoft – was activated this March, the company was taken by surprise by what Tay had become. Within less than 24 hours of conversation with Twitter users, Tay had learned to make racist, anti-semitic and misogynistic statements that have raised eyebrows in the Twitter community and beyond. What had happened? While Microsoft certainly tested the chatbot before release, planning for the reactions and the social environment in which it was deployed proved tremendously difficult. Yet, the Tay Twitter chatbot incident is just one example of the many challenges which arise when embedding algorithms and computing systems into an ever increasing spectrum of social systems. In this viewpoint we argue that, due to the resulting feedback loops by which computing technologies impact social behavior and social behavior feeds back on (learning) computing systems, we face the risk of losing control over the systems that we engineer. The results are unintended consequences that affect both the technical and social dimension of computing systems, and which computer science is currently not well-prepared to address. Highlighting exemplary challenges in core areas like (1) algorithm design, (2) cyber-physical systems, and (3) software engineering, we argue that social aspects must be turned into first-class citizens of our system models. We further highlight that the social sciences, in particular the interdisciplinary field of Computational Social Science [1], provide us with means to quantitatively analyze, model and predict human behavior. As such, a closer integration between computer science and social sciences not only provides social scientists with new ways to understand social phenomena. It also helps us to regain control over the systems that we engineer….(More)”

Confused by data visualisation? Here’s how to cope in a world of many features


 in The Conversation: “The late data visionary Hans Rosling mesmerised the world with his work, contributing to a more informed society. Rosling used global health data to paint a stunning picture of how our world is a better place now than it was in the past, bringing hope through data.

Now more than ever, data are collected from every aspect of our lives. From social media and advertising to artificial intelligence and automated systems, understanding and parsing information have become highly valuable skills. But we often overlook the importance of knowing how to communicate data to peers and to the public in an effective, meaningful way.

The first tools that come to mind in considering how to best communicate data – especially statistics – are graphs and scatter plots. These simple visuals help us understand elementary causes and consequences, trends and so on. They are invaluable and have an important role in disseminating knowledge.

Data visualisation can take many other forms, just as data itself can be interpreted in many different ways. It can be used to highlight important achievements, as Bill and Melinda Gates have shown with their annual letters in which their main results and aspirations are creatively displayed.

Everyone has the potential to better explore data sets and provide more thorough, yet simple, representations of facts. But how can we do this when faced with daunting levels of complex data?

A world of too many features

We can start by breaking the data down. Any data set consists of two main elements: samples and features. The former correspond to individual elements in a group; the latter are the characteristics they share….

Venturing into network analysis is easier than undertaking dimensionality reduction, since usually a high level of programming skills is not required. Widely available user-friendly software and tutorials allow people new to data visualisation to explore several aspects of network science.

The world of data visualisation is vast and it goes way beyond what has been introduced here, but those who actually reap its benefits, garnering new insights and becoming agents of positive and efficient change, are few. In an age of overwhelming information, knowing how to communicate data can make a difference – and it can help keep data’s relevance in check…(More)”

Prediction and Inference from Social Networks and Social Media


Book edited by Jalal Kawash, Nitin Agarwal and Tansel Özyer: “This book addresses the challenges of social network and social media analysis in terms of prediction and inference. The chapters collected here tackle these issues by proposing new analysis methods and by examining mining methods for the vast amount of social content produced. Social Networks (SNs) have become an integral part of our lives; they are used for leisure, business, government, medical, and educational purposes and have attracted billions of users. The challenges that stem from this wide adoption of SNs are vast. These include generating realistic social network topologies, awareness of user activities, topic and trend generation, estimation of user attributes from their social content, and behavior detection. This text has applications to widely used platforms such as Twitter and Facebook and appeals to students, researchers, and professionals in the field….(More)”

Can social media, loud and inclusive, fix world politics?


 at the Conversation: “Privacy is no longer a social norm, said Facebook founder Mark Zuckerberg in 2010, as social media took a leap to bring more private information into the public domain.

But what does it mean for governments, citizens and the exercise of democracy? Donald Trump is clearly not the first leader to use his Twitter account as a way to both proclaim his policies and influence the political climate. Social media presents novel challenges to strategic policy, and has become a managerial issue for many governments.

But it also offers a free platform for public participation in government affairs. Many argue that the rise of social media technologies can give citizens and observers a better opportunity to identify pitfalls of government and their politics.

As governments embrace the role of social media and the influence of negative or positive feedback on the success of their projects, they are also using this tool to their advantage by spreading fabricated news.

This much freedom of expression and opinion can be a double-edged sword.

A tool that triggers change

On the positive side, social media include social networking applications such as Facebook and Google+, microblogging services such as Twitter, blogs, video blogs (vlogs), wikis, and media-sharing sites such as YouTube and Flickr, among others.

Social media, as a collaborative and participatory tool, connects users with each other and helps shape various communities. Playing a key role in delivering public service value to citizens, it also helps people to engage in politics and policy-making, making processes easier to understand, through information and communication technologies (ICTs).

Today four out of five countries in the world have social media features on their national portals to promote interactive networking and communication with the citizen. Although we don’t have any information about the effectiveness of such tools or whether they are used to their full potential, 20% of these countries show that they have “resulted in new policy decisions, regulation or service”.

Social media can be an effective tool to trigger changes in government policies and services if well used. It can be used to prevent corruption, as it is a direct method of reaching citizens. In developing countries, corruption is often linked to governmental services that lack automated processes or transparency in payments.

The UK is taking the lead on this issue. Its anti-corruption innovation hub aims to connect several stakeholders – including civil society, law enforcement and technology experts – to engage their efforts toward a more transparent society.

With social media, governments can improve and change the way they communicate with their citizens – and citizens can even question government projects and policies. In Kazakhstan, for example, a migration-related legislative amendment entered into force in early January 2017 and compelled property owners to register people residing in their homes immediately or else face a penalty charge starting in February 2017.

Citizens were unprepared for this requirement, and many responded with indignation on social media. At first the government ignored this reaction. However, as the growing anger soared via social media, the government took action and introduced a new service to facilitate the registration of temporary citizens….

But the campaigns that result do not always evolve into positive change.

Egypt and Libya have faced several major crises in recent years, along with political instability and domestic terrorism. The social media influence that triggered the Arab Spring did not permit these political systems to turn from autocracy to democracy.

Brazil exemplifies a government’s failure to react properly to a massive social media outburst. In June 2013 people took to the streets to protest the rising fares of public transportation. Citizens channelled their anger and outrage through social media to mobilise networks and generate support.

The Brazilian government didn’t understand that “the message is the people”. Though the riots some called the “Tropical Spring” disappeared rather abruptly in the months to come, they had a major and devastating impact on Brazil’s political power, culminating in the impeachment of President Rousseff in late 2016 and the worst recession in Brazil’s history.

As in the Arab Spring countries, the use of social media in Brazil did not result in economic improvement. The country has tumbled down into depression, and unemployment has risen to 12.6%….

Governments typically ask, “how can we adapt social media to the way in which we do e-services?” and then try to shape their policies accordingly. They would be wiser to ask, “how can social media enable us to do things differently in a way they’ve never been done before?” – that is, policy-making in collaboration with people….(More)”

The Problem With Facts


Tim Harford: “…In 1995, Robert Proctor, a historian at Stanford University who has studied the tobacco case closely, coined the word “agnotology”. This is the study of how ignorance is deliberately produced; the entire field was started by Proctor’s observation of the tobacco industry. The facts about smoking — indisputable facts, from unquestionable sources — did not carry the day. The indisputable facts were disputed. The unquestionable sources were questioned. Facts, it turns out, are important, but facts are not enough to win this kind of argument.

Agnotology has never been more important. “We live in a golden age of ignorance,” says Proctor today. “And Trump and Brexit are part of that.”

In the UK’s EU referendum, the Leave side pushed the false claim that the UK sent £350m a week to the EU. It is hard to think of a previous example in modern western politics of a campaign leading with a transparent untruth, maintaining it when refuted by independent experts, and going on to triumph anyway. That performance was soon to be eclipsed by Donald Trump, who offered wave upon shameless wave of demonstrable falsehood, only to be rewarded with the presidency. The Oxford Dictionaries declared “post-truth” the word of 2016. Facts just didn’t seem to matter any more.

The instinctive reaction from those of us who still care about the truth — journalists, academics and many ordinary citizens — has been to double down on the facts. Fact-checking organisations, such as Full Fact in the UK and PolitiFact in the US, evaluate prominent claims by politicians and journalists. I should confess a personal bias: I have served as a fact checker myself on the BBC radio programme More or Less, and I often rely on fact-checking websites. They judge what’s true rather than faithfully reporting both sides as a traditional journalist would. Public, transparent fact checking has become such a feature of today’s political reporting that it’s easy to forget it’s barely a decade old.

Mainstream journalists, too, are starting to embrace the idea that lies or errors should be prominently identified. Consider a story on the NPR website about Donald Trump’s speech to the CIA in January: “He falsely denied that he had ever criticised the agency, falsely inflated the crowd size at his inauguration on Friday . . .” It’s a bracing departure from the norms of American journalism, but then President Trump has been a bracing departure from the norms of American politics.

Facebook has also drafted in the fact checkers, announcing a crackdown on the “fake news” stories that had become prominent on the network after the election. Facebook now allows users to report hoaxes. The site will send questionable headlines to independent fact checkers, flag discredited stories as “disputed”, and perhaps downgrade them in the algorithm that decides what each user sees when visiting the site.

We need some agreement about facts or the situation is hopeless. And yet: will this sudden focus on facts actually lead to a more informed electorate, better decisions, a renewed respect for the truth? The history of tobacco suggests not. The link between cigarettes and cancer was supported by the world’s leading medical scientists and, in 1964, the US surgeon general himself. The story was covered by well-trained journalists committed to the values of objectivity. Yet the tobacco lobbyists ran rings round them.

In the 1950s and 1960s, journalists had an excuse for their stumbles: the tobacco industry’s tactics were clever, complex and new. First, the industry appeared to engage, promising high-quality research into the issue. The public were assured that the best people were on the case. The second stage was to complicate the question and sow doubt: lung cancer might have any number of causes, after all. And wasn’t lung cancer, not cigarettes, what really mattered? Stage three was to undermine serious research and expertise. Autopsy reports would be dismissed as anecdotal, epidemiological work as merely statistical, and animal studies as irrelevant. Finally came normalisation: the industry would point out that the tobacco-cancer story was stale news. Couldn’t journalists find something new and interesting to say?

Such tactics are now well documented — and researchers have carefully examined the psychological tendencies they exploited. So we should be able to spot their re-emergence on the political battlefield.

“It’s as if the president’s team were using the tobacco industry’s playbook,” says Jon Christensen, a journalist turned professor at the University of California, Los Angeles, who wrote a notable study in 2008 of the way the tobacco industry tugged on the strings of journalistic tradition.

One infamous internal memo from the Brown & Williamson tobacco company, typed up in the summer of 1969, sets out the thinking very clearly: “Doubt is our product.” Why? Because doubt “is the best means of competing with the ‘body of fact’ that exists in the mind of the general public. It is also the means of establishing a controversy.” Big Tobacco’s mantra: keep the controversy alive.

Doubt is usually not hard to produce, and facts alone aren’t enough to dispel it. We should have learnt this lesson already; now we’re going to have to learn it all over again.

Tempting as it is to fight lies with facts, there are three problems with that strategy….(More)”

iGod


Novel by Willemijn Dicke and Dirk Helbing: “iGod is a science fiction novel with heroes, love, defeat and hope. But it is much more than that. This book aims to explore how societies may develop, given the technologies that we see at present. As Dirk Helbing describes it in his introduction:

We have come to the conclusion that neither a scientific study nor an investigative report would allow one to talk about certain things that, we believe, need to be thought and talked about. So, a science fiction story appeared to be the right approach. It seems the perfect way to think “what if scenarios” through. It is not the first time that this avenue has been taken. George Orwell’s “1984” and “Animal Farm” come to mind, or Dave Eggers’ “The Circle”. The film ‘The Matrix’ and the Netflix series ‘Black Mirror’ are good examples too.

“iGod” outlines how life could be a couple of years from now, certainly in our lifetime. In some places, this story about our future society seems far-fetched. For example, in “iGod”, all citizens have a Social Citizen Score. This score is established based on their buying habits, their communication in social media and social contacts they maintain. It is obtained by mass-surveillance and has a major impact on everyone’s life. It determines whether you are entitled to get a loan, what jobs you are offered, and even how long you will receive medical care.

The book is set in the near future in Amsterdam, the Netherlands. Lex is an unemployed biologist. One day he is contacted by a computer, which gradually reveals the machinery behind the reality we see. It is a bleak world. Together with his girlfriend Diana and Seldon, a Professor at Amsterdam Tech, he starts the quest to regain freedom….(More) (Individual chapters)”

Does digital democracy improve democracy?


Thamy Pogrebinschi at Open Democracy: “The advancement of tools of information and communications technology (ICT) has the potential to impact democracy nearly as much as any other area, such as science or education. The effects of the digital world on politics and society are still difficult to measure, and the speed with which these new technological tools evolve is often faster than a scholar’s ability to assess them, or a policymaker’s capacity to make them fit into existing institutional designs.

Since their early inception, digital tools and widespread access to the internet have been changing the traditional means of participation in politics, making them more effective. Electoral processes have become more transparent and effective in several countries where the paper ballot has been substituted for electronic voting machines. Petition-signing became a widespread and powerful tool as individual citizens no longer needed to be bothered out in the streets to sign a sheet of paper, but could instead be simultaneously reached by the millions via e-mail and have their names added to virtual petition lists in seconds. Protests and demonstrations have also been immensely revitalized in the internet era. In the last few years, social networks like Facebook and WhatsApp have proved to be a driving force behind democratic uprisings, by mobilizing the masses, invoking large gatherings, and raising awareness, as was the case in the Arab Spring.

While traditional means of political participation can become more effective by reducing the costs of participation with the use of ICT tools, one cannot yet assure that they would become less subject to distortion and manipulation. In the most recent United States elections, computer scientists claimed that electronic voting machines may have been hacked, altering the results in the counties that relied on them. E-petitions can also be easily manipulated, if safe identification procedures are not put in place. And in these times of post-facts and post-truths, protests and demonstrations can result from strategic partisan manipulation of social media, leading to democratic instability as has recently occurred in Brazil. Nevertheless, the distortion and manipulation of these traditional forms of participation were also present before the rise of ICT tools, and regardless, even if the latter do not solve these preceding problems, they may manage to make political processes more effective anyway.

The game-changer for democracy, however, is not the revitalization of the traditional means of political participation like elections, petition-signing and protests through digital tools. Rather, the real change in how democracy works, governments rule, and representation is delivered comes from entirely new means of e-participation, or the so-called digital democratic innovations. While the internet may boost traditional forms of political participation by increasing the quantity of citizens engaged, democratic innovations that rely on ICT tools may change the very quality of participation, thus in the long run changing the nature of democracy and its institutions….(More)”

Watchdog to launch inquiry into misuse of data in politics


, and Alice Gibbs in The Guardian: “The UK’s privacy watchdog is launching an inquiry into how voters’ personal data is being captured and exploited in political campaigns, cited as a key factor in both the Brexit and Trump victories last year.

The intervention by the Information Commissioner’s Office (ICO) follows revelations in last week’s Observer that a technology company part-owned by a US billionaire played a key role in the campaign to persuade Britons to vote to leave the European Union.

It comes as privacy campaigners, lawyers, politicians and technology experts express fears that electoral laws are not keeping up with the pace of technological change.

“We are conducting a wide assessment of the data-protection risks arising from the use of data analytics, including for political purposes, and will be contacting a range of organisations,” an ICO spokeswoman confirmed. “We intend to publicise our findings later this year.”

The ICO spokeswoman confirmed that it had approached Cambridge Analytica over its apparent use of data following the story in the Observer. “We have concerns about Cambridge Analytica’s reported use of personal data and we are in contact with the organisation,” she said….

In the US, companies are free to use third-party data without seeking consent. But Gavin Millar QC, of Matrix Chambers, said this was not the case in Europe. “The position in law is exactly the same as when people would go canvassing from door to door,” Millar said. “They have to say who they are, and if you don’t want to talk to them you can shut the door in their face. That’s the same principle behind the data protection act. It’s why if telephone canvassers ring you, they have to say that whole long speech. You have to identify yourself explicitly.”…

Dr Simon Moores, visiting lecturer in the applied sciences and computing department at Canterbury Christ Church University and a technology ambassador under the Blair government, said the ICO’s decision to shine a light on the use of big data in politics was timely.

“A rapid convergence in the data mining, algorithmic and granular analytics capabilities of companies like Cambridge Analytica and Facebook is creating powerful, unregulated and opaque ‘intelligence platforms’. In turn, these can have enormous influence to affect what we learn, how we feel, and how we vote. The algorithms they may produce are frequently hidden from scrutiny and we see only the results of any insights they might choose to publish.” …(More)”