Ten simple rules for responsible big data research


Matthew Zook et al. in PLOS Computational Biology: “The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets has grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.

The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data creates a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.

Nevertheless, the need for direction in responsible big data research is evident, and this article provides a set of “ten simple rules” for addressing the complex ethical issues that will inevitably arise. Modeled on PLOS Computational Biology’s ongoing collection of rules, the recommendations we outline involve more nuance than the words “simple” and “rules” suggest. This nuance is inevitably tied to our paper’s starting premise: all big data research on social, medical, psychological, and economic phenomena engages with human subjects, and researchers have the ethical responsibility to minimize potential harm….

  1. Acknowledge that data are people and can do harm
  2. Recognize that privacy is more than a binary value
  3. Guard against the reidentification of your data
  4. Practice ethical data sharing
  5. Consider the strengths and limitations of your data; big does not automatically mean better
  6. Debate the tough, ethical choices
  7. Develop a code of conduct for your organization, research community, or industry
  8. Design your data and systems for auditability
  9. Engage with the broader consequences of data and analysis practices
  10. Know when to break these rules…(More)”

Will Computer Science become a Social Science?


Paper by Ingo Scholtes, Markus Strohmaier and Frank Schweitzer: “When Tay – a Twitter chatbot developed by Microsoft – was activated this March, the company was taken by surprise by what Tay had become. Within less than 24 hours of conversation with Twitter users Tay had learned to make racist, anti-semitic and misogynistic statements that have raised eyebrows in the Twitter community and beyond. What had happened? While Microsoft certainly tested the chat bot before release, planning for the reactions and the social environment in which it was deployed proved tremendously difficult. Yet, the Tay Twitter chatbot incident is just one example for the many challenges which arise when embedding algorithms and computing systems into an ever increasing spectrum of social systems. In this viewpoint we argue that, due to the resulting feedback loops by which computing technologies impact social behavior and social behavior feeds back on (learning) computing systems, we face the risk of losing control over the systems that we engineer. The result are unintended consequences that affect both the technical and social dimension of computing systems, and which computer science is currently not well-prepared to address. Highlighting exemplary challenges in core areas like (1) algorithm design, (2) cyber-physical systems, and (3) software engineering, we argue that social aspects must be turned into first-class citizens of our system models. We further highlight that the social sciences, in particular the interdisciplinary field of Computational Social Science [1], provide us with means to quantitatively analyze, model and predict human behavior. As such, a closer integration between computer science and social sciences not only provides social scientists with new ways to understand social phenomena. It also helps us to regain control over the systems that we engineer….(More)”

Confused by data visualisation? Here’s how to cope in a world of many features


In The Conversation: “The late data visionary Hans Rosling mesmerised the world with his work, contributing to a more informed society. Rosling used global health data to paint a stunning picture of how our world is a better place now than it was in the past, bringing hope through data.

Now more than ever, data are collected from every aspect of our lives. From social media and advertising to artificial intelligence and automated systems, understanding and parsing information have become highly valuable skills. But we often overlook the importance of knowing how to communicate data to peers and to the public in an effective, meaningful way.

The first tools that come to mind in considering how to best communicate data – especially statistics – are graphs and scatter plots. These simple visuals help us understand elementary causes and consequences, trends and so on. They are invaluable and have an important role in disseminating knowledge.

Data visualisation can take many other forms, just as data itself can be interpreted in many different ways. It can be used to highlight important achievements, as Bill and Melinda Gates have shown with their annual letters in which their main results and aspirations are creatively displayed.

Everyone has the potential to better explore data sets and provide more thorough, yet simple, representations of facts. But how can we do this when faced with daunting levels of complex data?

A world of too many features

We can start by breaking the data down. Any data set consists of two main elements: samples and features. The former correspond to individual elements in a group; the latter are the characteristics they share….

Venturing into network analysis is easier than undertaking dimensionality reduction, since usually a high level of programming skills is not required. Widely available user-friendly software and tutorials allow people new to data visualisation to explore several aspects of network science.

The world of data visualisation is vast and it goes way beyond what has been introduced here, but those who actually reap its benefits, garnering new insights and becoming agents of positive and efficient change, are few. In an age of overwhelming information, knowing how to communicate data can make a difference – and it can help keep data’s relevance in check…(More)”
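The samples-and-features breakdown described in the excerpt above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the three rows and their values are invented placeholders, and `describe` is a hypothetical helper, not something taken from the article.

```python
# Hypothetical toy data set (all values are made up for illustration).
# Each dictionary is one sample; each key is a feature the samples share.
samples = [
    {"country": "A", "life_expectancy": 71.2, "income_per_person": 9800},
    {"country": "B", "life_expectancy": 64.5, "income_per_person": 3100},
    {"country": "C", "life_expectancy": 80.1, "income_per_person": 41200},
]


def describe(rows):
    """Return (number of samples, sorted list of feature names)."""
    features = sorted({name for row in rows for name in row})
    return len(rows), features


n_samples, feature_names = describe(samples)
print(n_samples, feature_names)
```

A data set becomes hard to plot when the list of features grows long, which is the situation the article calls "a world of too many features".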

Behavioural Insights and Public Policy


OECD Report: ““Behavioural insights”, or insights derived from the behavioural and social sciences, including decision making, psychology, cognitive science, neuroscience, organisational and group behaviour, are being applied by governments with the aim of making public policies work better. As their use has become more widespread, however, questions are being raised about their effectiveness as well as their philosophical underpinnings. This report discusses the use and reach of behavioural insights, drawing on a comprehensive collection of over 100 applications across the world and policy sectors, including consumer protection, education, energy, environment, finance, health and safety, labour market policies, public service delivery, taxes and telecommunications. It suggests ways to ensure that this experimental approach can be successfully and sustainably used as a public policy tool…(More)”.

Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy


Report by the National Academies of Sciences’s Panel on Improving Federal Statistics for Policy and Social Science: “Federal government statistics provide critical information to the country and serve a key role in a democracy. For decades, sample surveys with instruments carefully designed for particular data needs have been one of the primary methods for collecting data for federal statistics. However, the costs of conducting such surveys have been increasing while response rates have been declining, and many surveys are not able to fulfill growing demands for more timely information and for more detailed information at state and local levels.

Innovations in Federal Statistics examines the opportunities and risks of using government administrative and private sector data sources to foster a paradigm shift in federal statistical programs that would combine diverse data sources in a secure manner to enhance federal statistics. This first publication of a two-part series discusses the challenges faced by the federal statistical system and the foundational elements needed for a new paradigm….(More)”

Nudging people to make good choices can backfire


Bruce Bower in ScienceNews: “Nudges are a growth industry. Inspired by a popular line of psychological research and introduced in a best-selling book a decade ago, these inexpensive behavior changers are currently on a roll.

Policy makers throughout the world, guided by behavioral scientists, are devising ways to steer people toward decisions deemed to be in their best interests. These simple interventions don’t force, teach or openly encourage anyone to do anything. Instead, they nudge, exploiting for good — at least from the policy makers’ perspective — mental tendencies that can sometimes lead us astray.

But new research suggests that low-cost nudges aimed at helping the masses have drawbacks. Even simple interventions that work at first can lead to unintended complications, creating headaches for nudgers and nudgees alike…

Promising results of dozens of nudge initiatives appear in two government reports issued last September. One came from the White House, which released the second annual report of its Social and Behavioral Sciences Team. The other came from the United Kingdom’s Behavioural Insights Team. Created by the British government in 2010, the U.K. group is often referred to as the Nudge Unit.

In a September 20, 2016, Bloomberg View column, Sunstein said the new reports show that nudges work, but often increase by only a few percentage points the number of people who, say, receive government benefits or comply with tax laws. He called on choice architects to tackle bigger challenges, such as finding ways to nudge people out of poverty or into higher education.

Missing from Sunstein’s comments and from the government reports, however, was any mention of a growing conviction among some researchers that well-intentioned nudges can have negative as well as positive effects. Accepting automatic enrollment in a company’s savings plan, for example, can later lead to regret among people who change jobs frequently or who realize too late that a default savings rate was set too low for their retirement needs. E-mail reminders to donate to a charity may work at first, but annoy recipients into unsubscribing from the donor list.

“I don’t want to get rid of nudges, but we’ve been a bit too optimistic in applying them to public policy,” says behavioral economist Mette Trier Damgaard of Aarhus University in Denmark.

Nudges, like medications for physical ailments, require careful evaluation of intended and unintended effects before being approved, she says. Policy makers need to know when and with whom an intervention works well enough to justify its side effects.

Default downer

That warning rings especially true for what is considered a shining star in the nudge universe — automatic enrollment of employees in retirement savings plans. The plans, called defaults, take effect unless workers decline to participate….

But little is known about whether automatic enrollees are better or worse off as time passes and their personal situations change, says Harvard behavioral economist Brigitte Madrian. She coauthored the 2001 paper on the power of default savings plans.

Although automatic plans increase savings for those who otherwise would have squirreled away little or nothing, others may lose money because they would have contributed more to a self-directed retirement account, Madrian says. In some cases, having an automatic savings account may encourage irresponsible spending or early withdrawals of retirement money (with penalties) to cover debts. Such possibilities are plausible but have gone unstudied.

In line with Madrian’s concerns, mathematical models developed by finance professor Bruce Carlin of the University of California, Los Angeles and colleagues suggest that people who default into retirement plans learn less about money matters, and share less financial information with family and friends, than those who join plans that require active investment choices.

Opt-out savings programs “have been oversimplified to the public and are being sold as a great way to change behavior without addressing their complexities,” Madrian says. Research needs to address how well these plans mesh with individuals’ personalities and decision-making styles, she recommends….

Researchers need to determine how defaults and other nudges instigate behavior changes before unleashing them on the public, says philosopher of science Till Grüne-Yanoff of the Royal Institute of Technology in Stockholm….

Sometimes well-intentioned, up-front attempts to get people to do what seems right come back to bite nudgers on the bottom line.

Consider e-mail prompts and reminders…. A case in point is a study submitted for publication by Damgaard and behavioral economist Christina Gravert of the University of Gothenburg in Sweden. E-mailed donation reminders sent to people who had contributed to a Danish anti-poverty charity increased the number of donations in the short term, but also triggered an upturn in the number of people unsubscribing from the list.

People’s annoyance at receiving reminders perceived as too frequent or pushy cost the charity money over the long haul, Damgaard holds. Losses of list subscribers more than offset the financial gains from the temporary uptick in donations, she and Gravert conclude.

“Researchers have tended to overlook the hidden costs of nudging,” Damgaard says….(More)”

Drones used in fight against plastic pollution on UK beaches


Tom Cheshire at SkyNews: “On a beach in Kent, Peter Koehler and Ellie Mackay are teaching a drone how to see.

Their project, Plastic Tide, aims to create software that will automatically pick out the pieces of plastic that wash up here on the shingle.

“One of the major challenges we face is that we can only account for 1% of those millions and millions of tonnes [of plastic] that are coming into our oceans every year,” Mr Koehler told Sky News.

“So the question is, where is that 99% going?”

He added: “We just don’t know. It could be in the water, it could be in wildlife, or it could be on beaches.

“And so what the Plastic Tide is doing, it’s using drone technology to image beaches in a way that’s never been done before, on a scientific scale. So that you can build up a picture of how much of that missing 99% is washing up on our beaches.”

Mr Koehler and Ms Mackay use an off-the-shelf drone. They select the area of beach they want to film and a free app comes up with a survey pattern flight path – the drone moves systematically up and down the beach as if it were ploughing it.

The images taken are then uploaded to a scientific crowd-sourcing platform called Zooniverse.

Anyone can log on, look at the images and tag bits of plastic in them.

That will build up a huge amount of data, which will be used to train a machine-learning algorithm to spot plastic by itself – no humans required.

The hope is that, eventually, anyone will be able to fly a drone, take images, then computers will automatically scan the images and determine the levels of plastic pollution on a beach.

This summer, Mr Koehler and Ms Mackay will travel all 3,200 miles of the UK coastline, surveying beaches….

There’s no new, groundbreaking piece of technology here.

Just off-the-shelf components, smart thinking and a desire to put a small dent in a huge problem….(More)”
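The "ploughing" survey pattern mentioned in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the app the Plastic Tide team uses: the function name `survey_path`, the rectangular survey area, and the metre-based parameters are all assumptions made for the example.

```python
def survey_path(width_m, length_m, spacing_m):
    """Return (x, y) waypoints that sweep a width_m x length_m rectangle
    in back-and-forth passes spaced spacing_m apart."""
    if width_m < 0 or length_m < 0 or spacing_m <= 0:
        raise ValueError("width and length must be >= 0 and spacing > 0")
    waypoints = []
    n_passes = int(width_m // spacing_m) + 1
    for i in range(n_passes):
        x = float(i * spacing_m)
        # Alternate direction on every pass so the path snakes up and down.
        if i % 2 == 0:
            ys = (0.0, float(length_m))
        else:
            ys = (float(length_m), 0.0)
        waypoints.extend((x, y) for y in ys)
    return waypoints


# Example: a 20 m wide, 50 m long strip of beach with passes every 10 m.
for point in survey_path(20, 50, 10):
    print(point)
```

Alternating the direction of each pass keeps the drone from flying back to the start line between sweeps, which is why the path resembles a ploughed field.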

Seeing Theory


Seeing Theory is a project designed and created by Daniel Kunin with support from Brown University’s Royce Fellowship Program. The goal of the project is to make statistics more accessible to a wider range of students through interactive visualizations.

Statistics is quickly becoming the most important and multi-disciplinary field of mathematics. According to the American Statistical Association, “statistician” is one of the top ten fastest-growing occupations and statistics is one of the fastest-growing bachelor degrees. Statistical literacy is essential to our data driven society. Yet, for all the increased importance and demand for statistical competence, the pedagogical approaches in statistics have barely changed. Using Mike Bostock’s data visualization software, D3.js, Seeing Theory visualizes the fundamental concepts covered in an introductory college statistics or Advanced Placement statistics class. Students are encouraged to use Seeing Theory as an additional resource to their textbook, professor and peers….(More)”

iGod


Novel by Willemijn Dicke and Dirk Helbing: “iGod is a science fiction novel with heroes, love, defeat and hope. But it is much more than that. This book aims to explore how societies may develop, given the technologies that we see at present. As Dirk Helbing describes it in his introduction:

We have come to the conclusion that neither a scientific study nor an investigative report would allow one to talk about certain things that, we believe, need to be thought and talked about. So, a science fiction story appeared to be the right approach. It seems the perfect way to think “what if scenarios” through. It is not the first time that this avenue has been taken. George Orwell’s “1984” and “Animal Farm” come to mind, or Dave Eggers’ “The Circle”. The film ‘The Matrix’ and the Netflix series ‘Black Mirror’ are good examples too.

“iGod” outlines how life could be in a couple of years from now, certainly in our lifetime. At some places, this story about our future society seems far-fetched. For example, in “iGod”, all citizens have a Social Citizen Score. This score is established based on their buying habits, their communication in social media and social contacts they maintain. It is obtained by mass-surveillance and has a major impact on everyone’s life. It determines whether you are entitled to get a loan, what jobs you are offered, and even how long you will receive medical care.

The book is set in the near future in Amsterdam, the Netherlands. Lex is an unemployed biologist. One day he is contacted by a computer, which gradually reveals the machinery behind the reality we see. It is a bleak world. Together with his girlfriend Diana and Seldon, a Professor at Amsterdam Tech, he starts the quest to regain freedom….(More) (Individual chapters)”

Prediction and explanation in social systems


Jake M. Hofman, Amit Sharma, and Duncan J. Watts in Science: “Historically, social scientists have sought out explanations of human and social phenomena that provide interpretable causal mechanisms, while often ignoring their predictive accuracy. We argue that the increasingly computational nature of social science is beginning to reverse this traditional bias against prediction; however, it has also highlighted three important issues that require resolution. First, current practices for evaluating predictions must be better standardized. Second, theoretical limits to predictive accuracy in complex social systems must be better characterized, thereby setting expectations for what can be predicted or explained. Third, predictive accuracy and interpretability must be recognized as complements, not substitutes, when evaluating explanations. Resolving these three issues will lead to better, more replicable, and more useful social science….(More)”