Ten simple rules for responsible big data research

Matthew Zook et al in PLOS Computational Biology: “The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets has grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.

The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data creates a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.

Nevertheless, the need for direction in responsible big data research is evident, and this article provides a set of “ten simple rules” for addressing the complex ethical issues that will inevitably arise. Modeled on PLOS Computational Biology’s ongoing collection of rules, the recommendations we outline involve more nuance than the words “simple” and “rules” suggest. This nuance is inevitably tied to our paper’s starting premise: all big data research on social, medical, psychological, and economic phenomena engages with human subjects, and researchers have the ethical responsibility to minimize potential harm….

  1. Acknowledge that data are people and can do harm
  2. Recognize that privacy is more than a binary value
  3. Guard against the reidentification of your data
  4. Practice ethical data sharing
  5. Consider the strengths and limitations of your data; big does not automatically mean better
  6. Debate the tough, ethical choices
  7. Develop a code of conduct for your organization, research community, or industry
  8. Design your data and systems for auditability
  9. Engage with the broader consequences of data and analysis practices
  10. Know when to break these rules…(More)”