Ten simple rules for responsible big data research


Matthew Zook et al. in PLOS Computational Biology: “The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets has grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.

The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data creates a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.

Nevertheless, the need for direction in responsible big data research is evident, and this article provides a set of “ten simple rules” for addressing the complex ethical issues that will inevitably arise. Modeled on PLOS Computational Biology’s ongoing collection of rules, the recommendations we outline involve more nuance than the words “simple” and “rules” suggest. This nuance is inevitably tied to our paper’s starting premise: all big data research on social, medical, psychological, and economic phenomena engages with human subjects, and researchers have the ethical responsibility to minimize potential harm….

  1. Acknowledge that data are people and can do harm
  2. Recognize that privacy is more than a binary value
  3. Guard against the reidentification of your data
  4. Practice ethical data sharing
  5. Consider the strengths and limitations of your data; big does not automatically mean better
  6. Debate the tough, ethical choices
  7. Develop a code of conduct for your organization, research community, or industry
  8. Design your data and systems for auditability
  9. Engage with the broader consequences of data and analysis practices
  10. Know when to break these rules…(More)”

Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy


Report by the National Academies of Sciences, Engineering, and Medicine’s Panel on Improving Federal Statistics for Policy and Social Science: “Federal government statistics provide critical information to the country and serve a key role in a democracy. For decades, sample surveys with instruments carefully designed for particular data needs have been one of the primary methods for collecting data for federal statistics. However, the costs of conducting such surveys have been increasing while response rates have been declining, and many surveys are not able to fulfill growing demands for more timely information and for more detailed information at state and local levels.

Innovations in Federal Statistics examines the opportunities and risks of using government administrative and private sector data sources to foster a paradigm shift in federal statistical programs that would combine diverse data sources in a secure manner to enhance federal statistics. This first publication of a two-part series discusses the challenges faced by the federal statistical system and the foundational elements needed for a new paradigm….(More)”

Can social media, loud and inclusive, fix world politics?


At The Conversation: “Privacy is no longer a social norm,” said Facebook founder Mark Zuckerberg in 2010, as social media took a leap to bring more private information into the public domain.

But what does it mean for governments, citizens and the exercise of democracy? Donald Trump is clearly not the first leader to use his Twitter account as a way to both proclaim his policies and influence the political climate. Social media presents novel challenges to strategic policy, and has become a managerial issue for many governments.

But it also offers a free platform for public participation in government affairs. Many argue that the rise of social media technologies can give citizens and observers a better opportunity to identify pitfalls of governments and their policies.

As governments embrace the role of social media and the influence of negative or positive feedback on the success of their projects, they are also using this tool to their advantage by spreading fabricated news.

This much freedom of expression and opinion can be a double-edged sword.

A tool that triggers change

On the positive side, social media include social networking applications such as Facebook and Google+, microblogging services such as Twitter, blogs, video blogs (vlogs), wikis, and media-sharing sites such as YouTube and Flickr, among others.

Social media, as a collaborative and participatory tool, connects users with each other and helps shape various communities. Playing a key role in delivering public service value to citizens, it also helps people engage in politics and policy-making through information and communication technologies (ICTs), making those processes easier to understand.

Today four out of five countries in the world have social media features on their national portals to promote interactive networking and communication with citizens. Although we don’t have any information about the effectiveness of such tools or whether they are used to their full potential, 20% of these countries report that they have “resulted in new policy decisions, regulation or service”.

Social media can be an effective tool to trigger changes in government policies and services if well used. It can be used to prevent corruption, as it is a direct method of reaching citizens. In developing countries, corruption is often linked to governmental services that lack automated processes or transparency in payments.

The UK is taking the lead on this issue. Its anti-corruption innovation hub aims to connect several stakeholders – including civil society, law enforcement and technologies experts – to engage their efforts toward a more transparent society.

With social media, governments can improve and change the way they communicate with their citizens – and even question government projects and policies. In Kazakhstan, for example, a migration-related legislative amendment entered into force early January 2017 and compelled property owners to register people residing in their homes immediately or else face a penalty charge starting in February 2017.

Citizens were unprepared for this requirement, and many responded with indignation on social media. At first the government ignored this reaction. However, as anger grew on social media, the government took action and introduced a new service to facilitate the registration of temporary citizens….

But the campaigns that result do not always evolve into positive change.

Egypt and Libya have faced several major crises in recent years, along with political instability and domestic terrorism. The social media influence that triggered the Arab Spring did not suffice to turn these political systems from autocracy to democracy.

Brazil exemplifies a government’s failure to react properly to a massive social media outburst. In June 2013 people took to the streets to protest the rising fares of public transportation. Citizens channelled their anger and outrage through social media to mobilise networks and generate support.

The Brazilian government didn’t understand that “the message is the people”. Though the riots some called the “Tropical Spring” disappeared rather abruptly in the months to come, they had a major and devastating impact on Brazil’s political power, culminating in the impeachment of President Rousseff in late 2016 and the worst recession in Brazil’s history.

As in the Arab Spring countries, the use of social media in Brazil did not result in economic improvement. The country has tumbled into depression, and unemployment has risen to 12.6%….

Governments typically ask, “How can we adapt social media to the way in which we do e-services?”, and then try to shape their policies accordingly. They would be wiser to ask, “How can social media enable us to do things differently, in a way they’ve never been done before?” – that is, policy-making in collaboration with people….(More)”.


Google DeepMind and healthcare in an age of algorithms


Julia Powles and Hal Hodson in Health and Technology: “Data-driven tools and techniques, particularly machine learning methods that underpin artificial intelligence, offer promise in improving healthcare systems and services. One of the companies aspiring to pioneer these advances is DeepMind Technologies Limited, a wholly-owned subsidiary of the Google conglomerate, Alphabet Inc. In 2016, DeepMind announced its first major health project: a collaboration with the Royal Free London NHS Foundation Trust, to assist in the management of acute kidney injury. Initially received with great enthusiasm, the collaboration has suffered from a lack of clarity and openness, with issues of privacy and power emerging as potent challenges as the project has unfolded. Taking the DeepMind-Royal Free case study as its pivot, this article draws a number of lessons on the transfer of population-derived datasets to large private prospectors, identifying critical questions for policy-makers, industry and individuals as healthcare moves into an algorithmic age….(More)”

Dark Web


Kristin Finklea for the Congressional Research Service: “The layers of the Internet go far beyond the surface content that many can easily access in their daily searches. The other content is that of the Deep Web, content that has not been indexed by traditional search engines such as Google. The furthest corners of the Deep Web, segments known as the Dark Web, contain content that has been intentionally concealed. The Dark Web may be used for legitimate purposes as well as to conceal criminal or otherwise malicious activities. It is the exploitation of the Dark Web for illegal practices that has garnered the interest of officials and policymakers.

Individuals can access the Dark Web by using special software such as Tor (short for The Onion Router). Tor relies upon a network of volunteer computers to route users’ web traffic through a series of other users’ computers such that the traffic cannot be traced to the original user. Some developers have created tools—such as Tor2web—that may allow individuals access to Tor-hosted content without downloading and installing the Tor software, though accessing the Dark Web through these means does not anonymize activity. Once on the Dark Web, users often navigate it through directories such as the “Hidden Wiki,” which organizes sites by category, similar to Wikipedia. Individuals can also search the Dark Web with search engines, which may be broad, searching across the Deep Web, or more specific, searching for contraband like illicit drugs, guns, or counterfeit money.
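The layered-relay idea behind Tor can be sketched as a toy (the relay names and the dictionary "packets" here are purely illustrative, and real onion routing encrypts every layer; this sketch shows only the structure):

```python
# Toy illustration of onion-style layered routing (no real cryptography:
# Tor encrypts each layer; here plain nesting stands in for encryption).

def wrap(message, route):
    # Build one layer per relay, from the exit relay inward.
    packet = {"deliver": message}
    for relay in reversed(route[1:]):
        packet = {"next_hop": relay, "payload": packet}
    return packet

def peel(packet):
    # Each relay learns only the next hop, never the full route.
    return packet.get("next_hop"), packet.get("payload", packet)

route = ["relay1", "relay2", "exit"]
packet = wrap("hello", route)

hop, packet = peel(packet)   # relay1 sees only "relay2"
assert hop == "relay2"
hop, packet = peel(packet)   # relay2 sees only "exit"
assert hop == "exit"
assert packet == {"deliver": "hello"}  # only the exit sees the message
```

Because each relay can read just its own layer, no single relay links the original sender to the final destination, which is the property that makes tracing traffic back to the user so difficult.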

While on the Dark Web, individuals may communicate through means such as secure email, web chats, or personal messaging hosted on Tor. Though tools such as Tor aim to anonymize content and activity, researchers and security experts are constantly developing means by which certain hidden services or individuals could be identified or “deanonymized.” Anonymizing services such as Tor have been used for legal and illegal activities ranging from maintaining privacy to selling illegal goods—mainly purchased with Bitcoin or other digital currencies. They may be used to circumvent censorship, access blocked content, or maintain the privacy of sensitive communications or business plans. However, a range of malicious actors, from criminals to terrorists to state-sponsored spies, can also leverage cyberspace, and the Dark Web can serve as a forum for conversation, coordination, and action. It is unclear how much of the Dark Web is dedicated to serving a particular illicit market at any one time, and, because of the anonymity of services such as Tor, it is even less clear how much traffic is actually flowing to any given site.

Just as criminals can rely upon the anonymity of the Dark Web, so too can the law enforcement, military, and intelligence communities. They may, for example, use it to conduct online surveillance and sting operations and to maintain anonymous tip lines. Anonymity in the Dark Web can be used to shield officials from identification and hacking by adversaries. It can also be used to conduct a clandestine or covert computer network operation such as taking down a website or a denial of service attack, or to intercept communications. Reportedly, officials are continuously working on expanding techniques to deanonymize activity on the Dark Web and identify malicious actors online….(More)”

The Crowd & the Cloud


The Crowd & the Cloud (TV series): “Are you interested in birds, fish, the oceans or streams in your community? Are you concerned about fracking, air quality, extreme weather, asthma, Alzheimer’s disease, Zika or other epidemics? Now you can do more than read about these issues. You can be part of the solution.

Smartphones, computers and mobile technology are enabling regular citizens to become part of a 21st century way of doing science. By observing their environments, monitoring neighborhoods, collecting information about the world and the things they care about, so-called “citizen scientists” are helping professional scientists to advance knowledge while speeding up new discoveries and innovations.

The results are improving health and welfare, assisting in wildlife conservation, and giving communities the power to create needed change and help themselves.

Citizen science has amazing promise, but also raises questions about data quality and privacy. Its potential and challenges are explored in THE CROWD & THE CLOUD, a 4-part public television series premiering in April 2017. Hosted by former NASA Chief Scientist Waleed Abdalati, each episode takes viewers on a global tour of the projects and people on the front lines of this disruptive transformation in how science is done, and shows how anyone, anywhere can participate….(More)”


Migration tracking is a mess


Huub Dijstelbloem in Nature: “As debates over migration, refugees and freedom of movement intensify, technologies are increasingly monitoring the movements of people. Biometric passports and databases containing iris scans or fingerprints are being used to check a person’s right to travel through or stay within a territory. India, for example, is establishing biometric identification for its 1.3 billion citizens.

But technologies are spreading beyond borders. Security policies and humanitarian considerations have broadened the landscape. Drones and satellite images inform policies and direct aid to refugees. For instance, the United Nations Institute for Training and Research (UNITAR) maps refugee camps in Jordan and elsewhere with its Operational Satellite Applications Programme (UNOSAT; see www.unitar.org/unosat/map/1928).

Three areas are in need of research, in my view: the difficulties of joining up disparate monitoring systems; privacy issues and concerns over the inviolability of the human body; and ‘counter-surveillance’ deployed by non-state actors to highlight emergencies or contest claims that governments make.

Ideally, state monitoring of human mobility would be bound by ethical principles, solid legislation, periodical evaluations and the checks and balances of experts and political and public debates. In reality, it is ad hoc. Responses are arbitrary, fuelled by the crisis management of governments that have failed to anticipate global and regional migration patterns. Too often, this results in what the late sociologist Ulrich Beck called organized irresponsibility: situations of inadequacy in which it is hard to blame a single actor.

Non-governmental organizations, activists and migrant groups are using technologies to register incidents and to blame and shame states. For example, the Forensic Architecture research agency at Goldsmiths, University of London, has used satellite imagery and other evidence to reconstruct the journey of a boat that left Tripoli on 27 March 2011 with 72 passengers. A fortnight later, it returned to the Libyan coast with just 9 survivors. Although the boat had been spotted by several aircraft and vessels, no rescue operation had been mounted (go.nature.com/2mbwvxi). Whether the states involved can be held accountable is still being considered.

In the end, technologies to monitor mobility are political tools. Their aims, design, use, costs and consequences should be developed and evaluated accordingly….(More)”.

Bit By Bit: Social Research in the Digital Age


Open Review of Book by Matthew J. Salganik: “In the summer of 2009, mobile phones were ringing all across Rwanda. In addition to the millions of calls between family, friends, and business associates, about 1,000 Rwandans received a call from Joshua Blumenstock and his colleagues. The researchers were studying wealth and poverty by conducting a survey of people who had been randomly sampled from a database of 1.5 million customers from Rwanda’s largest mobile phone provider. Blumenstock and colleagues asked the participants if they wanted to participate in a survey, explained the nature of the research to them, and then asked a series of questions about their demographic, social, and economic characteristics.

Everything I have said up until now makes this sound like a traditional social science survey. But, what comes next is not traditional, at least not yet. They used the survey data to train a machine learning model to predict someone’s wealth from their call data, and then they used this model to estimate the wealth of all 1.5 million customers. Next, they estimated the place of residence of all 1.5 million customers by using the geographic information embedded in the call logs. Putting these two estimates together—the estimated wealth and the estimated place of residence—Blumenstock and colleagues were able to produce high-resolution estimates of the geographic distribution of wealth across Rwanda. In particular, they could produce an estimated wealth for each of Rwanda’s 2,148 cells, the smallest administrative unit in the country.
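The pipeline described above (fit a model on the surveyed sample, predict wealth for every customer, then aggregate by inferred place of residence) can be sketched roughly as follows. All numbers, the single call-volume feature, and the one-variable least-squares fit are illustrative stand-ins, not the study's actual features or model:

```python
# Illustrative sketch only: the study used richer call-log features and a
# real machine-learning model; a one-feature least-squares fit stands in.

def fit_ols(xs, ys):
    # Ordinary least squares for wealth ~ a + b * call_volume.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Step 1: survey a small random sample (monthly calls, reported wealth index).
surveyed = [(10, 1.0), (40, 2.5), (80, 4.2), (120, 6.1)]
a, b = fit_ols([x for x, _ in surveyed], [y for _, y in surveyed])

# Steps 2-3: predict wealth for every customer from call logs, then average
# the predictions within each customer's inferred home cell.
customers = [("cell_A", 20), ("cell_A", 30), ("cell_B", 90), ("cell_B", 110)]
totals, counts = {}, {}
for cell, calls in customers:
    totals[cell] = totals.get(cell, 0.0) + (a + b * calls)
    counts[cell] = counts.get(cell, 0) + 1
estimates = {cell: totals[cell] / counts[cell] for cell in totals}

assert estimates["cell_A"] < estimates["cell_B"]  # poorer vs. richer cell
```

The key design move is that the expensive step (the survey) touches only a tiny sample, while the cheap steps (prediction and aggregation) scale to the full customer base, which is why the approach can be so much faster and cheaper than a full household survey.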

It was impossible to validate these estimates because no one had ever produced estimates for such small geographic areas in Rwanda. But, when Blumenstock and colleagues aggregated their estimates to Rwanda’s 30 districts, they found that their estimates were similar to estimates from the Demographic and Health Survey, the gold standard of surveys in developing countries. Although these two approaches produced similar estimates in this case, the approach of Blumenstock and colleagues was about 10 times faster and 50 times cheaper than the traditional Demographic and Health Surveys. These dramatically faster and lower cost estimates create new possibilities for researchers, governments, and companies (Blumenstock, Cadamuro, and On 2015).

In addition to developing a new methodology, this study is kind of like a Rorschach inkblot test; what people see depends on their background. Many social scientists see a new measurement tool that can be used to test theories about economic development. Many data scientists see a cool new machine learning problem. Many business people see a powerful approach for unlocking value in the digital trace data that they have already collected. Many privacy advocates see a scary reminder that we live in a time of mass surveillance. Many policy makers see a way that new technology can help create a better world. In fact, this study is all of those things, and that is why it is a window into the future of social research….(More)”

Watchdog to launch inquiry into misuse of data in politics


Alice Gibbs and others in The Guardian: “The UK’s privacy watchdog is launching an inquiry into how voters’ personal data is being captured and exploited in political campaigns, cited as a key factor in both the Brexit and Trump victories last year.

The intervention by the Information Commissioner’s Office (ICO) follows revelations in last week’s Observer that a technology company part-owned by a US billionaire played a key role in the campaign to persuade Britons to vote to leave the European Union.

It comes as privacy campaigners, lawyers, politicians and technology experts express fears that electoral laws are not keeping up with the pace of technological change.

“We are conducting a wide assessment of the data-protection risks arising from the use of data analytics, including for political purposes, and will be contacting a range of organisations,” an ICO spokeswoman confirmed. “We intend to publicise our findings later this year.”

The ICO spokeswoman confirmed that it had approached Cambridge Analytica over its apparent use of data following the story in the Observer. “We have concerns about Cambridge Analytica’s reported use of personal data and we are in contact with the organisation,” she said….

In the US, companies are free to use third-party data without seeking consent. But Gavin Millar QC, of Matrix Chambers, said this was not the case in Europe. “The position in law is exactly the same as when people would go canvassing from door to door,” Millar said. “They have to say who they are, and if you don’t want to talk to them you can shut the door in their face. That’s the same principle behind the Data Protection Act. It’s why if telephone canvassers ring you, they have to say that whole long speech. You have to identify yourself explicitly.”…

Dr Simon Moores, visiting lecturer in the applied sciences and computing department at Canterbury Christ Church University and a technology ambassador under the Blair government, said the ICO’s decision to shine a light on the use of big data in politics was timely.

“A rapid convergence in the data mining, algorithmic and granular analytics capabilities of companies like Cambridge Analytica and Facebook is creating powerful, unregulated and opaque ‘intelligence platforms’. In turn, these can have enormous influence to affect what we learn, how we feel, and how we vote. The algorithms they may produce are frequently hidden from scrutiny and we see only the results of any insights they might choose to publish.” …(More)”

Open Data Privacy Playbook


A data privacy playbook by Ben Green, Gabe Cunningham, Ariel Ekblaw, Paul Kominers, Andrew Linzer, and Susan Crawford: “Cities today collect and store a wide range of data that may contain sensitive or identifiable information about residents. As cities embrace open data initiatives, more of this information is available to the public. While releasing data has many important benefits, sharing data comes with inherent risks to individual privacy: released data can reveal information about individuals that would otherwise not be public knowledge. In recent years, open data such as taxi trips, voter registration files, and police records have revealed information that many believe should not be released.

Effective data governance is a prerequisite for successful open data programs. The goal of this document is to codify responsible privacy-protective approaches and processes that could be adopted by cities and other government organizations that are publicly releasing data. Our report is organized around four recommendations:

  • Conduct risk-benefit analyses to inform the design and implementation of open data programs.
  • Consider privacy at each stage of the data lifecycle: collect, maintain, release, delete.
  • Develop operational structures and processes that codify privacy management widely throughout the City.
  • Emphasize public engagement and public priorities as essential aspects of data management programs.

Each chapter of this report is dedicated to one of these four recommendations, and provides fundamental context along with specific suggestions to carry them out. In particular, we provide case studies of best practices from numerous cities and a set of forms and tactics for cities to implement our recommendations. The Appendix synthesizes key elements of the report into an Open Data Privacy Toolkit that cities can use to manage privacy when releasing data….(More)”