The Moral Failure of Computer Scientists

Kaveh Waddell at the Atlantic: “Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work. He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity….(More)”

Opening up government data for public benefit

Keiran Hardy at the Mandarin (Australia): “…This post explains the open data movement and considers the benefits and risks of releasing government data as open data. It then outlines the steps taken by the Labor and Liberal governments in accordance with this trend. It argues that the Prime Minister’s task, while admirably intentioned, is likely to prove difficult due to ongoing challenges surrounding the requirements of privacy law and a public service culture that remains reluctant to release government data into the public domain….

A key purpose of releasing government data is to improve the effectiveness and efficiency of services delivered by the government. For example, data on crops, weather and geography might be analysed to improve current approaches to farming and industry, or data on hospital admissions might be analysed alongside demographic and census data to improve the efficiency of health services in areas of need. It has been estimated that such innovation based on open data could benefit the Australian economy by up to $16 billion per year.

Another core benefit is that the open data movement is making gains in transparency and accountability, as a greater proportion of government decisions and operations are being shared with the public. These democratic values are made clear in the OGP’s Open Government Declaration, which aims to make governments ‘more open, accountable, and responsive to citizens’.

Open data can also improve democratic participation by allowing citizens to contribute to policy innovation. Events like GovHack, an annual Australian competition in which government, industry and the general public collaborate to find new uses for open government data, epitomise a growing trend towards service delivery informed by user input. The winner of the “Best Policy Insights Hack” at GovHack 2015 developed a software program for analysing which suburbs are best placed for rooftop solar investment.

At the same time, the release of government data poses significant risks to the privacy of Australian citizens. Much of the open data currently available is spatial (geographic or satellite) data, which is relatively unproblematic to post online as it poses minimal privacy risks. However, for the full benefits of open data to be gained, these kinds of data need to be supplemented with information on welfare payments, hospital admission rates and other potentially sensitive areas which could drive policy innovation.

Policy data in these areas would be de-identified — that is, all names, addresses and other obvious identifying information would be removed so that only aggregate or statistical data remains. However, debates continue as to the reliability of de-identification techniques, as there have been prominent examples of individuals being re-identified by cross-referencing datasets….

With regard to open data, a culture resistant to releasing government information appears to be driven by several similar factors, including:

  • A generational preference amongst public service management for maintaining secrecy of information, whereas younger generations expect that data should be made freely available;
  • Concerns about the quality or accuracy of information being released;
  • Fear that mistakes or misconduct on behalf of government employees might be exposed;
  • Limited understanding of the benefits that can be gained from open data; and
  • A lack of leadership to help drive the open data movement.

If open data policies have a similar effect on public service culture as FOI legislation, it may be that open data policies in fact hinder transparency by having a chilling effect on government decision-making for fear of what might be exposed….

These legal and cultural hurdles will pose ongoing challenges for the Turnbull government in seeking to release greater amounts of government data as open data….(More)”
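
The re-identification risk raised in the excerpt above can be made concrete with a small sketch. It assumes a toy "de-identified" release that still carries quasi-identifiers (postcode, birth year, sex) and a separate public list with names. All records, field names and the `reidentify` helper are hypothetical and chosen only for illustration; they do not come from the article or from any real dataset.

```python
# Hypothetical toy data: a "de-identified" release with names removed...
deidentified_admissions = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "2010", "birth_year": 1980, "sex": "F", "diagnosis": "fracture"},
    {"postcode": "2010", "birth_year": 1980, "sex": "F", "diagnosis": "influenza"},
]

# ...and a separate, hypothetical public list that does include names.
public_register = [
    {"name": "Alex Example", "postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"name": "Sam Sample", "postcode": "2010", "birth_year": 1980, "sex": "F"},
]

# Fields present in both datasets that can act as a join key (assumed).
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the cross-referencing key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_by_key(records):
    """Group records by their quasi-identifier combination."""
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    return groups


def reidentify(released, register):
    """Return (name, released_record) pairs where a quasi-identifier
    combination occurs exactly once in each dataset."""
    released_groups = group_by_key(released)
    register_groups = group_by_key(register)
    matches = []
    for k, rows in released_groups.items():
        people = register_groups.get(k, [])
        if len(rows) == 1 and len(people) == 1:
            matches.append((people[0]["name"], rows[0]))
    return matches


matches = reidentify(deidentified_admissions, public_register)
for name, row in matches:
    print(name, "->", row["diagnosis"])
```

In this toy case only one person is singled out; the two records sharing the same postcode, birth year and sex cannot be told apart, which is the intuition behind releasing only data where each combination of attributes is shared by several people.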

What Privacy Papers Should Policymakers be Reading in 2016?

Stacy Gray at the Future of Privacy Forum: “Each year, FPF invites privacy scholars and authors to submit articles and papers to be considered by members of our Advisory Board, with an aim toward showcasing those articles that should inform any conversation about privacy among policymakers in Congress, as well as at the Federal Trade Commission and in other government agencies. For our sixth annual Privacy Papers for Policymakers, we received submissions on topics ranging from mobile app privacy, to location tracking, to drone policy.

Our Advisory Board selected papers that describe the challenges and best practices of designing privacy notices, ways to minimize the risks of re-identification of data by focusing on process-based data release policy and taking a precautionary approach to data release, the relationship between privacy and markets, and bringing the concept of trust more strongly into privacy principles.

Open Data, Privacy, and Fair Information Principles: Towards a Balancing Framework

Paper by Zuiderveen Borgesius, Frederik J. and van Eechoud, Mireille and Gray, Jonathan: “Open data are held to contribute to a wide variety of social and political goals, including strengthening transparency, public participation and democratic accountability, promoting economic growth and innovation, and enabling greater public sector efficiency and cost savings. However, releasing government data that contain personal information may threaten privacy and related rights and interests. In this paper we ask how these privacy interests can be respected, without unduly hampering benefits from disclosing public sector information. We propose a balancing framework to help public authorities address this question in different contexts. The framework takes into account different levels of privacy risks for different types of data. It also separates decisions about access and re-use, and highlights a range of different disclosure routes. A circumstance catalogue lists factors that might be considered when assessing whether, under which conditions, and how a dataset can be released. While open data remains an important route for the publication of government information, we conclude that it is not the only route, and there must be clear and robust public interest arguments in order to justify the disclosure of personal information as open data….(More)”

Decoding the Future for National Security

George I. Seffers at Signal: “U.S. intelligence agencies are in the business of predicting the future, but no one has systematically evaluated the accuracy of those predictions—until now. The intelligence community’s cutting-edge research and development agency uses a handful of predictive analytics programs to measure and improve the ability to forecast major events, including political upheavals, disease outbreaks, insider threats and cyber attacks.

The Office for Anticipating Surprise at the Intelligence Advanced Research Projects Activity (IARPA) is a place where crystal balls come in the form of software, tournaments and throngs of people. The office sponsors eight programs designed to improve predictive analytics, which uses a variety of data to forecast events. The programs all focus on incidents outside of the United States, and the information is anonymized to protect privacy. The programs are in different stages, some having recently ended as others are preparing to award contracts.

But they all have one more thing in common: They use tournaments to advance the state of the predictive analytic arts. “We decided to run a series of forecasting tournaments in which people from around the world generate forecasts about, now, thousands of real-world events,” says Jason Matheny, IARPA’s new director. “All of our programs on predictive analytics do use this tournament style of funding and evaluating research.” The Open Source Indicators program used a crowdsourcing technique in which people across the globe offered their predictions on such events as political uprisings, disease outbreaks and elections.

The data analyzed included social media trends, Web search queries and even cancelled dinner reservations—an indication that people are sick. “The methods applied to this were all automated. They used machine learning to comb through billions of pieces of data to look for that signal, that leading indicator, that an event was about to happen,” Matheny explains. “And they made amazing progress. They were able to predict disease outbreaks weeks earlier than traditional reporting.” The recently completed Aggregative Contingent Estimation (ACE) program also used a crowdsourcing competition in which people predicted events, including whether weapons would be tested, treaties would be signed or armed conflict would break out along certain borders. Volunteers were asked to provide information about their own background and what sources they used. IARPA also tested participants’ cognitive reasoning abilities. Volunteers provided their forecasts every day, and IARPA personnel kept score. Interestingly, they discovered the “deep domain” experts were not the best at predicting events. Instead, people with a certain style of thinking came out the winners. “They read a lot, not just from one source, but from multiple sources that come from different viewpoints. They have different sources of data, and they revise their judgments when presented with new information. They don’t stick to their guns,” Matheny reveals. …

The ACE research also contributed to a recently released book, Superforecasting: The Art and Science of Prediction, according to the IARPA director. The book was co-authored, along with Dan Gardner, by Philip Tetlock, the Annenberg University professor of psychology and management at the University of Pennsylvania who also served as a principal investigator for the ACE program. Like ACE, the Crowdsourcing Evidence, Argumentation, Thinking and Evaluation program uses the forecasting tournament format, but it also requires participants to explain and defend their reasoning. The initiative aims to improve analytic thinking by combining structured reasoning techniques with crowdsourcing.

Meanwhile, the Foresight and Understanding from Scientific Exposition (FUSE) program forecasts science and technology breakthroughs….(More)”

Sharing Information

An Ericsson Consumer Insight Summary Report: “In the age of the internet we often hear how companies, authorities and other organizations get access to our personal information. As a result, the topic of privacy is frequently debated. What is sometimes overlooked is how we as individuals watch in return. We observe not only each other, but also companies and authorities – and we share what we see. Your neighbor searches the net about the family that just moved in next door. The traveler films his hotel and shares the video with other potential holidaymakers. A friend shares her experience about her employer on a social network. As sharing online continues to grow, we are starting to see the impact on both our individual lives and society. In this report we begin to uncover how consumers perceive their influence – but also some issues that arise as a result….

By sharing more information than ever, smartphone owners are increasingly acting like citizen journalists

> Over 70 percent of all smartphone users share personal photos regularly. 69 percent share more than they did 2 years ago

> 69 percent also read or watch other people’s shared content more than they did 2 years ago

People report wrongdoings by businesses and authorities online

> 34 percent of smartphone owners who have had bad experiences with companies say they usually share their experiences online. 27 percent repost other consumers’ complaints on a weekly basis

> Over half of smartphone users surveyed believe that being able to express opinions online about companies has increased their influence

Consumers expect shared information to have an effect on society and the world

> 54 percent believe that the internet has increased the possibility for whistleblowers to expose corrupt and illicit behavior in companies and organizations

> Furthermore, 37 percent of smartphone users believe that sharing information about a corrupt company online has greater impact than going to the police

With new power comes new challenges

> 46 percent of smartphone users would like a verification service to check the authenticity of an online posting or news clip

> 64 percent would like to be able to stop negative information about themselves circulating online

> 1 in 2 says protecting personal information should be a priority on the political agenda, although only 1 in 4 says it is not…(More)”

Meeting the Challenges of Big Data

Opinion by the European Data Protection Supervisor: “Big data, if done responsibly, can deliver significant benefits and efficiencies for society and individuals not only in health, scientific research, the environment and other specific areas. But there are serious concerns with the actual and potential impact of processing of huge amounts of data on the rights and freedoms of individuals, including their right to privacy. The challenges and risks of big data therefore call for more effective data protection.

Technology should not dictate our values and rights, but neither should promoting innovation and preserving fundamental rights be perceived as incompatible. New business models exploiting new capabilities for the massive collection, instantaneous transmission, combination and reuse of personal information for unforeseen purposes have placed the principles of data protection under new strains, which calls for thorough consideration on how they are applied.

European data protection law has been developed to protect our fundamental rights and values, including our right to privacy. The question is not whether to apply data protection law to big data, but rather how to apply it innovatively in new environments. Our current data protection principles, including transparency, proportionality and purpose limitation, provide the base line we will need to protect more dynamically our fundamental rights in the world of big data. They must, however, be complemented by ‘new’ principles which have developed over the years such as accountability and privacy by design and by default. The EU data protection reform package is expected to strengthen and modernise the regulatory framework.

The EU intends to maximise growth and competitiveness by exploiting big data. But the Digital Single Market cannot uncritically import the data-driven technologies and business models which have become economic mainstream in other areas of the world. Instead it needs to show leadership in developing accountable personal data processing. The internet has evolved in a way that surveillance – tracking people’s behaviour – is considered as the indispensable revenue model for some of the most successful companies. This development calls for critical assessment and search for other options.

In any event, and irrespective of the business models chosen, organisations that process large volumes of personal information must comply with applicable data protection law. The European Data Protection Supervisor (EDPS) believes that responsible and sustainable development of big data must rely on four essential elements:

  • organisations must be much more transparent about how they process personal data;
  • afford users a higher degree of control over how their data is used;
  • design user friendly data protection into their products and services; and
  • become more accountable for what they do….(More)”

Big Data and Privacy: Emerging Issues

O’Leary, Daniel E. at Intelligent Systems, IEEE: “The goals of big data and privacy are fundamentally opposed to each other. Big data and knowledge discovery are aimed at reducing information asymmetries between organizations and the data sources, whereas privacy is aimed at maintaining information asymmetries of data sources. A number of different definitions of privacy are used to investigate some of the tensions between different characteristics of big data and potential privacy concerns. Specifically, the author examines the consequences of unevenness in big data, digital data going from local controlled settings to uncontrolled global settings, privacy effects of reputation monitoring systems, and inferring knowledge from social media. In addition, the author briefly analyzes two other emerging sources of big data: police cameras and stingray for location information….(More)”

Analyzing 1.1 Billion NYC Taxi and Uber Trips

Todd W. Schneider: “The New York City Taxi & Limousine Commission has released a staggeringly detailed historical dataset covering over 1.1 billion individual taxi trips in the city from January 2009 through June 2015. Taken as a whole, the detailed trip-level data is more than just a vast list of taxi pickup and drop off coordinates: it’s a story of New York. How bad is the rush hour traffic from Midtown to JFK? Where does the Bridge and Tunnel crowd hang out on Saturday nights? What time do investment bankers get to work? How has Uber changed the landscape for taxis? And could Bruce Willis and Samuel L. Jackson have made it from 72nd and Broadway to Wall Street in less than 30 minutes? The dataset addresses all of these questions and many more.

I mapped the coordinates of every trip to local census tracts and neighborhoods, then set about in an attempt to extract stories and meaning from the data. This post covers a lot, but for those who want to pursue more analysis on their own: everything in this post—the data, software, and code—is freely available. Full instructions to download and analyze the data for yourself are available on GitHub.

Table of Contents

  1. Maps
  2. The Data
  3. Borough Trends, and the Rise of Uber
  4. Airport Traffic
  5. On the Realism of Die Hard 3
  6. How Does Weather Affect Taxi and Uber Ridership?
  7. NYC Late Night Taxi Index
  8. The Bridge and Tunnel Crowd
  9. Northside Williamsburg
  10. Privacy Concerns
  11. Investment Bankers
  12. Parting Thoughts…(More)”
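
The step of mapping trip coordinates to census tracts, described in the excerpt above, is at heart a point-in-polygon lookup. The sketch below is a minimal illustration using the standard ray-casting test. It is not the author's code (that is available on GitHub, per the post), and the tract names, rectangular boundaries and coordinates are hypothetical placeholders rather than real census geometry.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at the point and heading toward +x crosses. An odd count
    means the point is inside. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical tracts drawn as simple rectangles in arbitrary units.
tracts = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}


def assign_tract(lon, lat):
    """Return the name of the first tract containing the point, or None."""
    for name, polygon in tracts.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical pickup coordinates.
pickups = [(1, 1), (3, 1), (5, 5)]
assigned = [assign_tract(lon, lat) for lon, lat in pickups]
print(assigned)
```

At the scale of a billion trips a linear scan over every polygon would be far too slow; a spatial index or a spatial database would normally do this lookup, and the sketch only shows the underlying geometric test.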

Open government data: Out of the box

The Economist: “The open-data revolution has not lived up to expectations. But it is only getting started…

The app that helped save Mr Rich’s leg is one of many that incorporate government data—in this case, supplied by four health agencies. Six years ago America became the first country to make all data collected by its government “open by default”, except for personal information and that related to national security. Almost 200,000 datasets from 170 outfits have been posted on the website. Nearly 70 other countries have also made their data available: mostly rich, well-governed ones, but also a few that are not, such as India (see chart). The Open Knowledge Foundation, a London-based group, reckons that over 1m datasets have been published on open-data portals using its CKAN software, developed in 2010.