New York City moves to create accountability for algorithms


Lauren Kirchner at Ars Technica: “The algorithms that play increasingly central roles in our lives often emanate from Silicon Valley, but the effort to hold them accountable may have another epicenter: New York City. Last week, the New York City Council unanimously passed a bill to tackle algorithmic discrimination—the first measure of its kind in the country.

The algorithmic accountability bill, waiting to be signed into law by Mayor Bill de Blasio, establishes a task force that will study how city agencies use algorithms to make decisions that affect New Yorkers’ lives, and whether any of the systems appear to discriminate against people based on age, race, religion, gender, sexual orientation, or citizenship status. The task force’s report will also explore how to make these decision-making processes understandable to the public.

The bill’s sponsor, Council Member James Vacca, said he was inspired by ProPublica’s investigation into racially biased algorithms used to assess the criminal risk of defendants….

A previous, more sweeping version of the bill had mandated that city agencies publish the source code of all algorithms being used for “targeting services” or “imposing penalties upon persons or policing” and to make them available for “self-testing” by the public. At a hearing at City Hall in October, representatives from the mayor’s office expressed concerns that this mandate would threaten New Yorkers’ privacy and the government’s cybersecurity.

The bill was one of two moves the City Council made last week concerning algorithms. On Thursday, the committees on health and public safety held a hearing on the city’s forensic methods, including controversial tools that the chief medical examiner’s office crime lab has used for difficult-to-analyze samples of DNA.

As a ProPublica/New York Times investigation detailed in September, an algorithm created by the lab for complex DNA samples has been called into question by scientific experts and former crime lab employees.

The software, called the Forensic Statistical Tool, or FST, has never been adopted by any other lab in the country….(More)”.

Normative Challenges of Identification in the Internet of Things: Privacy, Profiling, Discrimination, and the GDPR


Paper by Sandra Wachter: “In the Internet of Things (IoT), identification and access control technologies provide essential infrastructure to link data between a user’s devices with unique identities, and provide seamless and linked up services. At the same time, profiling methods based on linked records can reveal unexpected details about users’ identity and private life, which can conflict with privacy rights and lead to economic, social, and other forms of discriminatory treatment. A balance must be struck between identification and access control required for the IoT to function and user rights to privacy and identity. Striking this balance is not an easy task because of weaknesses in cybersecurity and anonymisation techniques.

The EU General Data Protection Regulation (GDPR), set to come into force in May 2018, may provide essential guidance to achieve a fair balance between the interests of IoT providers and users. Through a review of academic and policy literature, this paper maps the inherent tension between privacy and identifiability in the IoT.

It focuses on four challenges: (1) profiling, inference, and discrimination; (2) control and context-sensitive sharing of identity; (3) consent and uncertainty; and (4) honesty, trust, and transparency. The paper will then examine the extent to which several standards defined in the GDPR will provide meaningful protection for privacy and control over identity for users of IoT. The paper concludes that in order to minimise the privacy impact of the conflicts between data protection principles and identification in the IoT, GDPR standards urgently require further specification and implementation into the design and deployment of IoT technologies….(More)”.

Research reveals de-identified patient data can be re-identified


Vanessa Teague, Chris Culnane and Ben Rubinstein in PhysOrg: “In August 2016, Australia’s federal Department of Health published medical billing records of about 2.9 million Australians online. These records came from the Medicare Benefits Scheme (MBS) and the Pharmaceutical Benefits Scheme (PBS) containing 1 billion lines of historical health data from the records of around 10 per cent of the population.

These longitudinal records were de-identified, a process intended to prevent a person’s identity from being connected with information, and were made public on the government’s open data website as part of its policy on accessible public data.

We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the dataset with known information about the individual.

Our findings replicate those of similar studies of other de-identified datasets:

  • A few mundane facts taken together often suffice to isolate an individual.
  • Some patients can be identified by name from publicly available information.
  • Decreasing the precision of the data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility.

The first step is examining a patient’s uniqueness according to medical procedures such as childbirth. Some individuals are unique given public information, and many patients are unique given a few basic facts, such as year of birth or the date a baby was delivered….

The second step is examining uniqueness according to the characteristics of commercial datasets we know of but cannot access directly. There are high uniqueness rates that would allow linking with a commercial pharmaceutical dataset, and with the billing data available to a bank. This means that ordinary people, not just the prominent ones, may be easily re-identifiable by their bank or insurance company…
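
To make the first step concrete, here is a minimal sketch of the kind of uniqueness analysis described above – an illustration only, not the authors’ actual code; the column names and sample records are hypothetical rather than the real MBS/PBS schema.

```python
import pandas as pd

# Hypothetical de-identified records; the columns are illustrative only,
# not the actual MBS/PBS schema.
records = pd.DataFrame({
    "patient_id":    ["a1", "b2", "c3", "d4", "e5", "f6"],
    "year_of_birth": [1975, 1975, 1982, 1982, 1990, 1990],
    "state":         ["NSW", "NSW", "VIC", "VIC", "QLD", "NSW"],
    "procedure":     ["childbirth", "knee_recon", "childbirth",
                      "childbirth", "appendectomy", "childbirth"],
    "service_date":  ["2014-03-02", "2014-03-02", "2013-07-11",
                      "2013-08-20", "2015-01-05", "2014-03-02"],
})

def uniqueness_rate(df, quasi_identifiers):
    """Fraction of records whose combination of quasi-identifiers is shared
    with no other record, i.e. records an attacker could single out."""
    group_sizes = df.groupby(quasi_identifiers)["patient_id"].transform("size")
    return (group_sizes == 1).mean()

# The more mundane facts an attacker knows, the more records become unique.
print(uniqueness_rate(records, ["year_of_birth"]))              # 0.0
print(uniqueness_rate(records, ["year_of_birth", "state"]))     # ~0.33
print(uniqueness_rate(records, ["year_of_birth", "state",
                                "procedure", "service_date"]))  # 1.0
```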

These de-identification methods were bound to fail, because they were trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records. De-identification is very unlikely to work for other rich datasets in the government’s care, like census data, tax records, mental health records, penal information and Centrelink data.

While the ambition of making more data more easily available to facilitate research, innovation and sound public policy is a good one, there is an important technical and procedural problem to solve: there is no good solution for publishing sensitive complex individual records that protects privacy without substantially degrading the usefulness of the data.

Some data can be safely published online, such as information about government, aggregations of large collections of material, or data that is differentially private. For sensitive, complex data about individuals, a much more controlled release in a secure research environment is a better solution. The Productivity Commission recommends a “trusted user” model, and techniques like dynamic consent also give patients greater control and visibility over their personal information….(More)”.
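
The reference to differentially private data deserves a brief illustration. The sketch below shows the generic Laplace mechanism for a counting query – a toy example, not anything drawn from the study or the Productivity Commission; the ages and epsilon values are made up.

```python
import numpy as np

rng = np.random.default_rng()

def dp_count(values, predicate, epsilon):
    """Epsilon-differentially-private count: adding or removing one person
    changes a count by at most 1, so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical patient ages.
ages = [34, 41, 29, 63, 57, 38, 45, 52]

# Publish a noisy count of patients over 50; smaller epsilon means more
# noise and stronger privacy, at a cost in accuracy.
print(dp_count(ages, lambda a: a > 50, epsilon=0.1))
print(dp_count(ages, lambda a: a > 50, epsilon=5.0))
```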

Accelerating the Sharing of Data Across Sectors to Advance the Common Good


Paper by Robert M. Groves and Adam Neufeld: “The public pays for and provides an incredible amount of data to governments and companies. Yet much of the value of this data is being wasted, remaining in silos rather than being shared to enhance the common good—whether it’s helping governments to stop opioid addiction or helping companies predict and meet the demand for electric or autonomous vehicles.

  • Many companies and governments are interested in sharing more of their data with each other; however, right now the process of sharing is very time consuming and can pose great risks since it often involves sharing full data sets with another entity.
  • We need intermediaries to design safe environments to facilitate data sharing in the low-trust and politically sensitive context of companies and governments. These safe environments would exist outside the government, be transparent to the public, and use modern technologies and techniques to allow only statistical uses of data through temporary linkages in order to minimize the risk to individuals’ privacy (a sketch of one such temporary linkage follows this list).
  • Governments must lead the way in sharing more data by re-evaluating laws that limit sharing of data, and must embrace new technologies that could allow the private sector to receive at least some value from many sensitive data sets. By decreasing the cost and risks of sharing data, more data will be freed from their silos, and we will move closer to what we deserve—that our data are used for the greatest societal benefit….(More)”.
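
As a rough, self-contained sketch of what such a temporary linkage for purely statistical use could look like – an illustration under assumed details, not a design taken from the paper – a trusted intermediary might join two datasets on keyed-hash pseudonyms, release only an aggregate, and then destroy the key:

```python
import hashlib
import hmac
import secrets
from statistics import mean

# Hypothetical records held by a government agency and by a company.
agency_records = [{"email": "a@x.com", "benefit": 120},
                  {"email": "b@y.com", "benefit": 80},
                  {"email": "c@z.com", "benefit": 200}]
company_records = [{"email": "a@x.com", "spend": 310},
                   {"email": "c@z.com", "spend": 95},
                   {"email": "d@w.com", "spend": 150}]

# A one-off key held only by the intermediary and destroyed after the analysis.
linkage_key = secrets.token_bytes(32)

def pseudonym(identifier: str) -> str:
    """Keyed hash: without the key, nobody can reverse or re-create the link."""
    return hmac.new(linkage_key, identifier.encode(), hashlib.sha256).hexdigest()

agency = {pseudonym(r["email"]): r["benefit"] for r in agency_records}
company = {pseudonym(r["email"]): r["spend"] for r in company_records}

# Only an aggregate statistic leaves the safe environment, never the linked rows.
linked_spend = [company[p] for p in agency if p in company]
print("average spend of benefit recipients:", mean(linked_spend))

# Destroying the key makes the linkage temporary: the pseudonyms can no longer
# be regenerated or matched against anything else.
del linkage_key
```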

Sharing is Daring: An Experiment on Consent, Chilling Effects and a Salient Privacy Nudge


Paper by Yoan Hermstrüwer and Stephan Dickert in the International Review of Law and Economics: “Privacy law rests on the assumption that government surveillance may increase the general level of conformity and thus generate a chilling effect. In a study that combines elements of a lab and a field experiment, we show that salient and incentivized consent options are sufficient to trigger this behavioral effect. Salient ex ante consent options may lure people into giving up their privacy and increase their compliance with social norms – even when the only immediate risk of sharing information is mere publicity on a Google website. A right to be forgotten (right to deletion), however, seems to reduce neither privacy valuations nor chilling effects. In spite of low deletion costs, people tend to stick with a retention default. The study suggests that consent architectures may play out on social conformity rather than on consent choices and privacy valuations. Salient notice and consent options may not merely empower users to make an informed consent decision. Instead, they can trigger the very effects that privacy law intends to curb….(More)”.

Transatlantic Data Privacy


Paul M. Schwartz and Karl-Nikolaus Peifer in Georgetown Law Journal: “International flows of personal information are more significant than ever, but differences in transatlantic data privacy law imperil this data trade. The resulting policy debate has led the EU to set strict limits on transfers of personal data to any non-EU country—including the United States—that lacks sufficient privacy protections. Bridging the transatlantic data divide is therefore a matter of the greatest significance.

In exploring this issue, this Article analyzes the respective legal identities constructed around data privacy in the EU and the United States. It identifies profound differences in the two systems’ images of the individual as bearer of legal interests. The EU has created a privacy culture around “rights talk” that protects its “data subjects.” In the EU, moreover, rights talk forms a critical part of the postwar European project of creating the identity of a European citizen. In the United States, in contrast, the focus is on a “marketplace discourse” about personal information and the safeguarding of “privacy consumers.” In the United States, data privacy law focuses on protecting consumers in a data marketplace.

This Article uses its models of rights talk and marketplace discourse to analyze how the EU and United States protect their respective data subjects and privacy consumers. Although the differences are great, there is still a path forward. A new set of institutions and processes can play a central role in developing mutually acceptable standards of data privacy. The key documents in this regard are the General Data Protection Regulation, an EU-wide standard that becomes binding in 2018, and the Privacy Shield, an EU–U.S. treaty signed in 2016. These legal standards require regular interactions between the EU and United States and create numerous points for harmonization, coordination, and cooperation. The GDPR and Privacy Shield also establish new kinds of governmental networks to resolve conflicts. The future of international data privacy law rests on the development of new understandings of privacy within these innovative structures….(More)”.

Understanding Corporate Data Sharing Decisions: Practices, Challenges, and Opportunities for Sharing Corporate Data with Researchers


Leslie Harris at the Future of Privacy Forum: “Data has become the currency of the modern economy. A recent study projects the global volume of data to grow from about 0.8 zettabytes (ZB) in 2009 to more than 35 ZB in 2020, most of it generated within the last two years and held by the corporate sector.

As the cost of data collection and storage becomes cheaper and computing power increases, so does the value of data to the corporate bottom line. Powerful data science techniques, including machine learning and deep learning, make it possible to search, extract and analyze enormous sets of data from many sources in order to uncover novel insights and engage in predictive analysis. Breakthrough computational techniques allow complex analysis of encrypted data, making it possible for researchers to protect individual privacy, while extracting valuable insights.
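
The report does not specify which computational techniques it has in mind. Purely as an illustrative sketch – one possible technique, not the report’s method – additive secret sharing conveys the flavor of computing an aggregate over values that no single party ever sees in the clear; all names and figures below are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # any sufficiently large prime works for this toy example

def share(value, n_parties=3):
    """Split a value into n random shares that sum to the value mod PRIME;
    any single share on its own reveals nothing about the value."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Hypothetical per-customer purchase amounts held by a company.
purchases = [120, 45, 300, 80]
shared = [share(v) for v in purchases]

# Each party sums only the shares it holds; just the aggregate is reconstructed.
party_totals = [sum(column) % PRIME for column in zip(*shared)]
print("total spend:", reconstruct(party_totals))  # 545, with no single value exposed
```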

At the same time, these newfound data sources hold significant promise for advancing scholarship, supporting evidence-based policymaking and more robust government statistics, and shaping more impactful social interventions. But because most of this data is held by the private sector, it is rarely available for these purposes, posing what many have argued is a serious impediment to scientific progress.

A variety of reasons have been posited for the reluctance of the corporate sector to share data for academic research. Some have suggested that the private sector doesn’t realize the value of their data for broader social and scientific advancement. Others suggest that companies have no “chief mission” or public obligation to share. But most observers describe the challenge as complex and multifaceted. Companies face a variety of commercial, legal, ethical, and reputational risks that serve as disincentives to sharing data for academic research, with privacy – particularly the risk of reidentification – an intractable concern. For companies, striking the right balance between the commercial and societal value of their data, the privacy interests of their customers, and the interests of academics presents a formidable dilemma.

To be sure, there is evidence that some companies are beginning to share for academic research. For example, a number of pharmaceutical companies are now sharing clinical trial data with researchers, and a number of individual companies have taken steps to make data available as well. What is more, companies are also increasingly providing open or shared data for other important “public good” activities, including international development, humanitarian assistance and better public decision-making. Some are contributing to data collaboratives that pool data from different sources to address societal concerns. Yet, it is still not clear whether and to what extent this “new era of data openness” will accelerate data sharing for academic research.

Today, the Future of Privacy Forum released a new study, Understanding Corporate Data Sharing Decisions: Practices, Challenges, and Opportunities for Sharing Corporate Data with Researchers. In this report, we aim to contribute to the literature by seeking the “ground truth” from the corporate sector about the challenges they encounter when they consider making data available for academic research. We hope that the impressions and insights gained from this first look at the issue will help formulate further research questions, inform the dialogue between key stakeholders, and identify constructive next steps and areas for further action and investment….(More)”.

Ethical questions in data journalism and the power of online discussion


David Craig, Stan Ketterer and Mohammad Yousuf at Data Driven Journalism: “One common element uniting data journalism projects, across different stories and locations, is the ethical challenges they present.

As scholars and practitioners of data journalism have pointed out, the main issues include flawed data, misrepresentation from a lack of context, and privacy concerns. Contributors have discussed the ethics of data journalism on this site in posts about topics such as the use of pervasive data, transparency about editorial processes in computational journalism, and best practices for doing data journalism ethically.

Our research project looked at similar ethical challenges by examining journalists’ discussion of the controversial handling of publicly accessible gun permit data in two communities in the United States. The cases are not new now, but the issues they raise persist and point to opportunities – both to learn from online discussion of ethical issues and to ask a wide range of ethical questions about data journalism.

The cases

Less than two weeks after the 2012 shooting deaths of 20 children and six staff members at Sandy Hook Elementary School in Newtown, Connecticut, a journalist at The Journal News in White Plains, New York, wrote a story about the possible expansion of publicly accessible gun permit data. The article was accompanied by three online maps with the locations of gun permit holders. The clickable maps of a two-county area in the New York suburbs also included the names and addresses of the gun permit holders. The detailed maps with personal information prompted a public outcry both locally and nationally, mainly involving privacy and safety concerns, and were subsequently taken down.

Although the 2012 case prompted the greatest attention, another New York newspaper reporter’s Freedom of Information request for a gun permit database for three counties sparked an earlier public outcry in 2008. The Glens Falls Post-Star’s editor published an editorial in response. “We here at The Post-Star find ourselves in the unusual position of responding to the concerns of our readers about something that has not even been published in our newspaper or Web site,” the editorial began. The editor said the request “drew great concern from members of gun clubs and people with gun permits in general, a concern we totally understand.”

Both of these cases prompted discussion among journalists, including participants in NICAR-L, the listserv of the National Institute for Computer-Assisted Reporting, whose subscribers include data journalists from major news organizations in the United States and around the world. Our study examined the content of three discussion threads with a total of 119 posts that focused mainly on ethical issues.

Key ethical issues

Several broad ethical issues, and specific themes related to those issues, appeared in the discussion.

1. Freedom versus responsibility and journalistic purpose…

2. Privacy and verification…

3. Consequences…

….(More)”

See also: David Craig, Stan Ketterer and Mohammad Yousuf, “To Post or Not to Post: Online Discussion of Gun Permit Mapping and the Development of Ethical Standards in Data Journalism,” Journalism & Mass Communication Quarterly

Data Governance Regimes in the Digital Economy: The Example of Connected Cars


Paper by Wolfgang Kerber and Jonas Severin Frank: “The Internet of Things raises a number of so far unsolved legal and regulatory questions. Particularly important are the issues of privacy, data ownership, and data access. A particularly interesting example is connected cars, with the huge amount of data they produce. Based also upon the recent discussion about data ownership and data access in the context of the EU Communication “Building a European data economy”, this paper has two objectives:

(1) It intends to provide a general economic theoretical framework for the analysis of data governance regimes for data in Internet of Things contexts, in which two levels of data governance are distinguished (private data governance based upon contracts and the legal and regulatory framework for markets). This framework focuses on potential market failures that can emerge in regard to data and privacy.

(2) It applies this analytical framework to the complex problem of data governance in connected cars (with its different stakeholders: car manufacturers, car owners, car component suppliers, repair service providers, insurance companies, and other service providers), and identifies several potential market failure problems in regard to this specific data governance problem (esp. competition problems, information/behavioral problems and privacy problems).

These results can be an important input for future research that focuses more on the specific policy implications for data governance in connected cars. Although the paper is primarily an economic paper, it tries to take into account important aspects of the legal discussion….(More)”.

Nobody reads privacy policies – here’s how to fix that


From the Conversation: “…The key to turning privacy notices into something useful for consumers is to rethink their purpose. A company’s policy might show compliance with the regulations the firm is bound to follow, but remains impenetrable to a regular reader.

The starting point for developing consumer-friendly privacy notices is to make them relevant to the user’s activity, understandable and actionable. As part of the Usable Privacy Policy Project, my colleagues and I developed a way to make privacy notices more effective.

The first principle is to break up the documents into smaller chunks and deliver them at times that are appropriate for users. Right now, a single multi-page policy might have many sections and paragraphs, each relevant to different services and activities. Yet people who are just casually browsing a website need only a little bit of information about how the site handles their IP addresses, whether what they look at is shared with advertisers, and whether they can opt out of interest-based ads. Those people don’t need to know about many other things listed in all-encompassing policies, such as the rules for subscribing to the site’s email newsletter, or how the site handles personal or financial information belonging to people who make purchases or donations on the site.

When a person does decide to sign up for email updates or pay for a service through the site, then an additional short privacy notice could tell her the additional information she needs to know. These shorter documents should also offer users meaningful choices about what they want a company to do – or not do – with their data. For instance, a new subscriber might be allowed to choose whether the company can share his email address or other contact information with outside marketing companies by clicking a check box.
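
To make the idea of layered, activity-triggered notices more tangible, here is a small sketch of how such notices and their accompanying choices might be modelled – an illustration only, not the Usable Privacy Policy Project’s implementation; every field name and snippet is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class PrivacyChoice:
    label: str      # e.g. "Share my email with outside marketing companies"
    default: bool   # opt-in choices should default to False

@dataclass
class NoticeSnippet:
    trigger: str    # the user activity that makes this notice relevant
    summary: str    # one or two plain-language sentences, not the full policy
    choices: list = field(default_factory=list)

# Hypothetical snippets for the browsing and newsletter scenarios above.
notices = [
    NoticeSnippet(
        trigger="browse",
        summary="We log your IP address and show interest-based ads.",
        choices=[PrivacyChoice("Show me interest-based ads", default=True)],
    ),
    NoticeSnippet(
        trigger="newsletter_signup",
        summary="We use your email only for the newsletter unless you allow sharing below.",
        choices=[PrivacyChoice("Share my email with outside marketing companies",
                               default=False)],
    ),
]

def notices_for(activity):
    """Deliver only the snippets relevant to what the user is doing right now."""
    return [n for n in notices if n.trigger == activity]

for notice in notices_for("newsletter_signup"):
    print(notice.summary)
    for choice in notice.choices:
        print(" [x]" if choice.default else " [ ]", choice.label)
```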

Understanding users’ expectations

Notices can be made even simpler if they focus particularly on unexpected or surprising types of data collection or sharing. For instance, in another study, we learned that most people know their fitness tracker counts steps – so they didn’t really need a privacy notice to tell them that. But they did not expect their data to be collected, aggregated and shared with third parties. Customers should be asked for permission to do this, and allowed to restrict sharing or opt out entirely.

Most importantly, companies should test new privacy notices with users, to ensure final versions are understandable and not misleading, and that offered choices are meaningful….(More)”