The World’s Biggest Biometric Database Keeps Leaking People’s Data


Rohith Jyothish at FastCompany: “India’s national scheme holds the personal data of more than 1.13 billion citizens and residents of India within a unique ID system branded as Aadhaar, which means “foundation” in Hindi. But as more and more evidence reveals that the government is not keeping this information private, the actual foundation of the system appears shaky at best.

On January 4, 2018, The Tribune of India, a news outlet based out of Chandigarh, created a firestorm when it reported that people were selling access to Aadhaar data on WhatsApp, for alarmingly low prices….

The Aadhaar unique identification number ties together several pieces of a person’s demographic and biometric information, including their photograph, fingerprints, home address, and other personal information. This information is all stored in a centralized database, which is then made accessible to a long list of government agencies that can use that information in administering public services.

Although centralizing this information could increase efficiency, it also creates a highly vulnerable situation in which one simple breach could expose the data of millions of India’s residents.

The Annual Report 2015-16 of the Ministry of Electronics and Information Technology speaks of a facility called DBT Seeding Data Viewer (DSDV) that “permits the departments/agencies to view the demographic details of Aadhaar holder.”

According to @databaazi, DSDV logins allowed third parties to access Aadhaar data (without the UID holder’s consent) from a white-listed IP address. This meant that anyone with the right IP address could access the system.

This design flaw puts personal details of millions of Aadhaar holders at risk of broad exposure, in clear violation of the Aadhaar Act.…(More)”.

The Future Computed: Artificial Intelligence and its role in society


Brad Smith and Harry Shum at the Microsoft Blog: “Today Microsoft is releasing a new book, The Future Computed: Artificial Intelligence and its role in society. The two of us have written the foreword for the book, and our teams collaborated to write its contents. As the title suggests, the book provides our perspective on where AI technology is going and the new societal issues it has raised.

On a personal level, our work on the foreword provided an opportunity to step back and think about how much technology has changed our lives over the past two decades and to consider the changes that are likely to come over the next 20 years. In 1998, we both worked at Microsoft, but on opposite sides of the globe. While we lived on separate continents and in quite different cultures, we shared similar experiences and daily routines which were managed by manual planning and movement. Twenty years later, we take for granted the digital world that was once the stuff of science fiction.

Technology – including mobile devices and cloud computing – has fundamentally changed the way we consume news, plan our day, communicate, shop and interact with our family, friends and colleagues. Two decades from now, what will our world look like? At Microsoft, we imagine that artificial intelligence will help us do more with one of our most precious commodities: time. By 2038, personal digital assistants will be trained to anticipate our needs, help manage our schedule, prepare us for meetings, assist as we plan our social lives, reply to and route communications, and drive cars.

Beyond our personal lives, AI will enable breakthrough advances in areas like healthcare, agriculture, education and transportation. It’s already happening in impressive ways.

But as we’ve witnessed over the past 20 years, new technology also inevitably raises complex questions and broad societal concerns. As we look to a future powered by a partnership between computers and humans, it’s important that we address these challenges head on.

How do we ensure that AI is designed and used responsibly? How do we establish ethical principles to protect people? How should we govern its use? And how will AI impact employment and jobs?

To answer these tough questions, technologists will need to work closely with government, academia, business, civil society and other stakeholders. At Microsoft, we’ve identified six ethical principles – fairness, reliability and safety, privacy and security, inclusivity, transparency, and accountability – to guide the cross-disciplinary development and use of artificial intelligence. The better we understand these or similar issues — and the more technology developers and users can share best practices to address them — the better served the world will be as we contemplate societal rules to govern AI.

We must also pay attention to AI’s impact on workers. What jobs will AI eliminate? What jobs will it create? If there has been one constant over 250 years of technological change, it has been the ongoing impact of technology on jobs — the creation of new jobs, the elimination of existing jobs and the evolution of job tasks and content. This too is certain to continue.

Some key conclusions are emerging….

The Future Computed is available here and additional content related to the book can be found here.”

Big Data and medicine: a big deal?


V. Mayer-Schönberger and E. Ingelsson in the Journal of Internal Medicine: “Big Data promises huge benefits for medical research. Looking beyond superficial increases in the amount of data collected, we identify three key areas where Big Data differs from conventional analyses of data samples: (i) data are captured more comprehensively relative to the phenomenon under study; this reduces some bias but surfaces important trade-offs, such as between data quantity and data quality; (ii) data are often analysed using machine learning tools, such as neural networks, rather than conventional statistical methods, resulting in systems that over time capture insights implicit in data but remain black boxes, rarely revealing causal connections; and (iii) the purpose of the analyses of data is no longer simply answering existing questions, but hinting at novel ones and generating promising new hypotheses. As a consequence, when performed right, Big Data analyses can accelerate research.

Because Big Data approaches differ so fundamentally from small data ones, research structures, processes and mindsets need to adjust. The latent value of data is being reaped through repeated reuse of data, which runs counter to existing practices not only regarding data privacy but also data management more generally. Consequently, we suggest a number of adjustments, such as boards reviewing responsible data use and incentives to facilitate comprehensive data sharing. As data’s role changes to that of a resource of insight, we also need to acknowledge the importance of collecting and making data available as a crucial part of our research endeavours, and reassess our formal processes from career advancement to treatment approval….(More)”.

Data-Intensive Approaches To Creating Innovation For Sustainable Smart Cities


Science Trends: “Located at the complex intersection of economic development and environmental change, cities play a central role in our efforts to move towards sustainability. Reducing air and water pollution, improving energy efficiency while securing energy supply, and minimizing vulnerabilities to disruptions and disturbances are interconnected and pose a formidable challenge, with their dynamic interactions changing in highly complex and unpredictable ways….

The Beijing City Lab demonstrates the usefulness of open urban data in mapping urbanization with a fine spatiotemporal scale and reflecting social and environmental dimensions of urbanization through visualization at multiple scales.

The basic principle of open data will generate significant opportunities for promoting inter-disciplinary and inter-organizational research, producing new data sets through the integration of different sources, avoiding duplication of research, facilitating the verification of previous results, and encouraging citizen scientists and crowdsourcing approaches. Open data is also expected to help governments promote transparency, citizen participation, and access to information in policy-making processes.

Despite significant potential, however, numerous challenges remain in facilitating innovation for urban sustainability through open data. The scope and amount of data collected and shared are still limited, and the quality control, error monitoring, and cleaning of open data are also indispensable in securing the reliability of the analysis. Also, the organizational and legal frameworks of data sharing platforms are often not well defined or established, and it is critical to address interoperability between various data standards, the balance between open and proprietary data, and normative and legal issues such as data ownership, personal privacy, confidentiality, law enforcement, and the maintenance of public safety and national security….

These findings are described in the article entitled Facilitating data-intensive approaches to innovation for sustainability: opportunities and challenges in building smart cities, published in the journal Sustainability Science. This work was led by Masaru Yarime from the City University of Hong Kong….(More)”.

Government data: How open is too open?


Sharon Fisher at HPE: “The notion of “open government” appeals to both citizens and IT professionals seeking access to freely available government data. But is there such a thing as data access being too open? Governments may want to be transparent, yet they need to avoid releasing personally identifiable information.

There’s no question that open government data offers many benefits. It gives citizens access to the data their taxes paid for, enables government oversight, and powers the applications developed by government, vendors, and citizens that improve people’s lives.

However, data breaches and concerns about the amount of data that government is collecting make some people wonder: When is it too much?

“As we think through the big questions about what kind of data a state should collect, how it should use it, and how to disclose it, these nuances become not some theoretical issue but a matter of life and death to some people,” says Alexander Howard, deputy director of the Sunlight Foundation, a Washington nonprofit that advocates for open government. “There are people in government databases where the disclosure of their [physical] location is the difference between a life-changing day and Wednesday.”

Open data supporters point out that much of this data has been considered a public record all along and tout the value of its use in analytics. But having personal data aggregated in a single place that is accessible online—as opposed to, say, having to go to an office and physically look up each record—makes some people uneasy.

Privacy breaches, wholesale

“We’ve seen a real change in how people perceive privacy,” says Michael Morisy, executive director at MuckRock, a Cambridge, Massachusetts, nonprofit that helps media and citizens file public records requests. “It’s been driven by a long-standing concept in transparency: practical obscurity.” Even if something was technically a public record, effort needed to be expended to get one’s hands on it. That amount of work might be worth it for, say, someone running for office, but on the whole, private citizens didn’t have to worry. Things are different now, says Morisy. “With Google, and so much data being available at the click of a mouse or the tap of a phone, what was once practically obscure is now instantly available.”

People are sometimes also surprised to find out that public records can contain their personally identifiable information (PII), such as addresses, phone numbers, and even Social Security numbers. That may be on purpose or because someone failed to redact the data properly.

That’s had consequences. Over the years, there have been a number of incidents in which PII from public records, including addresses, was used to harass and sometimes even kill people. For example, in 1989, Rebecca Schaeffer was murdered by a stalker who learned her address from the Department of Motor Vehicles. Other examples of harassment via driver’s license records include thieves who tracked down the addresses of owners of expensive cars and activists who sent anti-abortion literature to women who had visited health clinics that performed abortions.

In response, in 1994, Congress enacted the Driver’s Privacy Protection Act to restrict the sale of such data. More recently, the state of Idaho passed a law protecting the identity of hunters who shot wolves, because the hunters were being harassed by wolf supporters. Similarly, the state of New York allowed concealed pistol permit holders to make their name and address private after a newspaper published an online interactive map showing the names and addresses of all handgun permit holders in Westchester and Rockland counties….(More)”.

New York City moves to create accountability for algorithms


Lauren Kirchner at ArsTechnica: “The algorithms that play increasingly central roles in our lives often emanate from Silicon Valley, but the effort to hold them accountable may have another epicenter: New York City. Last week, the New York City Council unanimously passed a bill to tackle algorithmic discrimination—the first measure of its kind in the country.

The algorithmic accountability bill, waiting to be signed into law by Mayor Bill de Blasio, establishes a task force that will study how city agencies use algorithms to make decisions that affect New Yorkers’ lives, and whether any of the systems appear to discriminate against people based on age, race, religion, gender, sexual orientation, or citizenship status. The task force’s report will also explore how to make these decision-making processes understandable to the public.

The bill’s sponsor, Council Member James Vacca, said he was inspired by ProPublica’s investigation into racially biased algorithms used to assess the criminal risk of defendants….

A previous, more sweeping version of the bill had mandated that city agencies publish the source code of all algorithms being used for “targeting services” or “imposing penalties upon persons or policing” and to make them available for “self-testing” by the public. At a hearing at City Hall in October, representatives from the mayor’s office expressed concerns that this mandate would threaten New Yorkers’ privacy and the government’s cybersecurity.

The bill was one of two moves the City Council made last week concerning algorithms. On Thursday, the committees on health and public safety held a hearing on the city’s forensic methods, including controversial tools that the chief medical examiner’s office crime lab has used for difficult-to-analyze samples of DNA.

As a ProPublica/New York Times investigation detailed in September, an algorithm created by the lab for complex DNA samples has been called into question by scientific experts and former crime lab employees.

The software, called the Forensic Statistical Tool, or FST, has never been adopted by any other lab in the country….(More)”.

Normative Challenges of Identification in the Internet of Things: Privacy, Profiling, Discrimination, and the GDPR


Paper by Sandra Wachter: “In the Internet of Things (IoT), identification and access control technologies provide essential infrastructure to link data between a user’s devices with unique identities, and provide seamless and linked up services. At the same time, profiling methods based on linked records can reveal unexpected details about users’ identity and private life, which can conflict with privacy rights and lead to economic, social, and other forms of discriminatory treatment. A balance must be struck between identification and access control required for the IoT to function and user rights to privacy and identity. Striking this balance is not an easy task because of weaknesses in cybersecurity and anonymisation techniques.

The EU General Data Protection Regulation (GDPR), set to come into force in May 2018, may provide essential guidance to achieve a fair balance between the interests of IoT providers and users. Through a review of academic and policy literature, this paper maps the inherent tension between privacy and identifiability in the IoT.

It focuses on four challenges: (1) profiling, inference, and discrimination; (2) control and context-sensitive sharing of identity; (3) consent and uncertainty; and (4) honesty, trust, and transparency. The paper will then examine the extent to which several standards defined in the GDPR will provide meaningful protection for privacy and control over identity for users of IoT. The paper concludes that in order to minimise the privacy impact of the conflicts between data protection principles and identification in the IoT, GDPR standards urgently require further specification and implementation into the design and deployment of IoT technologies….(More)”.

Research reveals de-identified patient data can be re-identified


Vanessa Teague, Chris Culnane and Ben Rubinstein in PhysOrg: “In August 2016, Australia’s federal Department of Health published medical billing records of about 2.9 million Australians online. These records came from the Medicare Benefits Scheme (MBS) and the Pharmaceutical Benefits Scheme (PBS) containing 1 billion lines of historical health data from the records of around 10 per cent of the population.

These longitudinal records were de-identified, a process intended to prevent a person’s identity from being connected with information, and were made public on the government’s open data website as part of its policy on accessible public data.

We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual.

Our findings replicate those of similar studies of other de-identified datasets:

  • A few mundane facts taken together often suffice to isolate an individual.
  • Some patients can be identified by name from publicly available information.
  • Decreasing the precision of the data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility.

The first step is examining a patient’s uniqueness according to medical procedures such as childbirth. Some individuals are unique given public information, and many patients are unique given a few basic facts, such as year of birth or the date a baby was delivered….
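
The uniqueness argument is easy to make concrete. Below is a minimal, hypothetical sketch (not the authors’ code, and using invented columns rather than the actual MBS/PBS fields) of how one could measure what fraction of records in a de-identified table become unique once a few quasi-identifiers are combined:

```python
# Illustrative only: a toy "de-identified" table with invented quasi-identifiers.
import pandas as pd

records = pd.DataFrame({
    "year_of_birth": [1972, 1972, 1985, 1990, 1990, 1990],
    "state":         ["VIC", "VIC", "NSW", "QLD", "QLD", "QLD"],
    "delivery_date": ["2016-03-02", "2016-07-19", None, "2016-03-02", None, None],
})

def uniqueness_rate(df, quasi_identifiers):
    """Fraction of records that are the only one with their combination of
    quasi-identifier values, and hence isolatable by anyone who already knows
    those few facts about a person."""
    group_sizes = df.groupby(quasi_identifiers, dropna=False)[quasi_identifiers[0]].transform("size")
    return (group_sizes == 1).mean()

print(uniqueness_rate(records, ["year_of_birth"]))                            # one coarse fact: few unique
print(uniqueness_rate(records, ["year_of_birth", "state", "delivery_date"]))  # a few facts combined: most unique
```

Even in this toy table, a single coarse attribute leaves most records hidden in a crowd, while three attributes together isolate most of them, which is the same pattern the authors report at national scale.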

The second step is examining uniqueness according to the characteristics of commercial datasets we know of but cannot access directly. There are high uniqueness rates that would allow linking with a commercial pharmaceutical dataset, and with the billing data available to a bank. This means that ordinary people, not just the prominent ones, may be easily re-identifiable by their bank or insurance company…

These de-identification methods were bound to fail, because they were trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records. De-identification is very unlikely to work for other rich datasets in the government’s care, like census data, tax records, mental health records, penal information and Centrelink data.

While the ambition of making more data more easily available to facilitate research, innovation and sound public policy is a good one, there is an important technical and procedural problem to solve: there is no good solution for publishing sensitive complex individual records that protects privacy without substantially degrading the usefulness of the data.

Some data can be safely published online, such as information about government, aggregations of large collections of material, or data that is differentially private. For sensitive, complex data about individuals, a much more controlled release in a secure research environment is a better solution. The Productivity Commission recommends a “trusted user” model, and techniques like dynamic consent also give patients greater control and visibility over their personal information….(More)”.
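
The excerpt’s passing reference to data that is “differentially private” points to a specific technique rather than a policy stance. Below is a minimal sketch of the standard Laplace mechanism for a counting query; the query and parameter values are illustrative assumptions, not taken from the article:

```python
# Sketch of an epsilon-differentially private count release (Laplace mechanism).
import numpy as np

def dp_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Add Laplace noise with scale 1/epsilon. A counting query changes by at
    most 1 when any single person is added or removed (sensitivity 1), so the
    released value satisfies epsilon-differential privacy."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
true_count = 1234                                  # e.g. patients who had a given procedure
print(dp_count(true_count, epsilon=0.5, rng=rng))  # stronger privacy, noisier answer
print(dp_count(true_count, epsilon=5.0, rng=rng))  # weaker privacy, answer close to the truth
```

The guarantee is mathematical rather than procedural: whatever side information an attacker already holds, the noisy count reveals little about any one individual, which is exactly the property that record-level de-identification failed to provide here.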

Accelerating the Sharing of Data Across Sectors to Advance the Common Good


Paper by Robert M. Groves and Adam Neufeld: “The public pays for and provides an incredible amount of data to governments and companies. Yet much of the value of this data is being wasted, remaining in silos rather than being shared to enhance the common good—whether it’s helping governments to stop opioid addiction or helping companies predict and meet the demand for electric or autonomous vehicles.

  • Many companies and governments are interested in sharing more of their data with each other; however, right now the process of sharing is very time-consuming and can pose great risks, since it often involves sharing full data sets with another entity.
  • We need intermediaries to design safe environments to facilitate data sharing in the low-trust and politically sensitive context of companies and governments. These safe environments would exist outside the government, be transparent to the public, and use modern technologies and techniques to allow only statistical uses of data through temporary linkages in order to minimize the risk to individuals’ privacy.
  • Governments must lead the way in sharing more data by re-evaluating laws that limit sharing of data, and must embrace new technologies that could allow the private sector to receive at least some value from many sensitive data sets. By decreasing the cost and risks of sharing data, more data will be freed from their silos, and we will move closer to what we deserve—that our data are used for the greatest societal benefit….(More)”.
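
The “temporary linkages” described in the second bullet above can be pictured with a short sketch. This is not the authors’ design; it simply assumes an intermediary that pseudonymises a shared identifier with a key only it holds, links records inside a safe environment, and lets nothing but an aggregate statistic leave (all names, fields, and values below are hypothetical):

```python
# Sketch: keyed-hash pseudonymisation and a temporary linkage inside a safe environment.
import hashlib
import hmac

import pandas as pd

LINKAGE_KEY = b"ephemeral-secret-held-only-by-the-intermediary"

def pseudonymise(df: pd.DataFrame, id_col: str = "person_id") -> pd.DataFrame:
    """Replace the shared identifier with a keyed hash so analysts can link
    records without ever seeing the identifier in the clear."""
    out = df.copy()
    out["pid"] = [hmac.new(LINKAGE_KEY, v.encode(), hashlib.sha256).hexdigest()
                  for v in out[id_col]]
    return out.drop(columns=[id_col])

# Hypothetical extracts supplied by a government agency and a company.
gov = pseudonymise(pd.DataFrame({"person_id": ["A1", "B2", "C3"],
                                 "received_benefit": [True, False, True]}))
firm = pseudonymise(pd.DataFrame({"person_id": ["A1", "C3", "D4"],
                                  "monthly_spend": [410.0, 95.0, 220.0]}))

linked = gov.merge(firm, on="pid")   # the temporary linkage; it never leaves the environment
print(linked.groupby("received_benefit")["monthly_spend"].mean())  # only this aggregate is released
del linked                           # the linked table is discarded once the statistic is produced
```

Because neither party sees the other’s raw records and only aggregates leave the environment, this kind of sharing avoids the wholesale hand-over of full data sets that the first bullet identifies as the main risk.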

Sharing is Daring: An Experiment on Consent, Chilling Effects and a Salient Privacy Nudge


Yoan Hermstrüwer and Stephan Dickert in the International Review of Law and Economics: “Privacy law rests on the assumption that government surveillance may increase the general level of conformity and thus generate a chilling effect. In a study that combines elements of a lab and a field experiment, we show that salient and incentivized consent options are sufficient to trigger this behavioral effect. Salient ex ante consent options may lure people into giving up their privacy and increase their compliance with social norms – even when the only immediate risk of sharing information is mere publicity on a Google website. A right to be forgotten (right to deletion), however, seems to reduce neither privacy valuations nor chilling effects. In spite of low deletion costs, people tend to stick with a retention default. The study suggests that consent architectures may play out on social conformity rather than on consent choices and privacy valuations. Salient notice and consent options may not merely empower users to make an informed consent decision. Instead, they can trigger the very effects that privacy law intends to curb….(More)”.