Paper by Daniel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye and Carlo Ratti: “The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. between two datasets consisting of several million people’s mobility traces, coming from a mobile network operator and from transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3–4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success rate after only one week of observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher-frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals…(More)”.
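As an illustration of the co-occurrence approach the abstract describes, here is a minimal Python sketch (not the authors’ algorithm; the record format and function names are assumptions) that matches users across two datasets by counting records falling in the same time window and spatial cell:

```python
# Minimal sketch of co-occurrence-based matching between two mobility datasets.
# Records are assumed to be (user_id, time_bin, location_cell) tuples; this is an
# illustrative toy, not the paper's implementation.
from collections import defaultdict

def cooccurrence_match(records_a, records_b):
    """Map each user in dataset A to the dataset-B user sharing the most records."""
    index_b = defaultdict(set)                      # (time_bin, cell) -> B-users seen there
    for user_b, t, cell in records_b:
        index_b[(t, cell)].add(user_b)

    counts = defaultdict(lambda: defaultdict(int))  # A-user -> B-user -> co-occurrence count
    for user_a, t, cell in records_a:
        for user_b in index_b[(t, cell)]:
            counts[user_a][user_b] += 1

    # Match each A-user to the B-user with the most co-occurrences.
    return {ua: max(c, key=c.get) for ua, c in counts.items()}

# Toy example: one week of binned traces for two users in each dataset.
a = [("a1", "mon-08", "c42"), ("a1", "mon-18", "c07"), ("a2", "tue-09", "c03")]
b = [("b9", "mon-08", "c42"), ("b9", "mon-18", "c07"), ("b5", "tue-09", "c03")]
print(cooccurrence_match(a, b))  # {'a1': 'b9', 'a2': 'b5'}
```

In such a setup, the number of shared (time window, cell) records per candidate pair is exactly the quantity the paper identifies as the main determinant of matchability.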
The Seductive Diversion of ‘Solving’ Bias in Artificial Intelligence
Blog by Julia Powles and Helen Nissenbaum: “Serious thinkers in academia and business have swarmed to the A.I. bias problem, eager to tweak and improve the data and algorithms that drive artificial intelligence. They’ve latched onto fairness as the objective, obsessing over competing constructs of the term that can be rendered in measurable, mathematical form. If the hunt for a science of computational fairness were restricted to engineers, it would be one thing. But given our contemporary exaltation of and deference to technologists, it has limited the entire imagination of ethics, law, and the media as well.
There are three problems with this focus on A.I. bias. The first is that addressing bias as a computational problem obscures its root causes. Bias is a social problem, and seeking to solve it within the logic of automation is always going to be inadequate.
Second, even apparent success in tackling bias can have perverse consequences. Take the example of a facial recognition system that works poorly on women of color because of the group’s underrepresentation both in the training data and among system designers. Alleviating this problem by seeking to “equalize” representation merely co-opts designers in perfecting vast instruments of surveillance and classification.
When underlying systemic issues remain fundamentally untouched, the bias fighters simply render humans more machine readable, exposing minorities in particular to additional harms.
Third — and most dangerous and urgent of all — is the way in which the seductive controversy of A.I. bias, and the false allure of “solving” it, detracts from bigger, more pressing questions. Bias is real, but it’s also a captivating diversion.
What has been remarkably underappreciated is the key interdependence of the twin stories of A.I. inevitability and A.I. bias. Against the corporate projection of an otherwise sunny horizon of unstoppable A.I. integration, recognizing and acknowledging bias can be seen as a strategic concession — one that subdues the scale of the challenge. Bias, like job losses and safety hazards, becomes part of the grand bargain of innovation.
The reality that bias is primarily a social problem and cannot be fully solved technically becomes a strength, rather than a weakness, for the inevitability narrative. It flips the script. It absorbs and regularizes the classification practices and underlying systems of inequality perpetuated by automation, allowing relative increases in “fairness” to be claimed as victories — even if all that is being done is to slice, dice, and redistribute the makeup of those negatively affected by actuarial decision-making.
In short, the preoccupation with narrow computational puzzles distracts us from the far more important issue of the colossal asymmetry between societal cost and private gain in the rollout of automated systems. It also denies us the possibility of asking: Should we be building these systems at all?…(More)”.
To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data
Mark Hansen at the New York Times: “When the Census Bureau gathered data in 2010, it made two promises. The form would be “quick and easy,” it said. And “your answers are protected by law.”
But mathematical breakthroughs, easy access to more powerful computing, and widespread availability of large and varied public data sets have made the bureau reconsider whether the protection it offers Americans is strong enough. To preserve confidentiality, the bureau’s directors have determined they need to adopt a “formal privacy” approach, one that adds uncertainty to census data before it is published and achieves privacy assurances that are provable mathematically.
The census has always added some uncertainty to its data, but a key innovation of this new framework, known as “differential privacy,” is a numerical value describing how much privacy loss a person will experience. It determines the amount of randomness — “noise” — that needs to be added to a data set before it is released, and sets up a balancing act between accuracy and privacy. Too much noise would mean the data would not be accurate enough to be useful — in redistricting, in enforcing the Voting Rights Act or in conducting academic research. But too little, and someone’s personal data could be revealed.
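For readers unfamiliar with the mechanics, the sketch below shows the simplest textbook form of this idea, the Laplace mechanism, which adds noise scaled by sensitivity divided by epsilon to a count before release. It is illustrative only; the Census Bureau’s actual system is considerably more elaborate.

```python
# Minimal sketch of the Laplace mechanism: release a count plus noise whose scale
# is sensitivity / epsilon. A smaller epsilon permits less privacy loss, so more
# noise is added and accuracy drops. Illustrative only, not the bureau's system.
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a differentially private version of a single count."""
    rng = rng if rng is not None else np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A block-level count of 120 voting-age residents, released under two budgets.
print(laplace_count(120, epsilon=1.0))  # noise typically on the order of a few people
print(laplace_count(120, epsilon=0.1))  # ten times noisier, stronger privacy guarantee
```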
On Thursday, the bureau will announce the trade-off it has chosen for data publications from the 2018 End-to-End Census Test it conducted in Rhode Island, the only dress rehearsal before the actual census in 2020. The bureau has decided to enforce stronger privacy protections than companies like Apple or Google had when they each first took up differential privacy.
In presentation materials for Thursday’s announcement, special attention is paid to lessening any problems with redistricting: the potential complications of using noisy counts of voting-age people to draw district lines. (By contrast, in 2000 and 2010 the swapping mechanism produced exact counts of potential voters down to the block level.)
The Census Bureau has been an early adopter of differential privacy. Still, instituting the framework on such a large scale is not an easy task, and even some of the big technology firms have had difficulties. For example, shortly after Apple’s announcement in 2016 that it would use differential privacy for data collected from its macOS and iOS operating systems, it was revealed that the actual privacy loss of its systems was much higher than advertised.
Some scholars question the bureau’s abandonment of techniques like swapping in favor of differential privacy. Steven Ruggles, Regents Professor of history and population studies at the University of Minnesota, has relied on census data for decades. Through the Integrated Public Use Microdata Series, he and his team have regularized census data dating to 1850, providing consistency between questionnaires as the forms have changed, and enabling researchers to analyze data across years.
“All of the sudden, Title 13 gets equated with differential privacy — it’s not,” he said, adding that if you make a guess about someone’s identity from looking at census data, you are probably wrong. “That has been regarded in the past as protection of privacy. They want to make it so that you can’t even guess.”
“There is a trade-off between usability and risk,” he added. “I am concerned they may go far too far on privileging an absolutist standard of risk.”
In a working paper published Friday, he said that with the number of private services offering personal data, a prospective hacker would have little incentive to turn to public data such as the census “in an attempt to uncover uncertain, imprecise and outdated information about a particular individual.”…(More)”.
Why We Need to Audit Algorithms
James Guszcza, Iyad Rahwan, Will Bible, Manuel Cebrian and Vic Katyal at Harvard Business Review: “Algorithmic decision-making and artificial intelligence (AI) hold enormous potential and are likely to be economic blockbusters, but we worry that the hype has led many people to overlook the serious problems of introducing algorithms into business and society. Indeed, we see many succumbing to what Microsoft’s Kate Crawford calls “data fundamentalism” — the notion that massive datasets are repositories that yield reliable and objective…
Ensuring that societal values are reflected in algorithms and AI technologies will require no less creativity, hard work, and innovation than developing the AI technologies themselves. We have a proposal for a good place to start: auditing. Companies have long been required to issue audited financial statements for the benefit of financial markets and other stakeholders. That’s because — like algorithms — companies’ internal operations appear as “black boxes” to those on the outside. This gives managers an informational advantage over the investing public, which could be abused by unethical actors. Requiring managers to report periodically on their operations provides a check on that advantage. To bolster the trustworthiness of these reports, independent auditors are hired to provide reasonable assurance that the reports coming from the “black box” are free of material misstatement. Should we not subject societally impactful “black box” algorithms to comparable scrutiny?
Indeed, some forward thinking regulators are beginning to explore this possibility. For example, the EU’s General Data Protection Regulation (GDPR) requires that organizations be able to explain their algorithmic decisions. The city of New York recently assembled a task force to study possible biases in algorithmic decision systems. It is reasonable to anticipate that emerging regulations might be met with market pull for services involving algorithmic accountability.
So what might an algorithm auditing discipline look like? First, it should adopt a holistic perspective. Computer science and machine learning methods will be necessary, but likely not sufficient foundations for an algorithm auditing discipline. Strategic thinking, contextually informed professional judgment, communication, and the scientific method are also required.
As a result, algorithm auditing must be interdisciplinary if it is to succeed…(More)”.
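The authors do not prescribe specific tests, but one narrow, purely computational ingredient such an audit might include is a comparison of a system’s outcome rates across groups. Below is a minimal sketch; the data, group labels and the 0.8 rule of thumb are illustrative assumptions, not a proposal from the article.

```python
# Minimal sketch of a group-outcome audit check: compute positive-outcome rates per
# group and their ratio (an "80% rule" style disparate-impact screen). Illustrative
# only; a real audit would involve far more than this single statistic.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, outcome) pairs, where outcome is 0 or 1."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions):
    rates = selection_rates(decisions)
    return min(rates.values()) / max(rates.values())

decisions = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]
print(selection_rates(decisions))         # roughly {'A': 0.67, 'B': 0.33}
print(disparate_impact_ratio(decisions))  # 0.5 -- below 0.8, so it would be flagged
```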
Beijing to Judge Every Resident Based on Behavior by End of 2020
Bloomberg News: “China’s plan to judge each of its 1.3 billion people based on their social behavior is moving a step closer to reality, with Beijing set to adopt a lifelong points program by 2021 that assigns personalized ratings for each resident.
The capital city will pool data from several departments to reward and punish some 22 million citizens based on their actions and reputations by the end of 2020, according to a plan posted on the Beijing municipal government’s website on Monday. Those with better so-called social credit will get “green channel” benefits while those who violate laws will find life more difficult.
The Beijing project will improve blacklist systems so that those deemed untrustworthy will be “unable to move even a single step,” according to the government’s plan. Xinhua reported on the proposal Tuesday, while the report posted on the municipal government’s website is dated July 18.
China has long experimented with systems that grade its citizens, rewarding good behavior with streamlined services while punishing bad actions with restrictions and penalties. Critics say such moves are fraught with risks and could lead to systems that reduce humans to little more than a report card.
Ambitious Plan
Beijing’s efforts represent the most ambitious yet among more than a dozen cities that are moving ahead with similar programs.
Hangzhou rolled out its personal credit system earlier this year, rewarding “pro-social behaviors” such as volunteer work and blood donations while punishing those who violate traffic laws and charge under-the-table fees. By the end of May, people with bad credit in China had been blocked from booking more than 11 million flights and 4 million high-speed train trips, according to the National Development and Reform Commission.
According to the Beijing government’s plan, different agencies will link databases to get a more detailed picture of every resident’s interactions across a swathe of services…(More)”.
Explaining Explanations in AI
Paper by Brent Mittelstadt, Chris Russell and Sandra Wachter: “Recent work on interpretability in machine learning and AI has focused on the building of simplified models that approximate the true criteria used to make decisions. These models are a useful pedagogical device for teaching trained professionals how to predict what decisions will be made by the complex system, and most importantly how the system might break. However, when considering any such model, it’s important to remember Box’s maxim that “All models are wrong but some are useful.”
We focus on the distinction between these models and explanations in philosophy and sociology. These models can be understood as a “do it yourself kit” for explanations, allowing a practitioner to directly answer “what if” questions or generate contrastive explanations without external assistance. Although a valuable ability, giving these models as explanations…(More)”.
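As a concrete illustration of the kind of simplified model the paper discusses, the sketch below fits a shallow decision tree to mimic a black-box classifier’s predictions, giving a practitioner inspectable rules for answering “what if” questions. This is a generic surrogate-model example under assumed synthetic data, not the authors’ method.

```python
# Minimal sketch of a post-hoc surrogate model: train a shallow, readable decision
# tree on a black-box model's predictions and check how faithfully it mimics them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X, y)      # opaque model
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))                            # mimic its outputs

fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()  # agreement rate
print(f"surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate))  # human-readable rules approximating the black box
```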
Big Data Ethics and Politics: Toward New Understandings
As big data touch on many realms of daily life and have profound impacts…(More).
Startup Offers To Sequence Your Genome Free Of Charge, Then Let You Profit From It
Richard Harris at NPR: “A startup genetics company says it’s now offering to sequence your entire genome at no cost to you. In fact, you would own the data and may even be able to make money off it.
Nebula Genomics, created by the prominent Harvard geneticist George Church and his lab colleagues, seeks to upend the usual way genomic information is owned.
Today, companies like 23andMe make some of their money by scanning your genetic patterns and then selling that information to drug companies for use in research. (You choose whether to opt in.)
Church says his new enterprise leaves ownership and control of the data in an individual’s hands. And the genomic analysis Nebula will perform is much more detailed than what 23andMe and similar companies offer.
Nebula will do a full genome sequence, rather than a snapshot of key gene variants. That wider range of genetic information would…
Church’s approach is part of a trend that’s pushing back against the multibillion-dollar industry that buys and sells medical information. Right now, companies reap those profits and control the data.
“Patients should have the right to decide for themselves whether they want to share their medical data, and, if so, with whom,” Adam Tanner, at Harvard’s Institute for Quantitative Social Science, says in an email. “Efforts to empower people to fine-tune the fate of their medical information are a step in the right direction.” Tanner, author of a book on the subject of the trade in medical data, isn’t involved in Nebula.
The current system is “very paternalistic,” Church says. He aims to give people complete control over who gets access to their data and to let individuals decide whether to sell the information, and to whom.
“In this case, everything is private information, stored on your computer or a computer you designate,” Church says. It can be encrypted so that nobody, not even you, can read it, if that’s what you want.
Drug companies interested in studying, say, diabetes patients would ask Nebula to identify people in their system who have the disease. Nebula would then identify those individuals by launching an encrypted search of participants.
People who have indicated they’re interested in selling their genetic data to a company would then be given the option of granting access to the information, along with medical data that person has designated.
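A minimal sketch of that consent-gated flow appears below, with the cryptography deliberately abstracted away (a production system would search over encrypted data); the class and field names are assumptions, not Nebula’s actual design.

```python
# Minimal sketch of a consent-gated data-access flow: buyers can discover matching,
# opted-in participants, but only see records whose owners explicitly granted access.
from dataclasses import dataclass, field

@dataclass
class Participant:
    user_id: str
    conditions: set
    open_to_offers: bool
    granted_to: set = field(default_factory=set)

    def grant_access(self, buyer_id):
        self.granted_to.add(buyer_id)   # only the participant can flip this switch

def find_candidates(participants, condition):
    """Step 1: identify opted-in participants matching the study criteria."""
    return [p for p in participants if condition in p.conditions and p.open_to_offers]

def accessible_records(participants, buyer_id):
    """Step 2: a buyer only sees participants who granted it access."""
    return [p.user_id for p in participants if buyer_id in p.granted_to]

people = [Participant("u1", {"diabetes"}, True), Participant("u2", {"diabetes"}, False)]
invited = find_candidates(people, "diabetes")   # only u1 is eligible to be invited
invited[0].grant_access("pharma_co")            # u1 chooses to share
print(accessible_records(people, "pharma_co"))  # ['u1']
```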
Other companies are also springing up to help people control — and potentially profit from — their medical data. EncrypGen lets people offer up their genetic data, though customers have to provide their own DNA sequence…(More)”.
What do we learn from Machine Learning?
Blog by Giovanni Buttarelli: “…There are few authorities monitoring the impact of new technologies on fundamental rights as closely and intensively as data protection and privacy commissioners. At the International Conference of Data Protection and Privacy Commissioners, the 40th ICDPPC (which the EDPS had the…
The ICDPPC was also chosen by an alliance of NGOs and individuals, The Public Voice, as the moment to launch its own Universal Guidelines on Artificial Intelligence (UGAI). The twelve principles laid down in these guidelines extend and complement those of the ICDPPC declaration.
We are only at the beginning of this debate. More voices will be heard: think tanks such as CIPL are coming forward with their suggestions, and many other organisations will follow.
At international level, the Council of Europe has invested efforts in assessing the impact of AI, and has announced a report and guidelines to be published soon. The European Commission has appointed an expert group which will, among other tasks, give recommendations on future-related policy development and on ethical, legal and societal issues related to AI, including socio-economic challenges.
As I already pointed out in an earlier…(More)”.
Sidewalk Labs: Privacy in a City Built from the Internet Up
Harvard Business School Case Study by Leslie K. John, Mitchell Weiss and Julia Kelley: “By the time Dan Doctoroff, CEO of Sidewalk Labs, began hosting a Reddit “Ask Me Anything” session in January 2018, he had only nine months remaining to convince the people of Toronto, their government representatives, and presumably his parent company Alphabet, Inc., that Sidewalk Labs’ plan to construct “the first truly 21st-century city” on the Canadian city’s waterfront was a sound one. Along with much excitement and optimism, strains of concern had emerged since Doctoroff and partners first announced their intentions for a city “built from the internet up” in Toronto’s Quayside district. As Doctoroff prepared for yet another milestone in a year of planning and community engagement, it was almost certain that of the many questions headed his way, digital privacy would be among them….(More)”.