On the privacy-conscientious use of mobile phone data


Yves-Alexandre de Montjoye et al in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.

While there is little doubt on the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.

Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy which consider de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.

The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data on time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.

These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed upon framework for the privacy-conscientious use of mobile phone data by third-parties especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed upon set of models has been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data. This has often resulted in suboptimal tradeoffs if any.

In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third-party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregated information is ultimately needed by the third-party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed such as targeted advertising or loans based on behavioral data.

Figure 1
Figure 1: Matrix of the four models for the privacy-conscientious use of mobile phone data.

First, it is important to insist that none of these models is a silver bullet…(More)”.

Towards matching user mobility traces in large-scale datasets


Paper by Daniel Kondor, Behrooz Hashemian,  Yves-Alexandre de Montjoye and Carlo Ratti: “The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success only after a one-week long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals….(More)”.

The Seductive Diversion of ‘Solving’ Bias in Artificial Intelligence


Blog by Julia Powles and Helen Nissenbaum: “Serious thinkers in academia and business have swarmed to the A.I. bias problem, eager to tweak and improve the data and algorithms that drive artificial intelligence. They’ve latched onto fairness as the objective, obsessing over competing constructs of the term that can be rendered in measurable, mathematical form. If the hunt for a science of computational fairness was restricted to engineers, it would be one thing. But given our contemporary exaltation and deference to technologists, it has limited the entire imagination of ethics, law, and the media as well.

There are three problems with this focus on A.I. bias. The first is that addressing bias as a computational problem obscures its root causes. Bias is a social problem, and seeking to solve it within the logic of automation is always going to be inadequate.

Second, even apparent success in tackling bias can have perverse consequences. Take the example of a facial recognition system that works poorly on women of color because of the group’s underrepresentation both in the training data and among system designers. Alleviating this problem by seeking to “equalize” representation merely co-opts designers in perfecting vast instruments of surveillance and classification.

When underlying systemic issues remain fundamentally untouched, the bias fighters simply render humans more machine readable, exposing minorities in particular to additional harms.

Third — and most dangerous and urgent of all — is the way in which the seductive controversy of A.I. bias, and the false allure of “solving” it, detracts from bigger, more pressing questions. Bias is real, but it’s also a captivating diversion.

What has been remarkably underappreciated is the key interdependence of the twin stories of A.I. inevitability and A.I. bias. Against the corporate projection of an otherwise sunny horizon of unstoppable A.I. integration, recognizing and acknowledging bias can be seen as a strategic concession — one that subdues the scale of the challenge. Bias, like job losses and safety hazards, becomes part of the grand bargain of innovation.

The reality that bias is primarily a social problem and cannot be fully solved technically becomes a strength, rather than a weakness, for the inevitability narrative. It flips the script. It absorbs and regularizes the classification practices and underlying systems of inequality perpetuated by automation, allowing relative increases in “fairness” to be claimed as victories — even if all that is being done is to slice, dice, and redistribute the makeup of those negatively affected by actuarial decision-making.

In short, the preoccupation with narrow computational puzzles distracts us from the far more important issue of the colossal asymmetry between societal cost and private gain in the rollout of automated systems. It also denies us the possibility of asking: Should we be building these systems at all?…(More)”.

To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data


Mark Hansen at the New York Times: “When the Census Bureau gathered data in 2010, it made two promises. The form would be “quick and easy,” it said. And “your answers are protected by law.”

But mathematical breakthroughs, easy access to more powerful computing, and widespread availability of large and varied public data sets have made the bureau reconsider whether the protection it offers Americans is strong enough. To preserve confidentiality, the bureau’s directors have determined they need to adopt a “formal privacy” approach, one that adds uncertainty to census data before it is published and achieves privacy assurances that are provable mathematically.

The census has always added some uncertainty to its data, but a key innovation of this new framework, known as “differential privacy,” is a numerical value describing how much privacy loss a person will experience. It determines the amount of randomness — “noise” — that needs to be added to a data set before it is released, and sets up a balancing act between accuracy and privacy. Too much noise would mean the data would not be accurate enough to be useful — in redistricting, in enforcing the Voting Rights Act or in conducting academic research. But too little, and someone’s personal data could be revealed.

On Thursday, the bureau will announce the trade-off it has chosen for data publications from the 2018 End-to-End Census Test it conducted in Rhode Island, the only dress rehearsal before the actual census in 2020. The bureau has decided to enforce stronger privacy protections than companies like Apple or Google had when they each first took up differential privacy….

In presentation materials for Thursday’s announcement, special attention is paid to lessening any problems with redistricting: the potential complications of using noisy counts of voting-age people to draw district lines. (By contrast, in 2000 and 2010 the swapping mechanism produced exact counts of potential voters down to the block level.)

The Census Bureau has been an early adopter of differential privacy. Still, instituting the framework on such a large scale is not an easy task, and even some of the big technology firms have had difficulties. For example, shortly after Apple’s announcement in 2016 that it would use differential privacy for data collected from its macOS and iOS operating systems, it was revealed that the actual privacy loss of their systems was much higher than advertised.

Some scholars question the bureau’s abandonment of techniques like swapping in favor of differential privacy. Steven Ruggles, Regents Professor of history and population studies at the University of Minnesota, has relied on census data for decades. Through the Integrated Public Use Microdata Series, he and his team have regularized census data dating to 1850, providing consistency between questionnaires as the forms have changed, and enabling researchers to analyze data across years.

“All of the sudden, Title 13 gets equated with differential privacy — it’s not,” he said, adding that if you make a guess about someone’s identity from looking at census data, you are probably wrong. “That has been regarded in the past as protection of privacy. They want to make it so that you can’t even guess.”

“There is a trade-off between usability and risk,” he added. “I am concerned they may go far too far on privileging an absolutist standard of risk.”

In a working paper published Friday, he said that with the number of private services offering personal data, a prospective hacker would have little incentive to turn to public data such as the census “in an attempt to uncover uncertain, imprecise and outdated information about a particular individual.”…(More)”.

Why We Need to Audit Algorithms


James Guszcza, Iyad Rahwan, Will Bible, Manuel Cebrian and Vic Katyal at Harvard Business Review: “Algorithmic decision-making and artificial intelligence (AI) hold enormous potential and are likely to be economic blockbusters, but we worry that the hype has led many people to overlook the serious problems of introducing algorithms into business and society. Indeed, we see many succumbing to what Microsoft’s Kate Crawford calls “data fundamentalism” — the notion that massive datasets are repositories that yield reliable and objective truths, if only we can extract them using machine learning tools. A more nuanced view is needed. It is by now abundantly clear that, left unchecked, AI algorithms embedded in digital and social technologies can encode societal biasesaccelerate the spread of rumors and disinformation, amplify echo chambers of public opinion, hijack our attention, and even impair our mental wellbeing.

Ensuring that societal values are reflected in algorithms and AI technologies will require no less creativity, hard work, and innovation than developing the AI technologies themselves. We have a proposal for a good place to start: auditing. Companies have long been required to issue audited financial statements for the benefit of financial markets and other stakeholders. That’s because — like algorithms — companies’ internal operations appear as “black boxes” to those on the outside. This gives managers an informational advantage over the investing public which could be abused by unethical actors. Requiring managers to report periodically on their operations provides a check on that advantage. To bolster the trustworthiness of these reports, independent auditors are hired to provide reasonable assurance that the reports coming from the “black box” are free of material misstatement. Should we not subject societally impactful “black box” algorithms to comparable scrutiny?

Indeed, some forward thinking regulators are beginning to explore this possibility. For example, the EU’s General Data Protection Regulation (GDPR) requires that organizations be able to explain their algorithmic decisions. The city of New York recently assembled a task force to study possible biases in algorithmic decision systems. It is reasonable to anticipate that emerging regulations might be met with market pull for services involving algorithmic accountability.

So what might an algorithm auditing discipline look like? First, it should adopt a holistic perspective. Computer science and machine learning methods will be necessary, but likely not sufficient foundations for an algorithm auditing discipline. Strategic thinking, contextually informed professional judgment, communication, and the scientific method are also required.

As a result, algorithm auditing must be interdisciplinary in order for it to succeed….(More)”.

Beijing to Judge Every Resident Based on Behavior by End of 2020


Bloomberg News: “China’s plan to judge each of its 1.3 billion people based on their social behavior is moving a step closer to reality, with Beijing set to adopt a lifelong points program by 2021 that assigns personalized ratings for each resident.

The capital city will pool data from several departments to reward and punish some 22 million citizens based on their actions and reputations by the end of 2020, according to a plan posted on the Beijing municipal government’s website on Monday. Those with better so-called social credit will get “green channel” benefits while those who violate laws will find life more difficult.

The Beijing project will improve blacklist systems so that those deemed untrustworthy will be “unable to move even a single step,” according to the government’s plan. Xinhua reported on the proposal Tuesday, while the report posted on the municipal government’s website is dated July 18.

China has long experimented with systems that grade its citizens, rewarding good behavior with streamlined services while punishing bad actions with restrictions and penalties. Critics say such moves are fraught with risks and could lead to systems that reduce humans to little more than a report card.

Ambitious Plan

Beijing’s efforts represent the most ambitious yet among more than a dozen cities that are moving ahead with similar programs.

Hangzhou rolled out its personal credit system earlier this year, rewarding “pro-social behaviors” such as volunteer work and blood donations while punishing those who violate traffic laws and charge under-the-table fees. By the end of May, people with bad credit in China have been blocked from booking more than 11 million flights and 4 million high-speed train trips, according to the National Development and Reform Commission.

According to the Beijing government’s plan, different agencies will link databases to get a more detailed picture of every resident’s interactions across a swathe of services….(More)”.

Explaining Explanations in AI


Paper by Brent Mittelstadt Chris Russell and Sandra Wachter: “Recent work on interpretability in machine learning and AI has focused on the building of simplified models that approximate the true criteria used to make decisions. These models are a useful pedagogical device for teaching trained professionals how to predict what decisions will be made by the complex system, and most importantly how the system might break. However, when considering any such model it’s important to remember Box’s maxim that “All models are wrong but some are useful.”

We focus on the distinction between these models and explanations in philosophy and sociology. These models can be understood as a “do it yourself kit” for explanations, allowing a practitioner to directly answer “what if questions” or generate contrastive explanations without external assistance. Although a valuable ability, giving these models as explanations appears more difficult than necessary, and other forms of explanation may not have the same trade-offs. We contrast the different schools of thought on what makes an explanation, and suggest that machine learning might benefit from viewing the problem more broadly… (More)”.

Big Data Ethics and Politics: Toward New Understandings


Introductory paper by Wenhong Chen and Anabel Quan-Haase of Special Issue of the Social Science Computer Review:  “The hype around big data does not seem to abate nor do the scandals. Privacy breaches in the collection, use, and sharing of big data have affected all the major tech players, be it Facebook, Google, Apple, or Uber, and go beyond the corporate world including governments, municipalities, and educational and health institutions. What has come to light is that enabled by the rapid growth of social media and mobile apps, various stakeholders collect and use large amounts of data, disregarding the ethics and politics.

As big data touch on many realms of daily life and have profound impacts in the social world, the scrutiny around big data practice becomes increasingly relevant. This special issue investigates the ethics and politics of big data using a wide range of theoretical and methodological approaches. Together, the articles provide new understandings of the many dimensions of big data ethics and politics, showing it is important to understand and increase awareness of the biases and limitations inherent in big data analysis and practices….(More)”

Startup Offers To Sequence Your Genome Free Of Charge, Then Let You Profit From It


Richard Harris at NPR: “A startup genetics company says it’s now offering to sequence your entire genome at no cost to you. In fact, you would own the data and may even be able to make money off it.

Nebula Genomics, created by the prominent Harvard geneticist George Church and his lab colleagues, seeks to upend the usual way genomic information is owned.

Today, companies like 23andMe make some of their money by scanning your genetic patterns and then selling that information to drug companies for use in research. (You choose whether to opt in.)

Church says his new enterprise leaves ownership and control of the data in an individual’s hands. And the genomic analysis Nebula will perform is much more detailed than what 23andMe and similar companies offer.

Nebula will do a full genome sequence, rather than a snapshot of key gene variants. That wider range of genetic information would makes the data more appealing to biologists and biotech and pharmaceutical companies….

Church’s approach is part of a trend that’s pushing back against the multibillion-dollar industry to buy and sell medical information. Right now, companies reap those profits and control the data.

“Patients should have the right to decide for themselves whether they want to share their medical data, and, if so, with whom,” Adam Tanner, at Harvard’s Institute for Quantitative Social Science, says in an email. “Efforts to empower people to fine-tune the fate of their medical information are a step in the right direction.” Tanner, author of a book on the subject of the trade in medical data, isn’t involved in Nebula.

The current system is “very paternalistic,” Church says. He aims to give people complete control over who gets access to their data, and let individuals decide whether or not to sell the information, and to whom.

“In this case, everything is private information, stored on your computer or a computer you designate,” Church says. It can be encrypted so nobody can read it, even you, if that’s what you want.

Drug companies interested in studying, say, diabetes patients would ask Nebula to identify people in their system who have the disease. Nebula would then identify those individuals by launching an encrypted search of participants.

People who have indicated they’re interested in selling their genetic data to a company would then be given the option of granting access to the information, along with medical data that person has designated.

Other companies are also springing up to help people control — and potentially profit from — their medical data. EncrypGen lets people offer up their genetic data, though customers have to provide their own DNA sequence. Hu-manity.co is also trying to establish a system in which people can sell their medical data to pharmaceutical companies….(More)”.

What do we learn from Machine Learning?


Blog by Giovanni Buttarelli: “…There are few authorities monitoring the impact of new technologies on fundamental rights so closely and intensively as data protection and privacy commissioners. At the International Conference of Data Protection and Privacy Commissioners, the 40th ICDPPC (which the EDPS had the honour to host), they continued the discussion on AI which began in Marrakesh two years ago with a reflection paper prepared by EDPS experts. In the meantime, many national data protection authorities have invested considerable efforts and provided important contributions to the discussion. To name only a few, the data protection authorities from NorwayFrance, the UK and Schleswig-Holstein have published research and reflections on AI, ethics and fundamental rights. We all see that some applications of AI raise immediate concerns about data protection and privacy; but it also seems generally accepted that there are far wider-reaching ethical implications, as a group of AI researchers also recently concluded. Data protection and privacy commissioners have now made a forceful intervention by adopting a declaration on ethics and data protection in artificial intelligence which spells out six principles for the future development and use of AI – fairness, accountability, transparency, privacy by design, empowerment and non-discrimination – and demands concerted international efforts  to implement such governance principles. Conference members will contribute to these efforts, including through a new permanent working group on Ethics and Data Protection in Artificial Intelligence.

The ICDPPC was also chosen by an alliance of NGOs and individuals, The Public Voice, as the moment to launch its own Universal Guidelines on Artificial Intelligence (UGAI). The twelve principles laid down in these guidelines extend and complement those of the ICDPPC declaration.

We are only at the beginning of this debate. More voices will be heard: think tanks such as CIPL are coming forward with their suggestions, and so will many other organisations.

At international level, the Council of Europe has invested efforts in assessing the impact of AI, and has announced a report and guidelines to be published soon. The European Commission has appointed an expert group which will, among other tasks, give recommendations on future-related policy development and on ethical, legal and societal issues related to AI, including socio-economic challenges.

As I already pointed out in an earlier blogpost, it is our responsibility to ensure that the technologies which will determine the way we and future generations communicate, work and live together, are developed in such a way that the respect for fundamental rights and the rule of law are supported and not undermined….(More)”.