On the privacy-conscientious use of mobile phone data

Yves-Alexandre de Montjoye et al in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.

While there is little doubt on the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.

Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points (approximate places and times where an individual was present) have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity (a metric to evaluate the risk of re-identification in large-scale datasets) and attempts at k-anonymizing mobile phone data ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy, which considers de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
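To make the unicity idea concrete, here is a minimal sketch of how such an estimate might be computed. It is not the authors' implementation: the function name `unicity`, the toy traces, and the representation of a data point as a `(place, hour)` tuple are all assumptions made for illustration.

```python
import random

def unicity(traces, p, trials=100, seed=0):
    """Estimate how often p random points from a user's own trace
    match that user's trace and nobody else's.

    traces: dict mapping a (hypothetical) user id to a set of
            (place, time) points.
    """
    rng = random.Random(seed)
    # Only users with at least p points can be sampled from.
    eligible = [u for u, pts in traces.items() if len(pts) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # Draw p known points, as an attacker with outside knowledge might have.
        sample = set(rng.sample(sorted(traces[user]), p))
        # Count every trace in the dataset that contains all sampled points.
        matches = sum(1 for pts in traces.values() if sample <= pts)
        if matches == 1:
            unique += 1
    return unique / trials

# Toy data (assumed): three users, three coarse (antenna, hour) points each.
toy_traces = {
    "a": {("cell1", 8), ("cell2", 9), ("cell3", 18)},
    "b": {("cell1", 8), ("cell2", 9), ("cell4", 18)},
    "c": {("cell5", 8), ("cell6", 9), ("cell7", 18)},
}

if __name__ == "__main__":
    for p in (1, 2, 3):
        print(p, unicity(toy_traces, p))
```

In this toy dataset, users "a" and "b" share two of their three points, so a single known point often fails to single either of them out, while all three points always do. The same effect, at the scale of millions of users, is what the unicity figures quoted above measure.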

The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data on time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.

These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed-upon framework for the privacy-conscientious use of mobile phone data by third parties, especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed-upon set of models have been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data. This has often resulted in suboptimal trade-offs, if any.

In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third-party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregated information is ultimately needed by the third-party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed such as targeted advertising or loans based on behavioral data.

Figure 1: Matrix of the four models for the privacy-conscientious use of mobile phone data.

First, it is important to insist that none of these models is a silver bullet…(More)”.

Distributed, privacy-enhancing technologies in the 2017 Catalan referendum on independence: New tactics and models of participatory democracy

M. Poblet at First Monday: “This paper examines new civic engagement practices unfolding during the 2017 referendum on independence in Catalonia. These practices constitute one of the first signs of some emerging trends in the use of the Internet for civic and political action: the adoption of horizontal, distributed, and privacy-enhancing technologies that rely on P2P networks and advanced cryptographic tools. In this regard, the case of the 2017 Catalan referendum, framed within conflicting political dynamics, can be considered a first of its kind in participatory democracy. The case also offers an opportunity to reflect on an interesting paradox that twenty-first century activism will face: the more it will rely on private-friendly, secured, and encrypted networks, the more open, inclusive, ethical, and transparent it will need to be….(More)”.

Ethical Dilemmas in Cyberspace

Paper by Martha Finnemore: “This essay steps back from the more detailed regulatory discussions in other contributions to this roundtable on “Competing Visions for Cyberspace” and highlights three broad issues that raise ethical concerns about our activity online. First, the commodification of people—their identities, their data, their privacy—that lies at the heart of business models of many of the largest information and communication technologies companies risks instrumentalizing human beings. Second, concentrations of wealth and market power online may be contributing to economic inequalities and other forms of domination. Third, long-standing tensions between the security of states and the human security of people in those states have not been at all resolved online and deserve attention….(More)”.

Towards matching user mobility traces in large-scale datasets

Paper by Daniel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye and Carlo Ratti: “The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success only after a one-week-long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals….(More)”.
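As a rough illustration of what matching by co-occurrence can look like, the sketch below pairs each user in one dataset with the user in the other dataset who shares the most records at the same place and in the same time bin. This is a simplified stand-in, not the algorithm from the paper: exact `(place, time_bin)` equality, the tie-handling rule, the function name `match_by_cooccurrence`, and the toy records are all assumptions.

```python
from collections import Counter, defaultdict

def match_by_cooccurrence(records_a, records_b):
    """Match users in dataset A to users in dataset B by counting
    records that share the same (place, time_bin).

    Each record is a (user, place, time_bin) tuple. Users in A whose
    best candidate is tied with another candidate are left unmatched.
    """
    # Index dataset B: which users were seen at each (place, time_bin)?
    index_b = defaultdict(set)
    for user, place, time_bin in records_b:
        index_b[(place, time_bin)].add(user)

    # For every user in A, count co-occurrences with each candidate in B.
    counts = defaultdict(Counter)
    for user, place, time_bin in set(records_a):
        for candidate in index_b.get((place, time_bin), ()):
            counts[user][candidate] += 1

    matches = {}
    for user, counter in counts.items():
        ranked = counter.most_common(2)
        # Accept the top candidate only if it is strictly ahead of the runner-up.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            matches[user] = ranked[0][0]
    return matches

# Toy data (assumed): smart card taps and phone records binned by station and hour.
records_a = [
    ("card1", "stationA", 1), ("card1", "stationB", 2), ("card1", "stationC", 3),
    ("card2", "stationA", 1),
]
records_b = [
    ("phoneX", "stationA", 1), ("phoneX", "stationB", 2), ("phoneX", "stationC", 3),
    ("phoneY", "stationA", 1), ("phoneY", "stationD", 2),
]

if __name__ == "__main__":
    print(match_by_cooccurrence(records_a, records_b))
```

Here "card1" co-occurs three times with "phoneX" and once with "phoneY", so it is matched; "card2" has a single record that fits both phones equally and stays unmatched. This mirrors the abstract's point that the expected number of co-occurring records drives matchability, and that longer observation periods raise success rates.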

Cybersecurity of the Person

Paper by Jeff Kosseff: “U.S. cybersecurity law is largely an outgrowth of the early-aughts concerns over identity theft and financial fraud. Cybersecurity laws focus on protecting identifiers such as driver’s licenses and social security numbers, and financial data such as credit card numbers. Federal and state laws require companies to protect this data and notify individuals when it is breached, and impose civil and criminal liability on hackers who steal or damage this data. In this paper, I argue that our current cybersecurity laws are too narrowly focused on financial harms. While such concerns remain valid, they are only one part of the cybersecurity challenge that our nation faces.

Too often overlooked by the cybersecurity profession are the harms to individuals, such as revenge pornography and online harassment. Our legal system typically addresses these harms through retrospective criminal prosecution and civil litigation, both of which face significant limits. Accounting for such harms in our conception of cybersecurity will help to better align our laws with these threats and reduce the likelihood of the harms occurring….(More)”.

Using insights from behavioral economics to nudge individuals towards healthier choices when eating out

Paper by Stéphane Bergeron, Maurice Doyon, Laure Saulais and JoAnne Labrecque: “Using a controlled experiment in a restaurant with naturally occurring clients, this study investigates how nudging can be used to design menus that guide consumers to make healthier choices. It examines the use of default options, focusing specifically on two types of defaults that can be found when ordering food in a restaurant: automatic and standard defaults. Both types of defaults significantly affected choices, but did not adversely impact the satisfaction of individual choices. The results suggest that menu design could effectively use non-informational strategies such as nudging to promote healthier individual choices without restricting the offer or reducing satisfaction….(More)”.

When a Nudge Backfires: Using Observation with Social and Economic Incentives to Promote Pro-Social Behavior

Paper by Gary Bolton, Eugen Dimant and Ulrich Schmidt: “Both theory and recent empirical evidence on nudging suggest that observability of behavior acts as an instrument for promoting (discouraging) pro-social (anti-social) behavior.

Our study questions the universality of these claims. We employ a novel four-party setup to disentangle the roles three observational mechanisms play in mediating behavior. We systematically vary the observability of one’s actions by others as well as the (non-)monetary relationship between observer and observee. Observability involving economic incentives crowds out anti-social behavior in favor of more pro-social behavior.

Surprisingly, social observation without economic incentives fails to achieve any aggregate pro-social effect, and if anything it backfires. Additional experiments confirm that observability without additional monetary incentives can indeed backfire. However, they also show that the effect of observability on pro-social behavior is increased when social norms are made salient….(More)”.

Motivating Participation in Crowdsourced Policymaking: The Interplay of Epistemic and Interactive Aspects

Paper by Tanja Aitamurto and Jorge Saldivar in Proceedings of ACM Human-Computer Interaction (CSCW ’18): “…we examine the changes in motivation factors in crowdsourced policymaking. By drawing on longitudinal data from a crowdsourced law reform, we show that people participated because they wanted to improve the law, learn, and solve problems. When crowdsourcing reached a saturation point, the motivation factors weakened and the crowd disengaged. Learning was the only factor that did not weaken. The participants learned while interacting with others, and the more actively the participants commented, the more likely they stayed engaged. Crowdsourced policymaking should thus be designed to support both epistemic and interactive aspects. While the crowd’s motives were rooted in self-interest, their knowledge perspective showed common-good orientation, implying that rather than being dichotomous, motivation factors move on a continuum. The design of crowdsourced policymaking should support the dynamic nature of the process and the motivation factors driving it….(More)”.

Fostering innovation in public procurement through public private partnerships

Paper by Nunzia Carbonara in the Journal of Public Procurement: “The prevailing view in the studies on Public Private Partnerships (PPPs) is that PPPs can improve the quality and efficiency of infrastructure services and facilitate innovation in infrastructure developments. Although researchers highlight the potential of PPP models for stimulating innovation, they do not prove whether and under which conditions the PPP model is capable of developing innovative solutions. This paper aims to provide answers to the following key research questions: Which PPP features favor innovation? How should a PPP be structured to foster innovation?

With this aim, drawing upon the main streams of studies on innovation, the authors first develop a conceptual framework that identifies the PPP features that can influence innovativeness. Second, they define how these PPP features have to be structured to foster innovation.

The authors find that a wider involvement of the private sector will increase the level of innovation. The industry structure exerts opposite forces on innovation: the dominance of large-sized firms is positively related to innovative output, whereas market concentration negatively affects innovation. Performance-based contracts should be used in the context of PPP instead of traditional contracts. Finally, the authors find that, to fully exploit the networking effects on innovation, cooperation and trust among partners involved in PPPs should be enhanced….(More)”.

Open Government Data for Inclusive Development

Chapter by F. van Schalkwyk and M. Cañares in “Making Open Development Inclusive”, MIT Press, Matthew L. Smith and Ruhiya Kris Seward (Eds): “This chapter examines the relationship between open government data and social inclusion. Twenty-eight open data initiatives from the Global South are analyzed to find out how and in what contexts the publication of open government data tends to result in the inclusion of habitually marginalized communities in governance processes such that they may lead better lives.

The relationship between open government data and social inclusion is examined by presenting an analysis of the outcomes of open data projects. This analysis is based on a constellation of factors that were identified as having a bearing on open data initiatives with respect to inclusion. The findings indicate that open data can contribute to an increase in access and participation, both components of inclusion. In these cases, this particular finding indicates that a more open, participatory approach to governance practice is taking root. However, the findings also show that access and participation approaches to open government data have, in the cases studied here, not successfully disrupted the concentration of power in political and other networks, and this has placed limits on open data’s contribution to a more inclusive society.

The chapter starts by presenting a theoretical framework for the analysis of the relationship between open data and inclusion. The framework sets out the complex relationship between social actors, information and power in the network society. This is critical, we suggest, in developing a realistic analysis of the contexts in which open data activates its potential for transformation. The chapter then articulates the research questions and presents the methodology used to operationalize those questions. The findings and discussion section that follows examines the factors affecting the relationship between open data and inclusion, and how these factors are observed to play out across several open data initiatives in different contexts. The chapter ends with concluding remarks and an attempt to synthesize the insights that emerged in the preceding sections….(More)”.