Stefaan Verhulst
Yves-Alexandre de Montjoye et al. in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.
With mobile phone penetration rates reaching 90% and national statistical agencies often under-resourced, the data generated by our phones—traditional Call Detail Records (CDRs) but also high-frequency x-Detail Records (xDRs)—have the potential to become a primary data source for tackling crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.
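To make the data source concrete, here is a minimal sketch of what a single CDR row typically contains; the field names and types are illustrative assumptions, not any operator’s actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CallDetailRecord:
    """Illustrative CDR row: who called whom, when, for how long, and from where."""
    caller_id: str       # pseudonymized subscriber identifier
    callee_id: str       # pseudonymized identifier of the other party
    timestamp: datetime  # start time of the call or text
    duration_s: int      # call duration in seconds (0 for a text message)
    cell_tower_id: str   # antenna that handled the event, a coarse proxy for location
```

xDRs extend the same structure to data sessions, which is what makes them higher-frequency and, for the same reason, more sensitive.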
While there is little doubt about the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, for example, that 60% of Americans consider location data and phone number history—both available in mobile phone data—to be “private”.
Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of individuals’ privacy has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations on, and improvements of, the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimates using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data have ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy, which considered de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
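For the technically minded reader, the unicity metric mentioned above can be estimated empirically: sample a few spatio-temporal points from a user’s trace and count how many users in the dataset match all of them. A minimal sketch, assuming traces are stored as sets of (tower, hour) tuples—the layout and names are illustrative, not the authors’ code:

```python
import random

def estimate_unicity(traces: dict, p: int = 4, n_samples: int = 1000) -> float:
    """Estimate unicity: the fraction of sampled users uniquely identified
    by p random points drawn from their own trace.
    traces maps user_id -> set of (tower_id, hour) tuples."""
    users = [u for u, t in traces.items() if len(t) >= p]
    unique = 0
    for _ in range(n_samples):
        user = random.choice(users)
        points = set(random.sample(sorted(traces[user]), p))
        # how many users' traces contain all p sampled points?
        matches = sum(1 for t in traces.values() if points <= t)
        if matches == 1:
            unique += 1
    return unique / n_samples
```

On the dataset of 1.5 million people cited above, an estimate of this kind is what yields the 95% figure at p = 4.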
The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data in time, despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.
These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern, agreed-upon framework for the privacy-conscientious use of mobile phone data by third parties, especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered, and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking, and an agreed-upon set of models, has so far been missing for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data, and has often resulted in suboptimal trade-offs, if any.
In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on uses of mobile phone data in which only statistical, aggregate information is ultimately needed by a third party and, 2) while this needs to be confirmed on a per-country basis, are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical, aggregate information is ultimately needed by the third party are discussed below. They include, e.g., disaster management, mobility analysis, or the training of AI algorithms, in which only aggregate information on people’s mobility is ultimately needed by agencies; they exclude cases in which individual-level, identifiable information is needed, such as targeted advertising or loans based on behavioral data.
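To illustrate the kind of output such models would expose, here is a sketch of reducing individual-level records to a per-area aggregate; the suppression threshold shown is a common safeguard but, as the re-identification literature above makes clear, not sufficient on its own:

```python
def aggregate_presence(records, min_count: int = 10):
    """Collapse individual (user_id, area_id) observations into per-area counts
    of distinct users, suppressing areas below a minimum count."""
    users_per_area = {}
    for user_id, area_id in records:
        users_per_area.setdefault(area_id, set()).add(user_id)
    return {area: len(users)
            for area, users in users_per_area.items()
            if len(users) >= min_count}
```

The four models differ chiefly in where, and under whose control, a step like this is carried out.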

First, it is important to insist that none of these models is a silver bullet…(More)”.
M. Poblet at First Monday: “This paper examines new civic engagement practices unfolding during the 2017 referendum on independence in Catalonia. These practices constitute one of the first signs of some emerging trends in the use of the Internet for civic and political action: the adoption of horizontal, distributed, and privacy-enhancing technologies that rely on P2P networks and advanced cryptographic tools. In this regard, the case of the 2017 Catalan referendum, framed within conflicting political dynamics, can be considered a first of its kind in participatory democracy. The case also offers an opportunity to reflect on an interesting paradox that twenty-first-century activism will face: the more it relies on privacy-friendly, secured, and encrypted networks, the more open, inclusive, ethical, and transparent it will need to be….(More)”.
Paper by Martha Finnemore: “This essay steps back from the more detailed regulatory discussions in other contributions to this roundtable on “Competing Visions for Cyberspace” and highlights three broad issues that raise ethical concerns about our activity online. First, the commodification of people—their identities, their data, their privacy—that lies at the heart of the business models of many of the largest information and communication technology companies risks instrumentalizing human beings. Second, concentrations of wealth and market power online may be contributing to economic inequalities and other forms of domination. Third, long-standing tensions between the security of states and the human security of people in those states have not been at all resolved online and deserve attention….(More)”.
Report by K. Zuegel, E. Cantera, and A. Bellantoni: “Ombudsman institutions (OIs) act as guardians of citizens’ rights and as mediators between citizens and the public administration. While the very existence of such institutions is rooted in the notion of open government, the role they can play in promoting openness throughout the public administration has not been adequately recognized or exploited. Based on a survey of 94 OIs, this report examines the role they play in open government policies and practices. It also provides recommendations on how, given their privileged contact with both people and governments, OIs can better promote transparency, integrity, accountability, and stakeholder participation; how their role in national open government strategies and initiatives can be strengthened; and how they can be at the heart of a truly open state….(More)”.
Paper by Daniel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye, and Carlo Ratti: “The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and with different approaches, with a focus on preserving privacy or on matching records from different data sources. With an increasing number of service providers now routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on the reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e., between two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and from transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3–4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success rate after only a one-week-long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher-frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals….(More)”.
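The co-occurrence approach described in the abstract can be sketched as follows: count, for each candidate pair of identifiers across the two datasets, the number of shared spatio-temporal bins, and match on the highest count. This is a simplified illustration of the idea, not the authors’ exact estimator:

```python
from collections import Counter, defaultdict

def match_users(dataset_a, dataset_b):
    """dataset_a and dataset_b are iterables of (user_id, (location, time_bin))
    records. Returns a best-guess mapping from users in A to users in B,
    based on the number of co-occurring records."""
    # Index dataset B: which users were seen in each (location, time_bin)?
    seen_in_b = defaultdict(set)
    for user_b, st_bin in dataset_b:
        seen_in_b[st_bin].add(user_b)
    # Count shared bins for every candidate (user_a, user_b) pair.
    co_counts = defaultdict(Counter)
    for user_a, st_bin in dataset_a:
        for user_b in seen_in_b.get(st_bin, ()):
            co_counts[user_a][user_b] += 1
    # Match each A-user to the B-user with the most co-occurrences.
    return {a: c.most_common(1)[0][0] for a, c in co_counts.items()}
```

Its success rate grows with the observation window because the expected number of co-occurring records—the main determinant of matchability identified in the paper—grows with it.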
Book by Colin Mayer: “What is a business for? On day one of an economics course, a new student is taught the answer: to maximize shareholder profit. But this single idea, which pervades all our thinking about the role of the corporation, is fundamentally wrong, argues Colin Mayer. Constraining the firm to a single narrow objective has had wide-ranging and damaging consequences: economic, environmental, political, and social.
Prosperity challenges the fundamentals of business thinking. It also sets out a positive new agenda, demonstrating that the corporation is in a unique and powerful position to promote economic and social wellbeing in its fullest sense, for customers, for future generations, as well as for shareholders.
Professor and former Dean of the Saïd Business School in Oxford, Mayer is a leading figure in the global discussion about the purpose and role of the corporation. In Prosperity, he presents a radical and carefully considered agenda for corporations themselves, and for the regulatory frameworks that will enable them to pursue it. Drawing together insights from business, law, economics, science, philosophy, and history, he shows how the corporation can realize its full potential to contribute to the economic and social wellbeing of the many, not just the few.
Prosperity is as much a discussion of how to create and run successful businesses as it is a guide to policymaking to fix the broken system….(More)”.
Book by Nigel Shadbolt, David De Roure, Kieron O’Hara and Wendy Hall: “Social machines are a type of network connected by interactive digital devices made possible by the ubiquitous adoption of technologies such as the Internet, the smartphone, social media and the read/write World Wide Web, connecting people at scale to document situations, cooperate on tasks, exchange information, or even simply to play. Existing social processes may be scaled up, and new social processes enabled, to solve problems, augment reality, create new sources of value, and disrupt existing practice.
This book considers what talents one would need to understand or build a social machine, describes the state of the art, and speculates on the future, from the perspective of the EPSRC project SOCIAM – The Theory and Practice of Social Machines. The aim is to develop a set of tools and techniques for investigating, constructing and facilitating social machines, to enable us to narrow down pragmatically what is becoming a wide space, by asking ‘when will it be valuable to use these methods on a sociotechnical system?’ The systems for which the use of these methods adds value are social machines in which there is rich person-to-person communication, and where a large proportion of the machine’s behaviour is constituted by human interaction….(More)”.
Report by Reform: “This report looks at the access and use of NHS data by private sector companies for research or product and service development purposes….
The private sector is an important partner to the NHS and plays a crucial role in the development of healthcare technologies that use data collected by hospitals or GP practices. It provides the skills and know-how to develop data-driven tools which can be used to improve patient care. However, this is not a one-sided exchange, as the NHS makes the data available to build these tools and offers medical expertise to make sense of the data. This is known as the “value exchange”. Our research uncovered a lack of clarity over what a fair value exchange looks like. This lack of clarity, in conjunction with the lack of national guidance on the types of partnerships that could be developed, has led to a patchwork of approaches on the ground….
Knowing what the “value exchange” is between patients, the NHS, and industry allows for a more informed conversation about what constitutes a fair partnership when data are accessed to create a product or service.
WHAT NEEDS TO CHANGE?
- Engage with the public
- A national strategy
- Access to good quality data
- Commercial and legal skills…(More)”.
Paper by Jeff Kosseff: “U.S. cybersecurity law is largely an outgrowth of the early-aughts concerns over identity theft and financial fraud. Cybersecurity laws focus on protecting identifiers such as driver’s licenses and social security numbers, and financial data such as credit card numbers. Federal and state laws require companies to protect this data and notify individuals when it is breached, and impose civil and criminal liability on hackers who steal or damage this data. In this paper, I argue that our current cybersecurity laws are too narrowly focused on financial harms. While such concerns remain valid, they are only one part of the cybersecurity challenge that our nation faces.
Too often overlooked by the cybersecurity profession are the harms to individuals, such as revenge pornography and online harassment. Our legal system typically addresses these harms through retrospective criminal prosecution and civil litigation, both of which face significant limits. Accounting for such harms in our conception of cybersecurity will help to better align our laws with these threats and reduce the likelihood of the harms occurring….(More)”.
Interview with Hannah Fry on the promise and danger of an AI world, by Michael Segal: “…Why do we need an FDA for algorithms?
It used to be the case that you could just put any old colored liquid in a glass bottle and sell it as medicine and make an absolute fortune. And then not worry about whether or not it’s poisonous. We stopped that from happening because, well, for starters it’s kind of morally repugnant. But also, it harms people. We’re in that position right now with data and algorithms. You can harvest any data that you want, on anybody. You can infer any data that you like, and you can use it to manipulate them in any way that you choose. And you can roll out an algorithm that genuinely makes massive differences to people’s lives, both good and bad, without any checks and balances. To me that seems completely bonkers. So I think we need something like the FDA for algorithms. A regulatory body that can protect the intellectual property of algorithms, but at the same time ensure that the benefits to society outweigh the harms.
Why is the regulation of medicine an appropriate comparison?
If you swallow a bottle of colored liquid and then you keel over the next day, then you know for sure it was poisonous. But there are much more subtle things in pharmaceuticals that require expert analysis to be able to weigh up the benefits and the harms. To study the chemical profile of these drugs that are being sold and make sure that they actually are doing what they say they’re doing. With algorithms it’s the same thing. You can’t expect the average person in the street to study Bayesian inference or be totally well read in random forests, and have the kind of computing prowess to look up a code and analyze whether it’s doing something fairly. That’s not realistic. Simultaneously, you can’t have some code of conduct that every data science person signs up to, and agrees that they won’t tread over some lines. It has to be a government, really, that does this. It has to be government that analyzes this stuff on our behalf and makes sure that it is doing what it says it does, and in a way that doesn’t end up harming people.
How did you come to write a book about algorithms?
Back in 2011, we had these really bad riots in London. I’d been working on a project with the Metropolitan Police, trying mathematically to look at how these riots had spread and to use algorithms to ask how the police could have done better. I went to give a talk in Berlin about this paper we’d published about our work, and they completely tore me apart. They were asking questions like, “Hang on a second, you’re creating this algorithm that has the potential to be used to suppress peaceful demonstrations in the future. How can you morally justify the work that you’re doing?” I’m kind of ashamed to say that it just hadn’t occurred to me at that point in time. Ever since, I have really thought a lot about the point that they made. And started to notice around me that other researchers in the area weren’t necessarily treating the data that they were working with, and the algorithms that they were creating, with the ethical concern they really warranted. We have this imbalance where the people who are making algorithms aren’t talking to the people who are using them. And the people who are using them aren’t talking to the people who are having decisions made about their lives by them. I wanted to write something that united those three groups….(More)”.