On the privacy-conscientious use of mobile phone data

Yves-Alexandre de Montjoye et al. in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.

While there is little doubt about the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.

Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data have ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy, which considers de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
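Unicity, as referenced in the excerpt, is the fraction of people in a dataset who are uniquely pinned down by a small number p of spatio-temporal points drawn from their own trace. The following is a minimal Python sketch under stated assumptions, not the authors' code: traces are modeled as sets of (region, hour) pairs, the adversary is assumed to know p points chosen uniformly at random from the target's trace, and the function name `estimate_unicity` and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical granularity: (region or antenna id, hour bucket).
Point = Tuple[str, int]


def estimate_unicity(
    traces: Dict[str, Set[Point]], p: int, n_samples: int = 1000, seed: int = 0
) -> float:
    """Estimate the share of users uniquely identified by p random points of their own trace.

    Assumption: the adversary knows p points drawn uniformly at random from the
    target's trace, with no noise or coarsening.
    """
    rng = random.Random(seed)
    eligible: List[str] = [u for u, t in traces.items() if len(t) >= p]
    if not eligible:
        raise ValueError("no user has at least p points")
    unique = 0
    for _ in range(n_samples):
        user = rng.choice(eligible)
        # sorted() gives a deterministic order so the seed makes runs reproducible.
        known = rng.sample(sorted(traces[user]), p)
        # Count every user whose trace contains all known points (the target included).
        matches = sum(1 for t in traces.values() if t.issuperset(known))
        if matches == 1:
            unique += 1
    return unique / n_samples


if __name__ == "__main__":
    toy = {
        "a": {("cell1", 8), ("cell2", 12), ("cell3", 18)},
        "b": {("cell1", 8), ("cell2", 12), ("cell4", 18)},
        "c": {("cell1", 8), ("cell5", 12), ("cell3", 18)},
    }
    print(estimate_unicity(toy, p=3))  # 1.0: each full trace matches only its owner
    print(estimate_unicity(toy, p=1))  # lower: single points are often shared
```

Raising p tends to raise unicity, which is the intuition behind the finding that a handful of points suffices to single out most people in a large dataset.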

The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data in time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.

These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed-upon framework for the privacy-conscientious use of mobile phone data by third parties, especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed-upon set of models have been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data, and has often resulted in suboptimal trade-offs, if any.

In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregate information is ultimately needed by the third party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed, such as targeted advertising or loans based on behavioral data.

Figure 1: Matrix of the four models for the privacy-conscientious use of mobile phone data.

First, it is important to insist that none of these models is a silver bullet…(More)”.
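The aggregate-only use the authors describe can be illustrated with a common statistical-disclosure-control step: releasing counts of distinct users per region and suppressing small counts. This is a hypothetical sketch, not one of the paper's four models; the function name `aggregate_counts`, the (user_id, region) record format, and the default threshold of 10 are assumptions. Threshold suppression alone does not make data anonymous (differencing two overlapping releases can still leak information about individuals), which fits the authors' point that no single model is a silver bullet.

```python
from typing import Dict, Iterable, Set, Tuple


def aggregate_counts(
    records: Iterable[Tuple[str, str]], min_count: int = 10
) -> Dict[str, int]:
    """Count distinct users per region and suppress small cells.

    records: (user_id, region) pairs for one time window (hypothetical format).
    Regions with fewer than min_count distinct users are dropped from the output.
    """
    users_per_region: Dict[str, Set[str]] = {}
    for user_id, region in records:
        # A set ensures a user seen several times in a region is counted once.
        users_per_region.setdefault(region, set()).add(user_id)
    return {
        region: len(users)
        for region, users in users_per_region.items()
        if len(users) >= min_count
    }


if __name__ == "__main__":
    sample = [("u1", "A"), ("u2", "A"), ("u1", "A"), ("u3", "B")]
    print(aggregate_counts(sample, min_count=2))  # {'A': 2}
```

Only the per-region totals leave the function, so a third party receiving its output never sees individual records, though the counts themselves still carry some residual risk.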

Data Flow in the Smart City: Open Data Versus the Commons

Chapter by Richard Beckwith, John Sherry and David Prendergast in The Hackable City: “Much of the recent excitement around data, especially ‘Big Data,’ focuses on the potential commercial or economic value of data. How that data will affect people isn’t much discussed. People know that smart cities will deploy Internet-based monitoring and that flows of the collected data promise to produce new values. Less considered is that smart cities will be sites of new forms of citizen action—enabled by an ‘economy’ of data that will lead to new methods of collectivization, accountability, and control which, themselves, can provide both positive and negative values to the citizenry. Therefore, smart city design needs to consider not just measurement and publication of data but also the implications of city-wide deployment, data openness, and the possibility of unintended consequences if data leave the city….(More)”.

Data Collaboration, Pooling and Hoarding under Competition Law

Paper by Bjorn Lundqvist: “In the Internet of Things era devices will monitor and collect data, whilst device-producing firms will store, distribute, analyse and re-use data on a grand scale. A great deal of data analytics will be used to enable firms to understand and make use of the collected data. The infrastructure around the collected data is controlled and access to the data flow is thus restricted on technical, but also on legal grounds. Legally, the data are being obscured behind a thicket of property rights, including intellectual property rights. Therefore, there is no general “data commons” for everyone to enjoy.

If firms would like to combine data, they need to give each other access either by sharing, trading, or pooling the data. On the one hand, industry-wide pooling of data could increase efficiency of certain services, and contribute to the innovation of other services, e.g., think about self-driving cars or personalized medicine. On the other hand, firms combining business data may use the data, not to advance their services or products, but to collude, to exclude competitors or to abuse their market position. Indeed, by combining their data in a pool, they can gain market power, and, hence, the ability to violate competition law. Moreover, we also see firms hoarding data from various sources, creating de facto data pools. This article will discuss what implications combining data in data pools by firms might have on competition, and when competition law should be applicable. It develops the idea that data pools harbour great opportunities, whilst acknowledging that there are still risks to take into consideration, and to regulate….(More)”.

Using Mobile Network Data for Development: How it works

Blog by Derval Usher and Darren Hanniffy: “…We aim to equip decision makers with data tools so that they have access to the analysis on the fly. But to help this scale we need progress in three areas:

1. The framework to support Shared Value partnerships.

2. Shared understanding of The Proposition and the benefits for all parties.

3. Access to finance and a funding strategy, designing-in innovation.

1. Any Public-Private Partnership should be aligned to achieve impact centered on the SDGs through a Shared Value / Inclusive Business approach. Mobile network operators are consumed with the challenge of maintaining or upgrading their infrastructure, driving device sales and sustaining their agent networks to reach the last mile. Measuring impact against the SDGs has not been a priority. Mobile network operators tend not to seek out partnerships with traditional development donors or development implementers. But there is a growing realisation of the potential and the need to partner. It’s important to move from a service level transactional relationship to a strategic partnership approach.

Private sector partners have been fundamental to the success of UN Global Pulse as these companies are often the custodians of the big data sets from which we develop valuable development and humanitarian insights. Although in previous years our private sector partners were framed primarily as data philanthropists, we are beginning to see a shift in the relationship to one of shared value. Our work generates public value and also insights that can enhance business operations. This shared value model is attracting more private enterprises to engage and to explore their own data, and more broadly to investigate the value of their networks and data as part of the data innovation ecosystem, which the Global Pulse lab network will build on as we move forward.

2. Partners need to be more propositional and less charitable. They need to recognise the fact that earning profit may help ensure the sustainability of digital platforms and services that offer developmental impact. Through partnership we can attract innovative finance, deliver mobile for development programmes, measure impact and create affordable commercial solutions to development challenges that become sustainable by design. Pulse Lab Jakarta and Digicel have been flexible with one another, which is important as this partnership has not always been a priority for either side all the time. But we believe in unlocking the power of mobile data for development and therefore continue to make progress.

3. Development and commercial strategies should be more aligned to create an enabling environment. Currently they are not. The private sector needs to become a strategic partner to development where multi-annual development funds align with commercial strategy. Mobile network operators continue to invest in their networks, particularly in developing countries, and the digital platform is coming into being in the markets where Digicel operates. But the platform is new and experience is limited within governments, the development community and indeed even within mobile network operators.

We need to see donors actively engage during the development of multi-annual funding facilities….(More)”.

Reimagining Public-Private Partnerships: Four Shifts and Innovations in Sharing and Leveraging Private Assets and Expertise for the Public Good

Blog by Stefaan G. Verhulst and Andrew J. Zahuranec: “For years, public-private partnerships (PPPs) have promised to help governments do more for less. Yet, the discussion and experimentation surrounding PPPs often focus on outdated models and narratives, and the field of experimentation has not fully embraced the opportunities provided by an increasingly networked and data-rich private sector.

Private-sector actors (including businesses and NGOs) have expertise and assets that, if brought to bear in collaboration with the public sector, could spur progress in addressing public problems or providing public services. Challenges to date have largely involved the identification of effective and legitimate means for unlocking the public value of private-sector expertise and assets. Those interested in creating public value through PPPs are faced with a number of questions, including:

  • How do we broaden and deepen our understanding of PPPs in the 21st Century?
  • How can we innovate and improve the ways that PPPs tap into private-sector assets and expertise for the public good?
  • How do we connect actors in the PPP space with open governance developments and practices, especially given that PPPs have not played a major role in the governance innovation space to date?

The PPP Knowledge Lab defines a PPP as a “long-term contract between a private party and a government entity, for providing a public asset or service, in which the private party bears significant risk and management responsibility and remuneration is linked to performance.”…

To maximize the value of PPPs, we don’t just need new tools or experiments but new models for using assets and expertise in different sectors. We need to bring that capacity to public problems.

At the latest convening of the MacArthur Foundation Research Network on Opening Governance, Network members and experts from across the field tried to chart this new course by exploring questions about the future of PPPs.

The group explored the new research and thinking that enables many new types of collaboration beyond the typical “contract”-based approaches. Through their discussions, Network members identified four shifts representing ways that cross-sector collaboration could evolve in the future:

  1. From Formal to Informal Trust Mechanisms;
  2. From Selection to Iterative and Inclusive Curation;
  3. From Partnership to Platform; and
  4. From Shared Risk to Shared Outcome….(More)”.

All Data Are Local: Thinking Critically in a Data-Driven Society

Book by Yanni Alexander Loukissas: “In our data-driven society, it is too easy to assume the transparency of data. Instead, Yanni Loukissas argues in All Data Are Local, we should approach data sets with an awareness that data are created by humans and their dutiful machines, at a time, in a place, with the instruments at hand, for audiences that are conditioned to receive them. All data are local. The term data set implies something discrete, complete, and portable, but it is none of those things. Examining a series of data sources important for understanding the state of public life in the United States—Harvard’s Arnold Arboretum, the Digital Public Library of America, UCLA’s Television News Archive, and the real estate marketplace Zillow—Loukissas shows us how to analyze data settings rather than data sets.

Loukissas sets out six principles: all data are local; data have complex attachments to place; data are collected from heterogeneous sources; data and algorithms are inextricably entangled; interfaces recontextualize data; and data are indexes to local knowledge. He then provides a set of practical guidelines to follow. To make his argument, Loukissas employs a combination of qualitative research on data cultures and exploratory data visualizations. Rebutting the “myth of digital universalism,” Loukissas reminds us of the meaning-making power of the local….(More)”.

These patients are sharing their data to improve healthcare standards

Article by John McKenna: “We’ve all heard about donating blood, but how about donating data?

Chronic non-communicable diseases (NCDs) like diabetes, heart disease and epilepsy are predicted by the World Health Organization to account for 57% of all disease by 2020.

Heart disease and stroke are the world’s biggest killers.

This has led some experts to call NCDs the “greatest challenge to global health”.

Could data provide the answer?

Today over 600,000 patients from around the world share data on more than 2,800 chronic diseases to improve research and treatment of their conditions.

People who join the PatientsLikeMe online community share information on everything from their medication and treatment plans to their emotional struggles.

Many of the participants say that it is hugely beneficial just to know there is someone else out there going through similar experiences.

But through its use of data, the platform also has the potential for far more wide-ranging benefits to help improve the quality of life for patients with chronic conditions.

Give data, get data

PatientsLikeMe is one of a swathe of emerging data platforms in the healthcare sector helping provide a range of tech solutions to health problems, including speeding up the process of clinical trials using real-time data analysis or using blockchain to enable the secure sharing of patient data.

Its philosophy is “give data, get data”. In practice it means that every patient using the website has access to an array of crowd-sourced information from the wider community, such as common medication side-effects, and patterns in sufferers’ symptoms and behaviour….(More)”.

Waze-fed AI platform helps Las Vegas cut car crashes by almost 20%

Liam Tung at ZDNet: “An AI-led road-safety pilot program between analytics firm Waycare and Nevada transportation agencies has helped reduce crashes along the busy I-15 in Las Vegas.

Silicon Valley-based Waycare’s system uses data from connected cars, road cameras and apps like Waze to build an overview of a city’s roads and then shares that data with local authorities to improve road safety.

Waycare struck a deal with Google-owned Waze earlier this year to “enable cities to communicate back with drivers and warn of dangerous roads, hazards, and incidents ahead”. Waze’s crowdsourced data also feeds into Waycare’s traffic management system, offering more data for cities to manage traffic.

Waycare has now wrapped up a year-long pilot with the Regional Transportation Commission of Southern Nevada (RTC), Nevada Highway Patrol (NHP), and the Nevada Department of Transportation (NDOT).

RTC reports that Waycare helped the city reduce the number of primary crashes by 17 percent along Interstate 15 in Las Vegas.

Waycare’s data, as well as its predictive analytics, gave the city’s safety and traffic management agencies the ability to take preventative measures in high risk areas….(More)”.

Using Data to Raise the Voices of Working Americans

Ida Rademacher at the Aspen Institute: “…At the Aspen Institute Financial Security Program, we sense a growing need to ground these numbers in what people experience day-to-day. We’re inspired by projects like the Financial Diaries that helped create empathy for what the statistics mean. …the Diaries was a time-delimited project, and the insights we can gain from major banking institutions are somewhat limited in their ability to show the challenges of economically marginalized populations. That’s why we’ve recently launched a consumer insights initiative to develop and translate a more broadly sourced set of data that lifts the curtain on the financial lives of low- and moderate-income US consumers. What does it really mean to lack $400 when you need it? How do people cope? What are the aspirations and anxieties that fuel choices? Which strategies work and which fall flat? Our work exists to focus the dialogue about financial insecurity by keeping an ear to the ground and amplifying what we hear. Our ultimate goal: Inspire new solutions that react to reality, ones that can genuinely improve the financial well-being of many.

Our consumer insights initiative sees power in partnerships and collaboration. We’re building a big tent for a range of actors to query and share what their data says: private sector companies, public programs, and others who see unique angles into the financial lives of low- and moderate-income households. We are creating a new forum to lift up these firms serving consumers – and in doing so, we’re raising the voices of consumers themselves.

One example of this work is our Consumer Insights Collaborative (CIC), a group of nine leading non-profits from across the country. Each has a strong sense of challenges and opportunities on the ground because every day their work brings them face-to-face with a wide array of consumers, many of whom are low- and moderate-income families. And most already work independently to learn from their data. Take EARN and its Big Data on Small Savings project; the Financial Clinic’s insights series called Change Matters; Mission Asset Fund’s R&D Lab focused on human-centered design; and FII, which uses data collection as part of its main service.

Through the CIC, they join forces to see more than any one nonprofit can on its own. Together CIC members articulate common questions and synthesize collective answers. In the coming months we will publish a first-of-its-kind report on a jointly posed question: What are the dimensions and drivers of short-term financial stability?

An added bonus of partnerships like the CIC is the community of practice that naturally emerges. We believe that data scientists from all walks can, and indeed must, learn from each other to have the greatest impact. Our initiative especially encourages cooperative capacity-building around data security and privacy. We acknowledge that as access to information grows, so does the risk to consumers themselves. We endorse collaborative projects that value ethics, respect, and integrity as much as they value cross-organizational learning.

As our portfolio grows, we will invite an even broader network to engage. We’re already working with NEST Insights to draw on NEST’s extensive administrative data on retirement savings, with an aim to understand more about the long-term implications of non-traditional work and unstable household balance sheets on financial security….(More)”.

Driven to safety — it’s time to pool our data

Kevin Guo at TechCrunch: “…Anyone with experience in the artificial intelligence space will tell you that the quality and quantity of training data are among the most important inputs in building real-world-functional AI. This is why today’s large technology companies continue to collect and keep detailed consumer data, despite recent public backlash. From search engines, to social media, to self-driving cars, data — in some cases even more than the underlying technology itself — is what drives value in today’s technology companies.

It should be no surprise then that autonomous vehicle companies do not publicly share data, even in instances of deadly crashes. When it comes to autonomous vehicles, the public interest (making safe self-driving cars available as soon as possible) is clearly at odds with corporate interests (making as much money as possible on the technology).

We need to create industry and regulatory environments in which autonomous vehicle companies compete based upon the quality of their technology — not just upon their ability to spend hundreds of millions of dollars to collect and silo as much data as possible (yes, this is how much gathering this data costs). In today’s environment the inverse is true: autonomous car manufacturers are focusing on gathering as many miles of data as possible, with the intention of feeding more information into their models than their competitors, all the while avoiding working together….

This data is complex and diverse, yet public — I am not suggesting that people hand over private, privileged data, but that they actively pool and combine what the cars are seeing. There’s a reason that many of the autonomous car companies are driving millions of virtual miles — they’re attempting to get as much active driving data as they can. Beyond the fact that they drove those miles, what truly makes that data something that they have to hoard? By sharing these miles, by seeing as much of the world in as much detail as possible, these companies can focus on making smarter, better autonomous vehicles and bring them to market faster.

If you’re reading this and thinking it’s deeply unfair, I encourage you to once again consider that 40,000 people are dying preventably every year in America alone. If you are not compelled by the massive life-saving potential of the technology, consider that publicly licensable self-driving data sets would accelerate innovation by removing a substantial portion of the capital barrier-to-entry in the space and increasing competition….(More)”.