How to use mobile phone data for good without invading anyone’s privacy


Leo Mirani in Quartz: “In 2014, when the West African Ebola outbreak was at its peak, some academics argued that the epidemic could have been slowed by using mobile phone data.

Their premise was simple: call-data records show the true nature of social networks and human movement. Understanding social networks and how people really move—as seen from phone movements and calls—could give health officials the ability to predict how a disease will move and where a disease will strike next, and prepare accordingly.

The problem is that call-data records are very hard to get hold of. The files themselves are huge, the privacy risks are enormous, and the process of making the records safe for distribution is long.
First, the technical basics

Every time you make a phone call from your mobile phone to another mobile phone, the network records the following information (note: this is not a complete list):

  • The number from which the call originated
  • The number at which the call terminated
  • Start time of the call
  • Duration of the call
  • The ID number of the phone making the call
  • The ID number of the SIM card used to make the call
  • The code for the antenna used to make the call

On their own, these records are not creepy. Indeed, without them, networks would be unable to connect calls or bill customers. But it is easy to see why operators aren’t rushing to share this information. Even though the data includes none of the actual content of a phone call, simply knowing which number is calling which, and from where and when, is usually more than enough to identify people.
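To make the privacy risk concrete, the fields listed above can be sketched as a record type whose direct identifiers are replaced with salted hashes before sharing. This is a minimal illustration (the field names, salt handling, and hash truncation are assumptions for the sketch, not Orange's actual pipeline), and, as the article's point implies, hashing alone does not prevent re-identification from movement patterns:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class CallRecord:
    caller: str      # number from which the call originated
    callee: str      # number at which the call terminated
    start_time: str  # start time of the call (ISO 8601)
    duration_s: int  # duration of the call in seconds
    imei: str        # ID number of the phone making the call
    imsi: str        # ID number of the SIM card used
    cell_id: str     # code for the antenna used to make the call

def pseudonymize(record: CallRecord, salt: str) -> CallRecord:
    """Replace direct identifiers with salted hashes before distribution."""
    def h(value: str) -> str:
        # Truncated salted SHA-256; deterministic so records still link.
        return hashlib.sha256((salt + value).encode()).hexdigest()[:16]
    return CallRecord(
        caller=h(record.caller),
        callee=h(record.callee),
        start_time=record.start_time,
        duration_s=record.duration_s,
        imei=h(record.imei),
        imsi=h(record.imsi),
        cell_id=record.cell_id,  # antenna code kept for mobility analysis
    )
```

Because the hashing is deterministic, all records from the same subscriber still link together, which is exactly what makes pseudonymized CDRs useful for mobility research and also what keeps them re-identifiable; that is why a program like D4D layers on further safeguards before release.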
So how can network operators use this valuable data for good while also protecting their own interests and those of their customers? A good example can be found in Africa, where Orange, a French mobile phone network with interests across several African countries, has for the second year run its “Data for Development” (D4D) program, which offers researchers a chance to mine call data for clues on development problems.

Steps to safe sharing

After a successful first year in Ivory Coast, Orange this year ran the D4D program in Senegal. The aim of the program is to give researchers and scientists at universities and other research labs access to data in order to find novel ways to aid development in health, agriculture, transport or urban planning, energy, and national statistics….(More)”

Privacy in the Modern Age: The Search for Solutions


New book edited by Marc Rotenberg, Julia Horwitz, and Jeramie Scott: “The threats to privacy are well known: the National Security Agency tracks our phone calls, Google records where we go online and how we set our thermostats, Facebook changes our privacy settings when it wishes, Target gets hacked and loses control of our credit card information, our medical records are available for sale to strangers, our children are fingerprinted and their every test score saved for posterity, and small robots patrol our schoolyards while drones may soon fill our skies.

The contributors to this anthology don’t simply describe these problems or warn about the loss of privacy; they propose solutions. They look closely at business practices, public policy, and technology design and ask, “Should this continue? Is there a better approach?” They take seriously the dictum of Thomas Edison: “What one creates with his hand, he should control with his head.” It’s a new approach to the privacy debate, one that assumes privacy is worth protecting, that there are solutions to be found, and that the future is not yet known. This volume will be an essential reference for policy makers and researchers, journalists and scholars, and others looking for answers to one of the biggest challenges of our modern day. The premise is clear: there’s a problem; let’s find a solution….(More)”

India asks its citizens: please digitise our files


Joshua Chambers in FutureGov: “India has asked its citizens to help digitise records so that it can move away from paper processes.

Using its crowdsourcing web site MyGov, the government wrote that “we cannot talk of Digital India and transforming India into a knowledge society if most of the transactions continue to be physical.”

It is “essential” that paper records are converted into machine readable digital versions, the government added, but “the cost of such digitisation is very large and existing budgetary constraints of government and many other organisations do not allow such lavish digitisation effort.”

Consequently, the government is asking citizens for advice on how to build a cheap content management system and tools that will allow it to crowdsource records transcriptions. Citizens would be rewarded for every word they transcribe through a points system, and points could then be redeemed for cash prizes.

“The proposed platform will create earning and income generation opportunities for our literate rural and urban citizens, develop digital literacy and IT skills and include them in the making of Digital India,” the government added.

The announcement also noted the importance of privacy, suggesting that documents be split so that no portion gives any clue regarding the overall content of the document.

Instead, two people will be given the same words to transcribe, and the software will compare their versions to ensure accuracy. Only successful transcriptions will be rewarded with points….(More)”
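The double-entry verification scheme described above can be sketched in a few lines. This is an illustrative sketch, not the government's proposed system; the normalization rules and point values are assumptions:

```python
def normalize(text: str) -> str:
    """Canonicalize a transcription for comparison (illustrative rules:
    lowercase and collapse whitespace)."""
    return " ".join(text.lower().split())

def score_transcriptions(entry_a: str, entry_b: str, points_per_word: int = 1) -> int:
    """Award points only when two independent transcriptions agree."""
    a, b = normalize(entry_a), normalize(entry_b)
    if a != b:
        return 0  # disagreement: a real system would route the snippet onward
    return len(a.split()) * points_per_word
```

On disagreement, a real system would typically send the snippet to a third transcriber for adjudication rather than simply award zero points, but the core idea is the same: independent agreement substitutes for access to the full document.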

Facebook’s Filter Study Raises Questions About Transparency


Will Knight in MIT Technology Review: “Facebook is an enormously valuable source of information about social interactions.

Facebook’s latest scientific research, about the way it shapes the political perspectives users are exposed to, has led some academics to call for the company to be more open about what it chooses to study and publish.

This week the company’s data science team published a paper in the prominent journal Science confirming what many had long suspected: that the network’s algorithms filter out some content that might challenge a person’s political leanings. However, the paper also suggested that the effect was fairly small, and less significant than a user’s own filtering behavior (see “Facebook Says You Filter News More Than Its Algorithm Does”).
Several academics have pointed to limitations of the study, such as the fact that the only people involved had indicated their political affiliation on their Facebook page. Critics point out that those users might behave in a different way from everyone else. But beyond that, a few academics have noted a potential tension between Facebook’s desire to explore the scientific value of its data and its own corporate interests….

In response to the controversy over that study, Facebook’s chief technology officer, Mike Schroepfer, wrote a Facebook post that acknowledged people’s concerns and described new guidelines for its scientific research. “We’ve created a panel including our most senior subject-area researchers, along with people from our engineering, research, legal, privacy and policy teams, that will review projects falling within these guidelines,” he wrote….(More)

Cops Increasingly Use Social Media to Connect, Crowdsource


Sara E. Wilson at GovTech: “Law enforcement has long used public tip lines and missing persons bulletins to recruit citizens in helping solve crime and increasing public safety. Though the need for police departments to connect with their communities is nothing new, the array of technologies available to do so is growing all the time — as are the ways in which departments use those technologies.

In fact, 81 percent of law enforcement professionals use sites such as Facebook and Twitter on the job, and 25 percent use them daily.

Much of law enforcement is crowdsourced — Amber alerts are pushed to smartphones, seeking response from citizens; officers push wanted information and crime tips to users on Facebook and Twitter in the hopes they can help; and some departments create apps to streamline the information sharing.

Take the Johns Creek, Ga., Police Department, which has deployed a tool that allows additional citizen engagement and crowdsourcing…. Using a mobile app — the SunGard Public Sector P2C Converge app, which is branded specifically for Johns Creek PD as JCPD4Me — the department can more smoothly manage public safety announcements and other social media interactions….

Another tool cops use for communicating with citizens is Nixle, which lets agencies publish alerts, advisories, community information and traffic news. Citizens register for free and receive credible, neighborhood-level public safety information via text message and email in real time.

The Oakland, Calif., Police Department (OPD) uses the platform to engage with citizens — an April 17, 2015 post on Oakland PD’s Nixle Community feed informs readers that the department’s Special Victims Section, which is working to put an end to sex trafficking in the city, arrested five individuals for solicitation of prostitution. Since Jan. 1, 2015, OPD has arrested 70 individuals from 27 cities across the state for solicitation of prostitution.

Nixle allows two-way communication as well — the Tip Watch function allows anonymous tipsters to send information to Oakland PD in three ways (text, phone, Web). Now OPD can issue a passcode to tipsters for two-way, anonymous communication to help gather more information.

On the East Coast, the Peabody, Mass., Police Department has used the My Police Department (MyPD) app by WiredBlue, which lets citizens submit tips and feedback directly to the department, since its creation….

During the high-profile manhunt for the Boston Marathon bombers in 2013, the FBI asked the public for eyewitness photo and video evidence. The response from the public was so overwhelming that the server infrastructure couldn’t handle the massive inflow of data.

This large-scale crowdsourcing and data dilemma inspired a new product: the Los Angeles Sheriff’s Department’s Large Emergency Event Digital Information Repository (LEEDIR). Developed by CitizenGlobal Inc. and Amazon Web Services, LEEDIR pairs an app with cloud storage to help police use citizens’ smartphones as tools to gather and investigate evidence. The repository has since been used in Santa Barbara, Calif., to investigate the 2014 riots in Isla Vista.

Proponents of LEEDIR say the crowdsourcing system gives authorities a secure, central repository for the countless electronic tips that can come in during a crisis. Critics, however, claim that privacy issues come into play with this kind of policing. …(More)”

How Data Mining could have prevented Tunisia’s Terror attack in Bardo Museum


Wassim Zoghlami at Medium: “…Data mining is the process of posing queries and extracting useful patterns or trends, often previously unknown, from large amounts of data using various techniques such as those from pattern recognition and machine learning. Lately there has been significant interest in leveraging data mining for counter-terrorism applications.

Using data on more than 50,000 ISIS-connected Twitter accounts, I was able to establish an understanding of some of the factors that determine how often ISIS attacks occur, what types of terror strikes are used in which geopolitical situations, and other criteria, through graphs of hashtag frequency and of the frequency of particular groups of words used in the tweets.

A simple data mining project on some of the repetitive hashtags and sequences of words typically used by ISIS militants in their tweets yielded surprising results. The results show a rise in some keywords in tweets starting on March 15, three days before the Bardo museum attack.

Some of the common keywords and hashtags that showed an unusual peak beginning March 15, three days before the attack:

#طواغيت تونس : Tyrants of Tunisia = a reference to the military

بشرى تونس : Good news for Tunisia.

قريبا تونس : Soon in Tunisia.

#إفريقية_للإعلام : The head of social media of Afriqiyah

#غزوة_تونس : The foray of Tunis…
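The frequency analysis described above can be sketched as a per-day hashtag counter plus a spike test against a recent baseline. This is an illustrative sketch of the general technique, not the author's actual code; the window and threshold are assumptions:

```python
from collections import Counter
from datetime import date

def daily_hashtag_counts(tweets):
    """Count occurrences of each hashtag per day.

    `tweets` is an iterable of (date, text) pairs; hashtags are
    whitespace-delimited tokens starting with '#'.
    """
    counts = {}
    for day, text in tweets:
        for token in text.split():
            if token.startswith("#"):
                counts.setdefault(token, Counter())[day] += 1
    return counts

def spiking_hashtags(counts, window, threshold=3.0):
    """Flag hashtags whose count on the last day of `window` exceeds
    `threshold` times their average over the preceding days."""
    *baseline_days, last_day = window
    flagged = []
    for tag, per_day in counts.items():
        baseline = sum(per_day[d] for d in baseline_days) / len(baseline_days)
        if per_day[last_day] > threshold * max(baseline, 1):
            flagged.append(tag)
    return flagged
```

A real pipeline would also normalize Arabic text, filter retweets, and smooth the baseline, but the core signal is the same: a hashtag whose daily count jumps well above its recent average, as the keywords above did in the days before the attack.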

Big Data and Data Mining should be used for national security intelligence

The Tunisian national security services must leverage big data to predict such attacks as the volume of digital data grows. Among the challenges facing data mining techniques is that carrying out effective data mining and extracting useful information for counterterrorism and national security requires gathering all kinds of information about individuals. However, this information could be a threat to individuals’ privacy and civil liberties…(More)”

Domestic Drones and Privacy: A Primer


Richard M. Thompson for the Congressional Research Service: “There are two overarching privacy issues implicated by domestic drone use. The first is defining what “privacy” means in the context of aerial surveillance. Privacy is an ambiguous term that can mean different things in different contexts. This becomes readily apparent when attempting to apply traditional privacy concepts such as personal control and secrecy to drone surveillance. Other, more nuanced privacy theories such as personal autonomy and anonymity must be explored to get a fuller understanding of the privacy risks posed by drone surveillance. Moreover, with ever-increasing advances in data storage and manipulation, the subsequent aggregation, use, and retention of drone-obtained data may warrant an additional privacy impact analysis.

The second predominant issue is which entity should be responsible for regulating drones and privacy. As the final arbiter of the Constitution, the courts are naturally looked upon to provide at least the floor of privacy protection from UAS surveillance, but as will be discussed in this report, under current law, this protection may be minimal….(More)”

Health Big Data in the Commercial Context


CDT Press Release: “This paper is the third in a series of three, each of which explores health big data in a different context. The first — on health big data in the government context — is available here, and the second — on health big data in the clinical context — is available here.

Consumers are increasingly using mobile phone apps and wearable devices to generate and share data on health and wellness. They are using personal health record tools to access and copy health records and move them to third party platforms. They are sharing health information on social networking sites. They leave digital health footprints when they conduct online searches for health information. The health data created, accessed, and shared by consumers using these and many other tools can range from detailed clinical information, such as downloads from an implantable device and details about medication regimens, to data about weight, caloric intake, and exercise logged with a smart phone app.

These developments offer a wealth of opportunities for health care and personal wellness. However, privacy questions arise due to the volume and sensitivity of health data generated by consumer-focused apps, devices, and platforms, including the potential analytics uses that can be made of such data.

Many of the privacy issues that face traditional health care entities in the big data era also apply to app developers, wearable device manufacturers, and other entities not part of the traditional health care ecosystem. These include questions of data minimization, retention, and secondary use. Notice and consent pose challenges, especially given the limits of presenting notices on mobile device screens, and the fact that consumer devices may be bought and used without consultation with a health care professional. Security is a critical issue as well.

However, the privacy and security provisions of the Health Insurance Portability and Accountability Act (HIPAA) do not apply to most app developers, device manufacturers or others in the consumer health space. This has benefits to innovation, as innovators would otherwise have to struggle with the complicated HIPAA rules. However, the current vacuum also leaves innovators without clear guidance on how to appropriately and effectively protect consumers’ health data. Given the promise of health apps, consumer devices, and consumer-facing services, and given the sensitivity of the data that they collect and share, it is important to provide such guidance….

As the source of privacy guidelines, we look to the framework provided by the Fair Information Practice Principles (FIPPs) and explore how it could be applied in an age of big data to patient-generated data. The FIPPs have influenced to varying degrees most modern data privacy regimes. While some have questioned the continued validity of the FIPPs in the current era of mass data collection and analysis, we consider here how the flexibility and rigor of the FIPPs provide an organizing framework for responsible data governance, promoting innovation, efficiency, and knowledge production while also protecting privacy. Rather than proposing an entirely new framework for big data, which could be years in the making at best, using the FIPPs would seem the best approach in promoting responsible big data practices. Applying the FIPPs could also help synchronize practices between the traditional health sector and emerging consumer products….(More)”

Big Other: Surveillance Capitalism and the Prospects of an Information Civilization


New paper by Shoshana Zuboff in the Journal of Information Technology: “This article describes an emergent logic of accumulation in the networked sphere, ‘surveillance capitalism,’ and considers its implications for ‘information civilization.’ Google is to surveillance capitalism what General Motors was to managerial capitalism. Therefore the institutionalizing practices and operational assumptions of Google Inc. are the primary lens for this analysis as they are rendered in two recent articles authored by Google Chief Economist Hal Varian. Varian asserts four uses that follow from computer-mediated transactions: ‘data extraction and analysis,’ ‘new contractual forms due to better monitoring,’ ‘personalization and customization,’ and ‘continuous experiments.’ An examination of the nature and consequences of these uses sheds light on the implicit logic of surveillance capitalism and the global architecture of computer mediation upon which it depends. This architecture produces a distributed and largely uncontested new expression of power that I christen: ‘Big Other.’ It is constituted by unexpected and often illegible mechanisms of extraction, commodification, and control that effectively exile persons from their own behavior while producing new markets of behavioral prediction and modification. Surveillance capitalism challenges democratic norms and departs in key ways from the centuries long evolution of market capitalism….(More)”

The big medical data miss: challenges in establishing an open medical resource


Eric J. Topol in Nature: “I call for an international open medical resource to provide a database for every individual’s genomic, metabolomic, microbiomic, epigenomic and clinical information. This resource is needed in order to facilitate genetic diagnoses and transform medical care.

“We are each, in effect, one-person clinical trials”

Laurie Becklund was a noted journalist who died in February 2015 at age 66 from breast cancer. Soon thereafter, the Los Angeles Times published her op-ed entitled “As I lay dying” (Ref. 1). She lamented, “We are each, in effect, one-person clinical trials. Yet the knowledge generated from those trials will die with us because there is no comprehensive database of metastatic breast cancer patients, their characteristics and what treatments did and didn’t help them”. She went on to assert that, in the era of big data, the lack of such a resource is “criminal”, and she is absolutely right….

Around the same time as this important op-ed, the MIT Technology Review published their issue entitled “10 Breakthrough Technologies 2015” and on the list was the “Internet of DNA” (Ref. 2). While we are often reminded that the world we live in is becoming the “Internet of Things”, I have not seen this terminology applied to DNA before. The article on the “Internet of DNA” decried, “the unfolding calamity in genomics is that a great deal of life-saving information, though already collected, is inaccessible”. It called for a global network of millions of genomes and cited the Matchmaker Exchange as a frontrunner. For this international initiative, a growing number of research and clinical teams have come together to pool and exchange phenotypic and genotypic data for individual patients with rare disorders, in order to share this information and assist in the molecular diagnosis of individuals with rare diseases….

An Internet of DNA — or what I have referred to as a massive, open, online medicine resource (MOOM) — would help to quickly identify the genetic cause of the disorder (Ref. 4) and, in the process of doing so, precious guidance for prevention, if necessary, would become available for such families who are currently left in the lurch as to their risk of suddenly dying.

So why aren’t such MOOMs being assembled? ….

There has also been much discussion related to privacy concerns that patients might be unwilling to participate in a massive medical information resource. However, multiple global consumer surveys have shown that more than 80% of individuals are ready to share their medical data provided that they are anonymized and their privacy maximally assured (Ref. 4). Indeed, just 24 hours into Apple’s ResearchKit initiative, a smartphone-based medical research programme, there were tens of thousands of patients with Parkinson disease, asthma or heart disease who had signed on. Some individuals are even willing to be “open source” — that is, to make their genetic and clinical data fully available with free access online, without any assurance of privacy. This willingness is demonstrated by the participants in the recently launched Open Humans initiative. Along with the Personal Genome Project, Go Viral and American Gut have joined in this initiative. Still, studies suggest that most individuals would only agree to be medical research participants if their identities would not be attainable. Unfortunately, to date, little has been done to protect individual medical privacy, for which there are both promising new data protection technological approaches (Ref. 4) and the need for additional governmental legislation.

This leaves us with perhaps the major obstacle that is holding back the development of MOOMs — researchers. Even with big, team science research projects culling together hundreds of investigators and institutions throughout the world, such as the Global Alliance for Genomics and Health (GA4GH), the data obtained clinically are just as Laurie Becklund asserted in her op-ed — “one-person clinical trials” (Ref. 1). While undertaking the construction of a MOOM is a huge endeavour, there is little motivation for researchers to take on this task, as this currently offers no academic credit and has no funding source. But the transformative potential of MOOMs to improve medical care is extraordinary. Rather than having the knowledge die with each of us, the time has come to take down the walls of academic medical centres and health-care systems around the world, and create a global knowledge medical resource that leverages each individual’s information to help one another…(More)”