Los Angeles Accuses Weather Channel App of Covertly Mining User Data


Jennifer Valentino-DeVries and Natasha Singer in The New York Times: “The Weather Channel app deceptively collected, shared and profited from the location information of millions of American consumers, the city attorney of Los Angeles said in a lawsuit filed on Thursday.

One of the most popular online weather services in the United States, the Weather Channel app has been downloaded more than 100 million times and has 45 million active users monthly.

The government said the Weather Company, the business behind the app, unfairly manipulated users into turning on location tracking by implying that the information would be used only to localize weather reports. Yet the company, which is owned by IBM, also used the data for unrelated commercial purposes, like targeted marketing and analysis for hedge funds, according to the lawsuit.

In the complaint, the city attorney excoriated the Weather Company, saying it unfairly took advantage of its app’s popularity and the fact that consumers were likely to give their location data to get local weather alerts. The city said that the company failed to sufficiently disclose its data practices when it got users’ permission to track their location and that it obscured other tracking details in its privacy policy.

“These issues certainly aren’t limited to our state,” Mr. Feuer said. “Ideally this litigation will be the catalyst for other action — either litigation or legislative activity — to protect consumers’ ability to assure their private information remains just that, unless they speak clearly in advance.”…(More)”.

Can a set of equations keep U.S. census data private?


Jeffrey Mervis at Science: “The U.S. Census Bureau is making waves among social scientists with what it calls a “sea change” in how it plans to safeguard the confidentiality of data it releases from the decennial census.

The agency announced in September 2018 that it will apply a mathematical concept called differential privacy to its release of 2020 census data after conducting experiments that suggest current approaches can’t assure confidentiality. But critics of the new policy believe the Census Bureau is moving too quickly to fix a system that isn’t broken. They also fear the changes will degrade the quality of the information used by thousands of researchers, businesses, and government agencies.

The move has implications that extend far beyond the research community. Proponents of differential privacy say a fierce, ongoing legal battle over plans to add a citizenship question to the 2020 census has only underscored the need to assure people that the government will protect their privacy....

Differential privacy, first described in 2006, isn’t a substitute for swapping and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will “leak” from a public data set to open the door to reconstruction.

“Any time you release a statistic, you’re leaking something,” explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. “The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary” on that risk....

In the case of census data, however, the agency has already decided what information it will release, and the number of queries is unlimited. So its challenge is to calculate how much the data must be perturbed to prevent reconstruction....
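
An editorial aside: the idea of a quantified privacy budget can be made concrete with the classic Laplace mechanism. The sketch below is illustrative only (it is not the Census Bureau's implementation, and the block population is invented): a count query is answered with noise whose scale is calibrated to a privacy parameter epsilon, bounding how much any single respondent can shift the published figure.

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a differentially private count.

    Adding or removing one person changes a count by at most `sensitivity`,
    so Laplace noise with scale sensitivity/epsilon bounds how much any one
    individual can shift the published statistic.
    """
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Invented census-block population, for illustration only.
true_population = 127
for epsilon in (0.1, 1.0, 10.0):  # smaller epsilon = stronger privacy, more noise
    print(f"epsilon={epsilon:<4}: released count = {laplace_count(true_population, epsilon):.1f}")
```

Smaller values of epsilon mean stronger privacy but noisier statistics, which is exactly the accuracy-versus-confidentiality trade-off the critics worry about.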

A professor of labor economics at Cornell University, Abowd first learned that traditional procedures to limit disclosure were vulnerable—and that algorithms existed to quantify the risk—at a 2005 conference on privacy attended mainly by cryptographers and computer scientists. “We were speaking different languages, and there was no Rosetta Stone,” he says.

He took on the challenge of finding common ground. In 2008, building on a long relationship with the Census Bureau, he and a team at Cornell created the first application of differential privacy to a census product. It is a web-based tool, called OnTheMap, that shows where people work and live….

The three-step process required substantial computing power. First, the researchers reconstructed records for individuals—say, a 55-year-old Hispanic woman—by mining the aggregated census tables. Then, they tried to match the reconstructed individuals to even more detailed census block records (that still lacked names or addresses); they found “putative matches” about half the time.

Finally, they compared the putative matches to commercially available credit databases in hopes of attaching a name to a particular record. Even if they could, however, the team didn’t know whether they had actually found the right person.
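
As an editorial illustration of the matching steps described above, the toy sketch below, with invented records and column names (this is not the Census Bureau's experiment), shows how reconstructed quasi-identifiers can be joined against an external file that carries names; any pair agreeing on every attribute becomes a "putative match".

```python
# Invented records and column names, for illustration only; this mimics the
# shape of the linkage step, not the actual experiment.
reconstructed = [  # step 1 output: individuals rebuilt from aggregate tables
    {"block": "1001", "age": 55, "sex": "F", "ethnicity": "Hispanic"},
    {"block": "1002", "age": 23, "sex": "M", "ethnicity": "White"},
]
external_file = [  # e.g. a commercial database that carries names
    {"name": "R. Lopez", "block": "1001", "age": 55, "sex": "F", "ethnicity": "Hispanic"},
    {"name": "J. Smith", "block": "1003", "age": 23, "sex": "M", "ethnicity": "White"},
]

def putative_matches(recon, external, keys=("block", "age", "sex", "ethnicity")):
    """Pair each reconstructed record with any named record agreeing on every key."""
    return [(r, e["name"]) for r in recon for e in external
            if all(r[k] == e[k] for k in keys)]

for record, name in putative_matches(reconstructed, external_file):
    print(f"Putative re-identification: {name} <- {record}")
```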

Abowd won’t say what proportion of the putative matches appeared to be correct. (He says a forthcoming paper will contain the ratio, which he calls “the amount of uncertainty an attacker would have once they claim to have reidentified a person from the public data.”) Although one of Abowd’s recent papers notes that “the risk of re-identification is small,” he believes the experiment proved reidentification “can be done.” And that, he says, “is a strong motivation for moving to differential privacy.”…

Such arguments haven’t convinced Ruggles and other social scientists opposed to applying differential privacy to the 2020 census. They are circulating manuscripts that question the significance of the census reconstruction exercise and that call on the agency to delay and change its plan....

Ruggles, meanwhile, has spent a lot of time thinking about the kinds of problems differential privacy might create. His Minnesota institute, for instance, disseminates data from the Census Bureau and 105 other national statistical agencies to 176,000 users. And he fears differential privacy will put a serious crimp in that flow of information…

There are also questions of capacity and accessibility. The centers require users to do all their work onsite, so researchers would have to travel, and the centers offer fewer than 300 workstations in total....

Abowd has said, “The deployment of differential privacy within the Census Bureau marks a sea change for the way that official statistics are produced and published.” And Ruggles agrees. But he says the agency hasn’t done enough to equip researchers with the maps and tools needed to navigate the uncharted waters….(More)”.

Data Policy in the Fourth Industrial Revolution: Insights on personal data


Report by the World Economic Forum: “Development of comprehensive data policy necessarily involves trade-offs. Cross-border data flows are crucial to the digital economy. The use of data is critical to innovation and technology. However, to engender trust, we need to have appropriate levels of protection in place to ensure privacy, security and safety. Over 120 laws in effect across the globe today provide differing levels of protection for data but few anticipated…

Data Policy in the Fourth Industrial Revolution: Insights on personal data, a paper by the World Economic Forum in collaboration with the Ministry of Cabinet Affairs and the Future, United Arab Emirates, examines the relationship between risk and benefit, recognizing the impact of culture, values and social norms. This work is a start toward developing a comprehensive data policy toolkit and knowledge repository of case studies for policy makers and data policy leaders globally….(More)”.

The UN Principles on Personal Data Protection and Privacy


United Nations System: “The Principles on Personal Data Protection and Privacy set out a basic framework for the processing of personal data by, or on behalf of, the United Nations System Organizations in carrying out their mandated activities.

The Principles aim to: (i) harmonize standards for the protection of personal data across the UN System; (ii) facilitate the accountable processing of personal data; and (iii) ensure respect for the human rights and fundamental freedoms of individuals, in particular the right to privacy. These Principles apply to personal data, contained in any form, and processed in any manner. Where appropriate, they may also be used as a benchmark for the processing of non-personal data, in a sensitive context that may put certain individuals or groups of individuals at risk of harms. 
 
The High Level Committee on Management (HLCM) formally adopted the Principles at its 36th Meeting on 11 October 2018. The adoption followed the HLCM’s decision at its 35th Meeting in April 2018 to engage with the UN Data Privacy Policy Group (UN PPG) in developing a set of high-level principles on the cross-cutting issue of data privacy. Preceding the 36th HLCM meeting in October, the Principles were developed and unanimously endorsed by the organizations represented on the UN PPG….(More)” (Download the Personal Data Protection and Privacy Principles).

In High-Tech Cities, No More Potholes, but What About Privacy?


Timothy Williams in The New York Times: “Hundreds of cities, large and small, have adopted or begun planning smart cities projects. But the risks are daunting. Experts say cities frequently lack the expertise to understand privacy, security and financial implications of such arrangements. Some mayors acknowledge that they have yet to master the responsibilities that go along with collecting billions of bits of data from residents….

Supporters of “smart cities” say that the potential is enormous and that some projects could go beyond creating efficiencies and actually save lives. Among the plans under development are augmented reality programs that could help firefighters find people trapped in burning buildings and the collection of sewer samples by robots to determine opioid use so that city services could be aimed at neighborhoods most in need.

The hazards are also clear.

“Cities don’t know enough about data, privacy or security,” said Lee Tien, a lawyer at the Electronic Frontier Foundation, a nonprofit organization focused on digital rights. “Local governments bear the brunt of so many duties — and in a lot of these cases, they are often in over their heads.”

Cities habitually feel compelled to outdo each other, but the competition has now been intensified by lobbying from tech companies and federal inducements to modernize.

“There is incredible pressure on an unenlightened city to be a ‘smart city,’” said Ben Levine, executive director at MetroLab Network, a nonprofit organization that helps cities adapt to technology change.

That has left Washington, D.C., and dozens of other cities testing self-driving cars and Orlando trying to harness its sunshine to power electric vehicles. San Francisco has a system that tracks bicycle traffic, while Palm Beach, Fla., uses cycling data to decide where to send street sweepers. Boise, Idaho, monitors its trash dumps with drones. Arlington, Tex., is looking at creating a transit system based on data from ride-sharing apps….(More)”.

Beyond the IRB: Towards a typology of research ethics in applied economics


Paper by Michler, Jeffrey D., Masters, William A. and Josephson, Anna: “Conversations about ethics often appeal to those responsible for the ethical behavior, encouraging adoption of “better,” more ethical conduct. In this paper, we consider an alternative frame: a typology of ethical misconduct, focusing on who the victims of various types of unethical behavior are. The typology is constructed around 1) who may be harmed and 2) by what mechanism an individual or party is harmed. Building a typology helps to identify times in the life cycle of a research idea where differences exist between who is potentially harmed and whom the existing ethical norms protect.

We discuss ethical practices including IRB approvals, which focus almost entirely on risks to subjects; pre-analysis plans and conflict of interest disclosures, which encourage transparency so as not to mislead editors, reviewers, and readers; and self-plagiarism, which has become increasingly common as authors slice their research ever more thinly, causing congestion in journals at the expense of others….(More)”.

The Datafication of Employment


Report by Sam Adler-Bell and Michelle Miller at the Century Foundation: “We live in a surveillance society. Our every preference, inquiry, whim, desire, relationship, and fear can be seen, recorded, and monetized by thousands of prying corporate eyes. Researchers and policymakers are only just beginning to map the contours of this new economy—and reckon with its implications for equity, democracy, freedom, power, and autonomy.

For consumers, the digital age presents a devil’s bargain: in exchange for basically unfettered access to our personal data, massive corporations like Amazon, Google, and Facebook give us unprecedented connectivity, convenience, personalization, and innovation. Scholars have exposed the dangers and illusions of this bargain: the corrosion of personal liberty, the accumulation of monopoly power, the threat of digital redlining,1 predatory ad-targeting,2 and the reification of class and racial stratification.3 But less well understood is the way data—its collection, aggregation, and use—is changing the balance of power in the workplace.

This report offers some preliminary research and observations on what we call the “datafication of employment.” Our thesis is that data-mining techniques innovated in the consumer realm have moved into the workplace. Firms who’ve made a fortune selling and speculating on data acquired from consumers in the digital economy are now increasingly doing the same with data generated by workers. Not only does this corporate surveillance enable a pernicious form of rent-seeking—in which companies generate huge profits by packaging and selling worker data in marketplaces hidden from workers’ eyes—but it also opens the door to an extreme informational asymmetry in the workplace that threatens to give employers nearly total control over every aspect of employment.

The report begins with an explanation of how a regime of ubiquitous consumer surveillance came about, and how it morphed into worker surveillance and the datafication of employment. The report then offers principles for action for policymakers and advocates seeking to respond to the harmful effects of this new surveillance economy. The final section concludes with a look forward at where the surveillance economy is going, and how researchers, labor organizers, and privacy advocates should prepare for this changing landscape….(More)”

It’s time for a Bill of Data Rights


Article by Martin Tisne: “…The proliferation of data in recent decades has led some reformers to a rallying cry: “You own your data!” Eric Posner of the University of Chicago, Eric Weyl of Microsoft Research, and virtual-reality guru Jaron Lanier, among others, argue that data should be treated as a possession. Mark Zuckerberg, the founder and head of Facebook, says so as well. Facebook now says that you “own all of the content and information you post on Facebook” and “can control how it is shared.” The Financial Times argues that “a key part of the answer lies in giving consumers ownership of their own personal data.” In a recent speech, Tim Cook, Apple’s CEO, agreed, saying, “Companies should recognize that data belongs to users.”

This essay argues that “data ownership” is a flawed, counterproductive way of thinking about data. It not only does not fix existing problems; it creates new ones. Instead, we need a framework that gives people rights to stipulate how their data is used without requiring them to take ownership of it themselves….

The notion of “ownership” is appealing because it suggests giving you power and control over your data. But owning and “renting” out data is a bad analogy. Control over how particular bits of data are used is only one problem among many. The real questions are questions about how data shapes society and individuals. Rachel’s story will show us why data rights are important and how they might work to protect not just Rachel as an individual, but society as a whole.

Tomorrow never knows

To see why data ownership is a flawed concept, first think about this article you’re reading. The very act of opening it on an electronic device created data—an entry in your browser’s history, cookies the website sent to your browser, an entry in the website’s server log to record a visit from your IP address. It’s virtually impossible to do anything online—reading, shopping, or even just going somewhere with an internet-connected phone in your pocket—without leaving a “digital shadow” behind. These shadows cannot be owned—the way you own, say, a bicycle—any more than can the ephemeral patches of shade that follow you around on sunny days.

Your data on its own is not very useful to a marketer or an insurer. Analyzed in conjunction with similar data from thousands of other people, however, it feeds algorithms and bucketizes you (e.g., “heavy smoker with a drink habit” or “healthy runner, always on time”). If an algorithm is unfair—if, for example, it wrongly classifies you as a health risk because it was trained on a skewed data set or simply because you’re an outlier—then letting you “own” your data won’t make it fair. The only way to avoid being affected by the algorithm would be to never, ever give anyone access to your data. But even if you tried to hoard data that pertains to you, corporations and governments with access to large amounts of data about other people could use that data to make inferences about you. Data is not a neutral impression of reality. The creation and consumption of data reflects how power is distributed in society. …(More)”.
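
An editorial illustration of the inference point: in the sketch below (synthetic numbers and a hypothetical "risk bucket" label, not any real insurer's model), a simple nearest-neighbour rule trained on data other people volunteered still assigns a bucket to someone who never shared a label of their own.

```python
import numpy as np

# Hypothetical training data volunteered by *other* people:
# features = [average daily steps, hours of sleep], label = an insurer's risk bucket.
features = np.array([[2000, 5.0], [3000, 5.5], [12000, 7.5], [11000, 8.0]])
labels = np.array(["high_risk", "high_risk", "low_risk", "low_risk"])

def infer_bucket(person, k=3):
    """Classify a new person by majority vote among the k most similar people."""
    distances = np.linalg.norm(features - person, axis=1)
    nearest = labels[np.argsort(distances)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# This person never handed over a label, but their step count is enough to bucket them.
print(infer_bucket(np.array([2500, 5.2])))  # -> "high_risk"
```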

The Everyday Life of an Algorithm


Book by Daniel Neyland: “This open access book begins with an algorithm–a set of IF…THEN rules used in the development of a new, ethical, video surveillance architecture for transport hubs. Readers are invited to follow the algorithm over three years, charting its everyday life. Questions of ethics, transparency, accountability and market value must be grasped by the algorithm in a series of ever more demanding forms of experimentation. Here the algorithm must prove its ability to get a grip on everyday life if it is to become an ordinary feature of the settings where it is being put to work. Through investigating the everyday life of the algorithm, the book opens a conversation with existing social science research that tends to focus on the power and opacity of algorithms. In this book we have unique access to the algorithm’s design, development and testing, but can also bear witness to its fragility and dependency on others….(More)”.
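
For readers unfamiliar with the phrase, an IF…THEN rule set can be as simple as the sketch below; it is purely illustrative and not the architecture studied in the book, just a hypothetical flavour of the genre.

```python
# Hypothetical rules, for flavour only; the book's actual system is not reproduced here.
def classify_event(obj):
    """Apply simple IF...THEN rules to an object detected in video footage."""
    if (obj["type"] == "luggage" and obj["stationary_seconds"] > 300
            and not obj["owner_nearby"]):
        return "ALERT: possible abandoned luggage"
    if obj["type"] == "person" and obj["zone"] == "restricted":
        return "ALERT: person in restricted zone"
    return "no action"

print(classify_event({"type": "luggage", "stationary_seconds": 420, "owner_nearby": False}))
```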

On the privacy-conscientious use of mobile phone data


Yves-Alexandre de Montjoye et al in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.
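
A minimal editorial sketch, with invented record fields, of the kind of aggregation such uses rely on: raw call detail records are reduced to counts of distinct phones seen per antenna per day, and day-over-day changes hint at population displacement.

```python
# Invented call detail records: (anonymized phone id, day, antenna/cell id).
cdrs = [
    ("u1", "day_before", "cell_A"), ("u2", "day_before", "cell_A"),
    ("u3", "day_before", "cell_A"), ("u1", "day_after", "cell_B"),
    ("u2", "day_after", "cell_B"), ("u3", "day_after", "cell_A"),
]

def phones_per_cell(records, day):
    """Count distinct phones observed in each cell on a given day."""
    seen = {}
    for user, d, cell in records:
        if d == day:
            seen.setdefault(cell, set()).add(user)
    return {cell: len(users) for cell, users in seen.items()}

before, after = phones_per_cell(cdrs, "day_before"), phones_per_cell(cdrs, "day_after")
for cell in sorted(set(before) | set(after)):
    print(cell, "net change in phones seen:", after.get(cell, 0) - before.get(cell, 0))
```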

While there is little doubt on the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.

Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy, which considered de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
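
The unicity metric mentioned above lends itself to a short illustration. The sketch below uses tiny synthetic traces (real datasets hold millions of far denser ones): sample four (place, time) points from one user's trace and count how often no other trace contains them all.

```python
import random

# Tiny synthetic traces: user id -> set of (antenna, hour) points. Illustrative only.
traces = {
    "u1": {("antenna_4", 7), ("antenna_3", 9), ("antenna_7", 13), ("antenna_2", 18), ("antenna_3", 22)},
    "u2": {("antenna_4", 7), ("antenna_3", 9), ("antenna_5", 13), ("antenna_2", 18), ("antenna_1", 22)},
    "u3": {("antenna_9", 7), ("antenna_6", 9), ("antenna_7", 13), ("antenna_2", 18), ("antenna_3", 22)},
}

def unicity(traces, points=4, trials=1000):
    """Estimate the fraction of users uniquely pinned down by `points` random trace points."""
    unique = 0
    for _ in range(trials):
        user = random.choice(list(traces))
        sample = set(random.sample(sorted(traces[user]), points))
        matching = [u for u, trace in traces.items() if sample <= trace]
        unique += (matching == [user])
    return unique / trials

print(f"Estimated unicity with 4 points: {unicity(traces):.2f}")
```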

The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data on time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.

These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed upon framework for the privacy-conscientious use of mobile phone data by third parties, especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed upon set of models have been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data. This has often resulted in suboptimal trade-offs, if any.

In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third-party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregated information is ultimately needed by the third-party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed such as targeted advertising or loans based on behavioral data.

Figure 1: Matrix of the four models for the privacy-conscientious use of mobile phone data.
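
As an editorial sketch of the common thread running through the four models, the toy code below (hypothetical records and a simplified small-group suppression rule, not any of the paper's safeguards verbatim) answers a third party's query with an aggregate count only, refusing when the group is too small.

```python
# Hypothetical individual-level records held by the operator and never shared directly.
records = [
    {"home_cell": "A", "work_cell": "B"},
    {"home_cell": "A", "work_cell": "C"},
    {"home_cell": "A", "work_cell": "B"},
    {"home_cell": "D", "work_cell": "B"},
]

MIN_GROUP = 3  # refuse aggregates computed over fewer people

def aggregate_count(predicate):
    """Answer a third party's query with an aggregate only, or refuse if the group is too small."""
    n = sum(1 for r in records if predicate(r))
    return n if n >= MIN_GROUP else None  # None = query refused

print(aggregate_count(lambda r: r["home_cell"] == "A"))  # 3 -> released
print(aggregate_count(lambda r: r["work_cell"] == "C"))  # 1 -> refused (None)
```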

First, it is important to insist that none of these models is a silver bullet…(More)”.