We discuss ethical practices including IRB approvals, which focuses almost entirely on risks to subjects; pre-analysis plans and conflict of interest disclosures, which encourage transparency so as to not mislead editors, reviewers, and readers; and self-plagiarism, which has become
The Datafication of Employment
Report by Sam Adler-Bell and Michelle Miller at the Century Foundation: “We live in a surveillance society. Our every preference, inquiry, whim, desire, relationship, and fear can be seen,
For consumers, the digital age presents a devil’s bargain: in exchange for basically unfettered access to our personal data, massive corporations like Amazon, Google, and Facebook give us unprecedented connectivity, convenience, personalization, and innovation. Scholars have exposed the dangers and illusions of this bargain: the corrosion of personal liberty, the accumulation of monopoly power, the threat of digital redlining,1 predatory ad-targeting,2 and the reification of class and racial stratification.3 But less well understood is the way data—its collection, aggregation, and use—is changing the balance of power in the workplace.
This report offers some preliminary research and observations on what we call the “datafication of employment.” Our thesis is that data-mining techniques innovated in the consumer realm have moved into the workplace. Firms who’ve made a fortune selling and speculating on data acquired from consumers in the digital economy are now increasingly doing the same with data generated by workers. Not only does this corporate surveillance enable a pernicious form of rent-seeking—in which companies generate huge profits by packaging and selling worker data in marketplace hidden from workers’ eyes—but also, it opens the door to an extreme informational asymmetry in the workplace that threatens to give employers nearly total control over every aspect of employment.
The report begins with an explanation of how a regime of ubiquitous consumer surveillance came about, and how it morphed into worker surveillance and the
It’s time for a Bill of Data Rights
Article by Martin Tisne: “…The proliferation of data in recent decades has led some reformers to a rallying cry: “You own your data!” Eric Posner of the University of Chicago, Eric Weyl of Microsoft Research, and virtual-reality guru Jaron Lanier, among others, argue that data should be treated as a possession. Mark Zuckerberg, the founder and head of Facebook, says so as well. Facebook now says that you “own all of the contact and information you post on Facebook” and “can control how it is shared.” The Financial Times argues that “a key part of the answer lies in giving consumers ownership of their own personal data.” In a recent speech, Tim Cook, Apple’s CEO, agreed, saying, “Companies should recognize that data belongs to users.”
This essay argues that “data ownership” is a flawed, counterproductive way of thinking about data. It not only does not fix existing problems; it creates new ones. Instead, we need a framework that gives people rights to stipulate how their data is used without requiring them to take ownership of it themselves
The notion of “ownership” is appealing because it suggests giving you power and control over your data. But owning and “renting” out data is a bad analogy. Control over how particular bits of data are used is only one problem among many. The real questions are questions about how data shapes society and individuals. Rachel’s story will show us why data rights are important and how they might work to protect not just Rachel as an individual, but society as a whole.
Tomorrow never knows
To see why data ownership is a flawed concept, first think about this article you’re reading. The very act of opening it on an electronic device created data—an entry in your browser’s history, cookies the website sent to your browser, an entry in the website’s server log to record a visit from your IP address. It’s virtually impossible to do anything online—reading, shopping, or even just going somewhere with an internet-connected phone in your pocket—without leaving a “digital shadow” behind. These shadows cannot be owned—the way you own, say, a bicycle—any more than can the ephemeral patches of shade that follow you around on sunny days.
Your data on its own is not very useful to a marketer or an insurer. Analyzed in conjunction with similar data from thousands of other people, however, it feeds algorithms and
The Everyday Life of an Algorithm
Book by Daniel Neyland: “This open access book begins with an algorithm–a set of IF…THEN rules used in the development of a new, ethical, video surveillance architecture for transport hubs. Readers are invited to follow the algorithm over three years, charting its everyday life. Questions of ethics, transparency, accountability and market value must be grasped by the algorithm in a series of ever more demanding forms of experimentation. Here the algorithm must prove its ability to get a grip on everyday life if it is to become an ordinary feature of the settings where it is being put to work. Through investigating the everyday life of the algorithm, the book opens a conversation with existing social science research that tends to focus on the power and opacity of algorithms. In this
On the privacy-conscientious use of mobile phone data
Yves-Alexandre de Montjoye et al in Nature: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.
With mobile phone penetration rates reaching 90% and under-resourced national statistical agencies, the data generated by our phones—traditional Call Detail Records (CDR) but also high-frequency x-Detail Record (xDR)—have the potential to become a primary data source to tackle crucial humanitarian questions in low- and middle-income countries. For instance, they have already been used to monitor population displacement after disasters, to provide real-time traffic information, and to improve our understanding of the dynamics of infectious diseases. These data are also used by governmental and industry practitioners in high-income countries.
While there is little doubt on the potential of mobile phone data for good, these data contain intimate details of our lives: rich information about our whereabouts, social life, preferences, and potentially even finances. A BCG study showed, e.g., that 60% of Americans consider location data and phone number history—both available in mobile phone data—as “private”.
Historically and legally, the balance between the societal value of statistical data (in aggregate) and the protection of privacy of individuals has been achieved through data anonymization. While hundreds of different anonymization algorithms exist, most of them are variations and improvements of the seminal k-anonymity algorithm introduced in 1998. Recent studies have, however, shown that pseudonymization and standard de-identification are not sufficient to prevent users from being re-identified in mobile phone data. Four data points—approximate places and times where an individual was present—have been shown to be enough to uniquely re-identify them 95% of the time in a mobile phone dataset of 1.5 million people. Furthermore, re-identification estimations using unicity—a metric to evaluate the risk of re-identification in large-scale datasets—and attempts at k-anonymizing mobile phone data ruled out de-identification as sufficient to truly anonymize the data. This was echoed in the recent report of the [US] President’s Council of Advisors on Science and Technology on Big Data Privacy which consider de-identification to be useful as an “added safeguard, but [emphasized that] it is not robust against near-term future re-identification methods”.
The limits of the historical de-identification framework to adequately balance risks and benefits in the use of mobile phone data are a major hindrance to their use by researchers, development practitioners, humanitarian workers, and companies. This became particularly clear at the height of the Ebola crisis, when qualified researchers (including some of us) were prevented from accessing relevant mobile phone data on time despite efforts by mobile phone operators, the GSMA, and UN agencies, with privacy being cited as one of the main concerns.
These privacy concerns are, in our opinion, due to the failures of the traditional de-identification model and the lack of a modern and agreed upon framework for the privacy-conscientious use of mobile phone data by third-parties especially in the context of the EU General Data Protection Regulation (GDPR). Such frameworks have been developed for the anonymous use of other sensitive data such as census, household survey, and tax data. The positive societal impact of making these data accessible and the technical means available to protect people’s identity have been considered and a trade-off, albeit far from perfect, has been agreed on and implemented. This has allowed the data to be used in aggregate for the benefit of society. Such thinking and an agreed upon set of models has been missing so far for mobile phone data. This has left data protection authorities, mobile phone operators, and data users with little guidance on technically sound yet reasonable models for the privacy-conscientious use of mobile phone data. This has often resulted in suboptimal tradeoffs if any.
In this paper, we propose four models for the privacy-conscientious use of mobile phone data (Fig. 1). All of these models 1) focus on a use of mobile phone data in which only statistical, aggregate information is ultimately needed by a third-party and, while this needs to be confirmed on a per-country basis, 2) are designed to fall under the legal umbrella of “anonymous use of the data”. Examples of cases in which only statistical aggregated information is ultimately needed by the third-party are discussed below. They would include, e.g., disaster management, mobility analysis, or the training of AI algorithms in which only aggregate information on people’s mobility is ultimately needed by agencies, and exclude cases in which individual-level identifiable information is needed such as targeted advertising or loans based on behavioral data.
First, it is important to insist that none of these models is a silver bullet…(More)”.
Towards matching user mobility traces in large-scale datasets
Paper by Daniel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye and Carlo Ratti: “The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success only after a one-week long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals….(More)”.
The Seductive Diversion of ‘Solving’ Bias in Artificial Intelligence
Blog by Julia Powles and Helen Nissenbaum: “Serious thinkers in academia and business have swarmed to the A.I. bias problem, eager to tweak and improve the data and algorithms that drive artificial intelligence. They’ve latched onto fairness as the objective, obsessing over competing constructs of the term that can be rendered in measurable, mathematical form. If the hunt for a science of computational fairness was restricted to engineers, it would be one thing. But given our contemporary exaltation and deference to technologists, it has limited the entire imagination of ethics, law, and the media as well.
There are three problems with this focus on A.I. bias. The first is that addressing bias as a computational problem obscures its root causes. Bias is a social problem, and seeking to solve it within the logic of automation is always going to be inadequate.
Second, even apparent success in tackling bias can have perverse consequences. Take the example of a facial recognition system that works poorly on women of color because of the group’s underrepresentation both in the training data and among system designers. Alleviating this problem by seeking to “equalize” representation merely co-opts designers in perfecting vast instruments of surveillance and classification.
When underlying systemic issues remain fundamentally untouched, the bias fighters simply render humans more machine readable, exposing minorities in particular to additional harms.
Third — and most dangerous and urgent of all — is the way in which the seductive controversy of A.I. bias, and the false allure of “solving” it, detracts from bigger, more pressing questions. Bias is real, but it’s also a captivating diversion.
What has been remarkably underappreciated is the key interdependence of the twin stories of A.I. inevitability and A.I. bias. Against the corporate projection of an otherwise sunny horizon of unstoppable A.I. integration, recognizing and acknowledging bias can be seen as a strategic concession — one that subdues the scale of the challenge. Bias, like job losses and safety hazards, becomes part of the grand bargain of innovation.
The reality that bias is primarily a social problem and cannot be fully solved technically becomes a strength, rather than a weakness, for the inevitability narrative. It flips the script. It absorbs and regularizes the classification practices and underlying systems of inequality perpetuated by automation, allowing relative increases in “fairness” to be claimed as victories — even if all that is being done is to slice, dice, and redistribute the makeup of those negatively affected by actuarial decision-making.
In short, the preoccupation with narrow computational puzzles distracts us from the far more important issue of the colossal asymmetry between societal cost and private gain in the rollout of automated systems. It also denies us the possibility of asking: Should we be building these systems at all?…(More)”.
To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data
Mark Hansen at the New York Times: “When the Census Bureau gathered data in 2010, it made two promises. The form would be “quick and easy,” it said. And “your answers are protected by law.”
But mathematical breakthroughs, easy access to more powerful computing, and widespread availability of large and varied public data sets have made the bureau reconsider whether the protection it offers Americans is strong enough. To preserve confidentiality, the bureau’s directors have determined they need to adopt a “formal privacy” approach, one that adds uncertainty to census data before it is published and achieves privacy assurances that are provable mathematically.
The census has always added some uncertainty to its data, but a key innovation of this new framework, known as “differential privacy,” is a numerical value describing how much privacy loss a person will experience. It determines the amount of randomness — “noise” — that needs to be added to a data set before it is released, and sets up a balancing act between accuracy and privacy. Too much noise would mean the data would not be accurate enough to be useful — in redistricting, in enforcing the Voting Rights Act or in conducting academic research. But too little, and someone’s personal data could be revealed.
On Thursday, the bureau will announce the trade-off it has chosen for data publications from the 2018 End-to-End Census Test it conducted in Rhode Island, the only dress rehearsal before the actual census in 2020. The bureau has decided to enforce stronger privacy protections than companies like Apple or Google had when they each first took up differential privacy
In presentation materials for Thursday’s announcement, special attention is paid to lessening any problems with redistricting: the potential complications of using noisy counts of voting-age people to draw district lines. (By contrast, in 2000 and 2010 the swapping mechanism produced exact counts of potential voters down to the block level.)
The Census Bureau has been an early adopter of differential privacy. Still, instituting the framework on such a large scale is not an easy task, and even some of the big technology firms have had difficulties. For example, shortly after Apple’s announcement in 2016 that it would use differential privacy for data collected from its macOS and iOS operating systems, it was revealed that the actual privacy loss of their systems was much higher than advertised.
Some scholars question the bureau’s abandonment of techniques like swapping in favor of differential privacy. Steven Ruggles, Regents Professor of history and population studies at the University of Minnesota, has relied on census data for decades. Through the Integrated Public Use Microdata Series, he and his team have regularized census data dating to 1850, providing consistency between questionnaires as the forms have changed, and enabling researchers to analyze data across years.
“All of the sudden, Title 13 gets equated with differential privacy — it’s not,” he said, adding that if you make a guess about someone’s identity from looking at census data, you are probably wrong. “That has been regarded in the past as protection of privacy. They want to make it so that you can’t even guess.”
“There is a trade-off between usability and risk,” he added. “I am concerned they may go far too far on privileging an absolutist standard of risk.”
In a working paper published Friday, he said that with the number of private services offering personal data, a prospective hacker would have little incentive to turn to public data such as the census “in an attempt to uncover uncertain, imprecise and outdated information about a particular individual.”…(More)”.
Why We Need to Audit Algorithms
James Guszcza, Iyad Rahwan, Will Bible, Manuel Cebrian and Vic Katyal at Harvard Business Review: “Algorithmic decision-making and artificial intelligence (AI) hold enormous potential and are likely to be economic blockbusters, but we worry that the hype has led many people to overlook the serious problems of introducing algorithms into business and society. Indeed, we see many succumbing to what Microsoft’s Kate Crawford calls “data fundamentalism” — the notion that massive datasets are repositories that yield reliable and objective
Ensuring that societal values are reflected in algorithms and AI technologies will require no less creativity, hard work, and innovation than developing the AI technologies themselves. We have a proposal for a good place to start: auditing. Companies have long been required to issue audited financial statements for the benefit of financial markets and other stakeholders. That’s because — like algorithms — companies’ internal operations appear as “black boxes” to those on the outside. This gives managers an informational advantage over the investing public which could be abused by unethical actors. Requiring managers to report periodically on their operations provides a check on that advantage. To bolster the trustworthiness of these reports, independent auditors are hired to provide reasonable assurance that the reports coming from the “black box” are free of material misstatement. Should we not subject societally impactful “black box” algorithms to comparable scrutiny?
Indeed, some forward thinking regulators are beginning to explore this possibility. For example, the EU’s General Data Protection Regulation (GDPR) requires that organizations be able to explain their algorithmic decisions. The city of New York recently assembled a task force to study possible biases in algorithmic decision systems. It is reasonable to anticipate that emerging regulations might be met with market pull for services involving algorithmic accountability.
So what might an algorithm auditing discipline look like? First, it should adopt a holistic perspective. Computer science and machine learning methods will be necessary, but likely not sufficient foundations for an algorithm auditing discipline. Strategic thinking, contextually informed professional judgment, communication, and the scientific method are also required.
As a result, algorithm auditing must be interdisciplinary in order for it to succeed
Beijing to Judge Every Resident Based on Behavior by End of 2020
Bloomberg News: “China’s plan to judge each of its 1.3 billion people based on their social behavior is moving a step closer to reality, with Beijing set to adopt a lifelong points program by 2021 that assigns personalized ratings for each resident.
The capital city will pool data from several departments to reward and punish some 22 million citizens based on their actions and reputations by the end of 2020, according to a plan posted on the Beijing municipal government’s website on Monday. Those with better so-called social credit will get “green channel” benefits while those who violate laws will find life more difficult.
The Beijing project will improve blacklist systems so that those deemed untrustworthy will be “unable to move even a single step,” according to the government’s plan. Xinhua reported on the proposal Tuesday, while the report posted on the municipal government’s website is dated July 18.
China has long experimented with systems that grade its citizens, rewarding good behavior with streamlined services while punishing bad actions with restrictions and penalties. Critics say such moves are fraught with risks and could lead to systems that reduce humans to little more than a report card.
Ambitious Plan
Beijing’s efforts represent the most ambitious yet among more than a dozen cities that are moving ahead with similar programs.
Hangzhou rolled out its personal credit system earlier this year, rewarding “pro-social behaviors” such as volunteer work and blood donations while punishing those who violate traffic laws and charge under-the-table fees. By the end of May, people with bad credit in China have been blocked from booking more than 11 million flights and 4 million high-speed train trips, according to the National Development and Reform Commission.
According to the Beijing government’s plan, different agencies will link databases to get a more detailed picture of every resident’s interactions across a swathe of services