Paper by Katherine Reilly and Marieliv Flores: “The pilot data literacy project Son Mis Datos showed volunteers how to leverage Peru’s national data protection law to request access to personal data held by Peruvian companies, and then showed them how to audit corporate data use based on the results. While this intervention had a positive impact on data literacy, by basing it on a universalist conception of datafication our work inadvertently reproduced the dominant data paradigm we hoped to challenge. This paper offers a retrospective analysis of Son Mis Datos and explores the gap between van Dijck’s widely cited theory of datafication and the reality of our participants’ experiences with datafication and digital transformation on the ground in Peru. On this basis, we suggest an alternative definition of datafication, more appropriate to critical scholarship, as the transformation of social relations around the uptake of personal data in the coordination of transactions, and we propose an alternative approach to data literacy interventions that begins with the experiences of data subjects…(More)”.
Digital (In)justice in the Smart City
Book edited by Debra Mackinnon, Ryan Burns and Victoria Fast: “In the contemporary moment, smart cities have become the dominant paradigm for urban planning and administration, which involves weaving the urban fabric with digital technologies. Recently, however, the promises of smart cities have been gradually supplanted by recognition of their inherent inequalities, and scholars are increasingly working to envision alternative smart cities.
Informed by these pressing challenges, Digital (In)Justice in the Smart City foregrounds discussions of how we should think of and work towards urban digital justice in the smart city. It provides a deep exploration of the sources of injustice that percolate throughout a range of sociotechnical assemblages, and it questions whether working towards more just, sustainable, liveable, and egalitarian cities requires that we look beyond the limitations of “smartness” altogether. The book grapples with how geographies impact smart city visions and roll-outs, on the one hand, and how (unjust) geographies are produced in smart pursuits, on the other. Ultimately, Digital (In)Justice in the Smart City envisions alternative cities – smart or merely digital – and outlines the sorts of roles that the commons, utopia, and the law might take on in our conceptions and realizations of better cities…(More)”.
How Data Happened: A History from the Age of Reason to the Age of Algorithms
Book by Chris Wiggins and Matthew L. Jones: “From facial recognition—capable of checking people into flights or identifying undocumented residents—to automated decision systems that inform who gets loans and who receives bail, each of us moves through a world determined by data-empowered algorithms. But these technologies didn’t just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search.
Expanding on the popular course they created at Columbia University, Chris Wiggins and Matthew L. Jones illuminate the ways in which data has long been used as a tool and a weapon in arguing for what is true, as well as a means of rearranging or defending power. They explore how data was created and curated, as well as how new mathematical and computational techniques developed to contend with that data serve to shape people, ideas, society, military operations, and economies. Although technology and mathematics are at its heart, the story of data ultimately concerns an unstable game among states, corporations, and people. How were new technical and scientific capabilities developed; who supported, advanced, or funded these capabilities or transitions; and how did they change who could do what, from what, and to whom?
Wiggins and Jones focus on these questions as they trace data’s historical arc, and look to the future. By understanding the trajectory of data—where it has been and where it might yet go—Wiggins and Jones argue that we can understand how to bend it to ends that we collectively choose, with intentionality and purpose…(More)”.
The Sensitive Politics of Information for Digital States
Essay by Federica Carugati, Cyanne E. Loyle and Jessica Steinberg: “In 2020, Vice revealed that the U.S. military had signed a contract with Babel Street, a Virginia-based company that created a product called Locate X, which collects location data from users across a variety of digital applications. Some of these apps are seemingly innocuous: one for following storms, a Muslim dating app and a level for DIY home repair. Less innocuously, these reports indicate that the U.S. government is outsourcing some of its counterterrorism and counterinsurgency information-gathering activities to a private company.
While states have always collected information about citizens and their activities, advances in digital technologies — including new kinds of data and infrastructure — have fundamentally altered their ability to access, gather and analyze information. Bargaining with and relying on non-state actors like private companies creates tradeoffs between a state’s effectiveness and legitimacy. Those tradeoffs might be unacceptable to citizens, undermining our very understanding of what states do and how we should interact with them …(More)”
LocalView, a database of public meetings for the study of local politics and policy-making in the United States
Paper by Soubhik Barari and Tyler Simko: “Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. This article introduces LocalView, the largest existing dataset of real-time local government public meetings, the central policy-making process in local government. In sum, the dataset currently covers 139,616 videos and their corresponding textual and audio transcripts of local government meetings publicly uploaded to YouTube, the world’s largest public video-sharing website, from 1,012 places and 2,861 distinct governments across the United States between 2006 and 2022. The data are processed, downloaded, cleaned, and publicly disseminated (at localview.net) for analysis across places and over time. We validate this dataset using a variety of methods and demonstrate how it can be used to map local governments’ attention to policy areas of interest. Finally, we discuss how LocalView may be used by journalists, academics, and other users for understanding how local communities deliberate crucial policy questions on topics including climate change, public health, and immigration…(More)”.
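For readers curious how such a corpus might be put to work, here is a minimal sketch of the keyword-based measure of policy attention the abstract describes. It assumes a hypothetical CSV export with columns named "place", "year", and "transcript"; this is not the paper’s replication code, and the file and column names are placeholders.

```python
# Minimal sketch: estimate local governments' attention to a policy topic from
# meeting transcripts. File name and column names are hypothetical placeholders.
import pandas as pd

CLIMATE_TERMS = ("climate change", "sea level", "emissions", "flood mitigation")

def mentions_topic(text: str, terms=CLIMATE_TERMS) -> bool:
    """Return True if any topic keyword appears in a meeting transcript."""
    text = text.lower()
    return any(term in text for term in terms)

meetings = pd.read_csv("localview_transcripts.csv")  # hypothetical export
meetings["climate_mention"] = meetings["transcript"].fillna("").map(mentions_topic)

# Share of meetings per place and year that touch on the topic.
attention = (
    meetings.groupby(["place", "year"])["climate_mention"]
    .mean()
    .reset_index(name="share_of_meetings")
)
print(attention.head())
```

A dictionary-based count like this is only a first pass; the paper itself validates its measures more carefully, but the sketch conveys the kind of cross-place, over-time comparison the dataset enables.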
When Ideology Drives Social Science
Article by Michael Jindra and Arthur Sakamoto: “Last summer in these pages, Mordechai Levy-Eichel and Daniel Scheinerman uncovered a major flaw in Richard Jean So’s Redlining Culture: A Data History of Racial Inequality and Postwar Fiction, one that rendered the book’s conclusion null and void. Unfortunately, what they found was not an isolated incident. In complex areas like the study of racial inequality, a fundamentalism has taken hold that discourages sound methodology and the use of reliable evidence about the roots of social problems.
We are not talking about mere differences in interpretation of results, which are common. We are talking about mistakes so clear that they should cause research to be seriously questioned or even disregarded. A great deal of research — we will focus on examinations of Asian American class mobility — rigs its statistical methods in order to arrive at ideologically preferred conclusions.
Most sophisticated quantitative work in sociology involves multivariate research, often in a search for causes of social problems. This work might ask how a particular independent variable (e.g., education level) “causes” an outcome or dependent variable (e.g., income). Or it could study the reverse: How does parental income influence children’s education?
Human behavior is too complicated to be explained by only one variable, so social scientists typically try to “control” for various causes simultaneously. If you are trying to test for a particular cause, you want to isolate that cause and hold all other possible causes constant. One can control for a given variable using what is called multiple regression, a statistical tool that parcels out the separate net effects of several variables simultaneously.
If you want to determine whether income causes better education outcomes, for instance, you’d want to compare only people from two-parent families, since family status might be another causal factor. Likewise, to see the effect of family status, you’d want to compare people with similar incomes. And so on for other variables.
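To make the “controlling” logic concrete, here is a brief illustrative sketch using simulated data. The variable names, effect sizes, and model are invented for the example and are not drawn from any study the article discusses; the point is only that regressing the outcome on both predictors parcels out the net effect of each while holding the other constant.

```python
# Illustrative sketch of multiple regression with a control variable, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
two_parent = rng.binomial(1, 0.7, n)                   # family structure (0/1), invented
income = 40 + 25 * two_parent + rng.normal(0, 15, n)   # income correlated with family structure
education = 10 + 0.05 * income + 1.5 * two_parent + rng.normal(0, 2, n)

df = pd.DataFrame({"education": education, "income": income, "two_parent": two_parent})

naive = smf.ols("education ~ income", data=df).fit()               # omits family structure
controlled = smf.ols("education ~ income + two_parent", data=df).fit()

print(naive.params["income"], controlled.params["income"])
# The controlled coefficient isolates income's net effect; the naive one also absorbs
# part of the family-structure effect, i.e., omitted-variable bias.
```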
The problem is that there are potentially so many variables that a researcher inevitably leaves some out…(More)”.
Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good
Report by National Academies of Sciences, Engineering, and Medicine: “Historically, the U.S. national data infrastructure has relied on the operations of the federal statistical system and the data assets that it holds. Throughout the 20th century, federal statistical agencies aggregated survey responses of households and businesses to produce information about the nation and diverse subpopulations. The statistics created from such surveys provide most of what people know about the well-being of society, including health, education, employment, safety, housing, and food security. The surveys also contribute to an infrastructure for empirical social- and economic-sciences research. Research using survey-response data, with strict privacy protections, led to important discoveries about the causes and consequences of major societal challenges and also informed policymakers. As with other infrastructure, people can easily take these essential statistics for granted. Only when they are threatened do people recognize the need to protect them…(More)”.
Americans Can’t Consent to Companies’ Use of Their Data
A Report from the Annenberg School for Communication: “Consent has always been a central part of Americans’ interactions with the commercial internet. Federal and state laws, as well as decisions from the Federal Trade Commission (FTC), require either implicit (“opt out”) or explicit (“opt in”) permission from individuals for companies to take and use data about them. Genuine opt out and opt in consent requires that people have knowledge about commercial data-extraction practices as well as a belief they can do something about them. As we approach the 30th anniversary of the commercial internet, the latest Annenberg national survey finds that Americans have neither. High percentages of Americans don’t know, admit they don’t know, and believe they can’t do anything about basic practices and policies around companies’ use of people’s data…
High levels of frustration, concern, and fear compound Americans’ confusion: 80% say they have little control over how marketers can learn about them online; 80% agree that what companies know about them from their online behaviors can harm them. These and related discoveries from our survey paint a picture of an unschooled and admittedly incapable society that rejects the internet industry’s insistence that people will accept tradeoffs for benefits and despairs of its inability to predictably control its digital life in the face of powerful corporate forces… The aim of this report is to chart the particulars of Americans’ lack of knowledge about the commercial use of their data and their “dark resignation” in connection to it. Our goal is also to raise questions and suggest solutions about public policies that allow companies to gather, analyze, trade, and otherwise benefit from information they extract from large populations of people who are uninformed about how that information will be used and deeply concerned about the consequences of its use. In short, we find that informed consent at scale is a myth, and we urge policymakers to act with that in mind.”…(More)”.
AI-Ready Open Data
Explainer by Sean Long and Tom Romanoff: “Artificial intelligence and machine learning (AI/ML) have the potential to create applications that tackle societal challenges from human health to climate change. These applications, however, require data to power AI model development and implementation. Government’s vast amount of open data can fill this gap: McKinsey estimates that open data can help unlock $3 trillion to $5 trillion in economic value annually across seven sectors. But for open data to fuel innovations in academia and the private sector, the data must be both easy to find and use. While Data.gov makes it simpler to find the federal government’s open data, researchers still spend up to 80% of their time preparing data into a usable, AI-ready format. As Intel warns, “You’re not AI-ready until your data is.”
In this explainer, the Bipartisan Policy Center provides an overview of existing efforts across the federal government to improve the AI readiness of its open data. We answer the following questions:
- What is AI-ready data?
- Why is AI-ready data important to the federal government’s AI agenda?
- Where is AI-ready data being applied across federal agencies?
- How could AI-ready data become the federal standard?…(More)”.
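As a rough illustration of what a basic “AI-readiness” check might involve, here is a short sketch that profiles a downloaded open dataset for the problems (missing values, inconsistent types, duplicates) that consume much of researchers’ preparation time. The file name is a placeholder for any CSV obtained from a portal such as Data.gov, and no federal standard or agency workflow is implied.

```python
# Sketch: profile an open dataset and emit a machine-readable readiness report.
import json
import pandas as pd

def readiness_report(path: str) -> dict:
    """Summarize basic data-quality signals relevant to AI-readiness."""
    df = pd.read_csv(path, low_memory=False)
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "dtypes": {c: str(t) for c, t in df.dtypes.items()},
        "missing_share": {c: round(float(df[c].isna().mean()), 3) for c in df.columns},
        "duplicate_rows": int(df.duplicated().sum()),
    }

report = readiness_report("open_dataset.csv")  # placeholder file name
print(json.dumps(report, indent=2))            # could be published alongside the data
```

Publishing this kind of profile alongside a dataset is one small step toward the documentation and metadata practices the explainer surveys.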
Privacy Decisions Are Not Private: How the Notice and Choice Regime Induces Us to Ignore Collective Privacy Risks and What Regulation Should Do About It
Paper by Christopher Jon Sprigman and Stephan Tontrup: “For many reasons, the current notice and choice privacy framework fails to empower individuals in effectively making their own privacy choices. In this Article we offer evidence from three novel experiments showing that at the core of this failure is a cognitive error. Notice and choice caters to a heuristic that people employ to make privacy decisions. This heuristic is meant to judge trustworthiness in face-to-face situations. In the online context, it distorts privacy decision-making and leaves potential disclosers vulnerable to exploitation.
From our experimental evidence exploring the heuristic’s effect, we conclude that privacy law must become more behaviorally aware. Specifically, privacy law must be redesigned to intervene in the cognitive mechanisms that keep individuals from making better privacy decisions. A behaviorally aware privacy regime must centralize, standardize, and simplify the framework for making privacy choices.
To achieve these goals, we propose a master privacy template which requires consumers to define their privacy preferences in advance—doing so avoids presenting the consumer with a concrete counterparty, and this, in turn, prevents them from applying the trust heuristic and reduces many other biases that affect privacy decision-making. Our data show that blocking the heuristic enables consumers to consider relevant privacy cues and be considerate of externalities their privacy decisions cause.
The master privacy template provides a much more effective platform for regulation. Through the master template the regulator can set the standard for automated communication between user clients and website interfaces, a facility which we expect to enhance enforcement and competition about privacy terms…(More)”.
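To illustrate the idea (not the authors’ specification), here is a sketch of what a machine-readable master privacy template might look like, so that a user client could transmit standardized preferences to website interfaces automatically. The field names, categories, and serialization are invented for illustration; the paper proposes the concept, not this schema.

```python
# Illustrative sketch of a machine-readable "master privacy template".
from dataclasses import dataclass, field, asdict
import json

@dataclass
class MasterPrivacyTemplate:
    # All fields and defaults are hypothetical examples, not a proposed standard.
    allow_first_party_analytics: bool = True
    allow_third_party_sharing: bool = False
    allow_targeted_advertising: bool = False
    allow_location_tracking: bool = False
    retention_limit_days: int = 90
    categories_never_collected: list[str] = field(
        default_factory=lambda: ["health", "biometrics", "precise_location"]
    )

def preference_header(template: MasterPrivacyTemplate) -> str:
    """Serialize preferences once; a client could attach this to every request."""
    return json.dumps(asdict(template), separators=(",", ":"))

print(preference_header(MasterPrivacyTemplate()))
```

Because the preferences are defined once, in advance, and transmitted by the client rather than negotiated site by site, a template like this would never confront the user with a concrete counterparty, which is the mechanism the authors argue blocks the trust heuristic.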