My Voice, Your Voice, Our Voice: Attitudes Towards Collective Governance of a Choral AI Dataset


Paper by Jennifer Ding, Eva Jäger, Victoria Ivanova, and Mercedes Bunz: “Data grows in value when joined and combined; likewise the power of voice grows in ensemble. With 15 UK choirs, we explore opportunities for bottom-up data governance of a jointly created Choral AI Dataset. Guided by a survey of chorister attitudes towards generative AI models trained using their data, we explore opportunities to create empowering governance structures that go beyond opt in and opt out. We test the development of novel mechanisms such as a Trusted Data Intermediary (TDI) to enable governance of the dataset amongst the choirs and AI developers. We hope our findings can contribute to growing efforts to advance collective data governance practices and shape a more creative, empowering future for arts communities in the generative AI ecosystem…(More)”.

Synthetic Data, Synthetic Media, and Surveillance


Paper by Aaron Martin and Bryce Newell: “Public and scholarly interest in the related concepts of synthetic data and synthetic media has exploded in recent years. From issues raised by the generation of synthetic datasets to train machine learning models to the public-facing, consumer availability of artificial intelligence (AI) powered image manipulation and creation apps and the associated increase in synthetic (or “deepfake”) media, these technologies have shifted from being niche curiosities of the computer science community to become topics of significant public, corporate, and regulatory import. They are emblematic of a “data-generation revolution” (Gal and Lynskey 2024: 1091) that is already raising pressing questions for the academic surveillance studies community. Within surveillance studies scholarship, Fussey (2022: 348) has argued that synthetic media is one of several “issues of urgent societal and planetary concern” and that it has “arguably never been more important” for surveillance studies “researchers to understand these dynamics and complex processes, evidence their implications, and translate esoteric knowledge to produce meaningful analysis.” Yet, while fields adjacent to surveillance studies have begun to explore the ethical risks of synthetic data, we currently perceive a lack of attention to the surveillance implications of synthetic data and synthetic media in published literature within our field. In response, this Dialogue is designed to help promote thinking and discussion about the links and disconnections between synthetic data, synthetic media, and surveillance…(More)”

Privacy guarantees for personal mobility data in humanitarian response


Paper by Nitin Kohli, Emily Aiken & Joshua E. Blumenstock: “Personal mobility data from mobile phones and other sensors are increasingly used to inform policymaking during pandemics, natural disasters, and other humanitarian crises. However, even aggregated mobility traces can reveal private information about individual movements to potentially malicious actors. This paper develops and tests an approach for releasing private mobility data, which provides formal guarantees over the privacy of the underlying subjects. Specifically, we (1) introduce an algorithm for constructing differentially private mobility matrices and derive privacy and accuracy bounds on this algorithm; (2) use real-world data from mobile phone operators in Afghanistan and Rwanda to show how this algorithm can enable the use of private mobility data in two high-stakes policy decisions: pandemic response and the distribution of humanitarian aid; and (3) discuss practical decisions that need to be made when implementing this approach, such as how to optimally balance privacy and accuracy. Taken together, these results can help enable the responsible use of private mobility data in humanitarian response…(More)”.
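The core mechanism the abstract describes, releasing an origin-destination mobility matrix with formal privacy guarantees, can be illustrated with a minimal Laplace-mechanism sketch. This is a generic illustration of differential privacy, not the authors' actual algorithm; the function name and the sensitivity-1 assumption are ours:

```python
import numpy as np

def dp_mobility_matrix(counts, epsilon, rng=None):
    """Release an origin-destination count matrix with epsilon-differential
    privacy via the Laplace mechanism, assuming each subject contributes
    to at most one cell (sensitivity 1)."""
    rng = np.random.default_rng(rng)
    # Laplace noise with scale = sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    noisy = counts + noise
    # Clamp negatives: a trip count below zero carries no information.
    return np.clip(noisy, 0, None)

# Toy 3x3 matrix of trip counts between three regions.
trips = np.array([[120.0, 30.0, 5.0],
                  [25.0, 200.0, 40.0],
                  [10.0, 35.0, 90.0]])
private = dp_mobility_matrix(trips, epsilon=1.0, rng=0)
```

Lowering `epsilon` strengthens the privacy guarantee but adds more noise, which is the privacy–accuracy trade-off the paper discusses.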

Digital surveillance capitalism and cities: data, democracy and activism


Paper by Ashish Makanadar: “The rapid convergence of urbanization and digital technologies is fundamentally reshaping city governance through data-driven systems. This transformation, however, is largely controlled by surveillance capitalist entities, raising profound concerns for democratic values and citizen rights. As private interests extract behavioral data from public spaces without adequate oversight, the principles of transparency and civic participation are increasingly threatened. This erosion of data sovereignty represents a critical juncture in urban development, demanding urgent interdisciplinary attention. This comment proposes a paradigm shift in urban data governance, advocating for the reclamation of data sovereignty to prioritize community interests over corporate profit motives. The paper explores socio-technical pathways to achieve this goal, focusing on grassroots approaches that assert ‘data dignity’ through privacy-enhancing technologies and digital anonymity tools. It argues for the creation of distributed digital commons as viable alternatives to proprietary data silos, thereby democratizing access to and control over urban data. The discussion extends to long-term strategies, examining the potential of blockchain technologies and decentralized autonomous organizations in enabling self-sovereign data economies. These emerging models offer a vision of ‘crypto-cities’ liberated from extractive data practices, fostering environments where residents retain autonomy over their digital footprints. By critically evaluating these approaches, the paper aims to catalyze a reimagining of smart city technologies aligned with principles of equity, shared prosperity, and citizen empowerment. This realignment is essential for preserving democratic values in an increasingly digitized urban landscape…(More)”.

Predictability, AI, And Judicial Futurism: Why Robots Will Run The Law And Textualists Will Like It


Paper by Jack Kieffaber: “The question isn’t whether machines are going to replace judges and lawyers—they are. The question is whether that’s a good thing. If you’re a textualist, you have to answer yes. But you won’t—which means you’re not a textualist. Sorry.

Hypothetical: The year is 2030. AI has far eclipsed the median federal jurist as a textual interpreter. A new country is founded; it’s a democratic republic that uses human legislators to write laws and programs a state-sponsored Large Language Model called “Judge.AI” to apply those laws to facts. The model makes judicial decisions as to conduct on the back end, but can also provide advisory opinions on the front end; if a citizen types in his desired action and hits “enter,” Judge.AI will tell him, ex ante, exactly what it would decide ex post if the citizen were to perform the action and be prosecuted. The primary result is perfect predictability; secondary results include the abolition of case law, the death of common law, and the replacement of all judges—indeed, all lawyers—by a single machine. Don’t fight the hypothetical, assume it works. This article poses the question: Is that a utopia or a dystopia?

If you answer dystopia, you cannot be a textualist. Part I of this article establishes why: Because predictability is textualism’s only lodestar, and Judge.AI is substantially more predictable than any regime operating today. Part II-A dispatches rebuttals premised on positive nuances of the American system; such rebuttals forget that my hypothetical presumes a new nation and take for granted how much of our nation’s founding was premised on mitigating exactly the kinds of human error that Judge.AI would eliminate. And Part II-B dispatches normative rebuttals, which ultimately amount to moral arguments about objective good—which are none of the textualist’s business.

When the dust clears, you have only two choices: You’re a moralist, or you’re a formalist. If you’re the former, you’ll need a complete account of the objective good—which has evaded man for his entire existence. If you’re the latter, you should relish the fast-approaching day when all laws and all lawyers are usurped by a tin box. But you’re going to say you’re something in between. And you’re not…(More)”

The Next Phase of the Data Economy: Economic & Technological Perspectives


Paper by Jad Esber et al: “The data economy is poised to evolve toward a model centered on individual agency and control, moving us toward a world where data is more liquid across platforms and applications. In this future, products will either utilize existing personal data stores or create them when they don’t yet exist, empowering individuals to fully leverage their own data for various use cases.

The analysis begins by establishing a foundation for understanding data as an economic good and the dynamics of data markets. The article then investigates the concept of personal data stores, analyzing the historical challenges that have limited their widespread adoption. Building on this foundation, the article then considers how recent shifts in regulation, technology, consumer behavior, and market forces are converging to create new opportunities for a user-centric data economy. The article concludes by discussing potential frameworks for value creation and capture within this evolving paradigm, summarizing key insights and potential future directions for research, development, and policy.

We hope this article can help shape the thinking of scholars, policymakers, investors, and entrepreneurs, as new data ownership and privacy technologies emerge, and regulatory bodies around the world mandate open flows of data and new terms of service intended to empower users as well as small-to-medium–sized businesses…(More)”.

Online consent: how much do we need to know?


Paper by Bartlomiej Chomanski & Lode Lauwaert: “When you visit a website and click a button that says, ‘I agree to these terms’—do you really agree? Many scholars who consider this question (Solove 2013; Barocas & Nissenbaum 2014; Hull 2015; Pascalev 2017; Yeung 2017; Becker 2019; Zuboff 2019; Andreotta et al. 2022; Wolmarans and Vorhoeve 2022) would tend to answer ‘no’—or, at the very least, they would deem your agreement normatively deficient. The reasoning behind that conclusion is in large part driven by the claim that when most people click ‘I agree’ when visiting online websites and platforms, they do not really know what they are agreeing to. Their lack of knowledge about the privacy policy and other terms of the online agreements thus makes their consent problematic in morally salient ways.

We argue that this prevailing view is wrong. Uninformed consent to online terms and conditions (what we will call, for short, ‘online consent’) is less ethically problematic than many scholars suppose. Indeed, we argue that uninformed online consent preceded by the legitimate exercise of the right not to know (RNTK, to be explained below) is prima facie valid and does not appear normatively deficient in other ways, despite being uninformed.

The paper proceeds as follows. In Sect. 2, we make more precise the concept of online consent and summarize the case against it, as presented in the literature. In Sect. 3 we explain the arguments for the RNTK in bioethics and show that analogous reasoning leads to endorsing the RNTK in online contexts. In Sect. 4, we demonstrate that the appeal to the RNTK helps defuse the critics’ arguments against online consent. Section 5 concludes: online consent is valid (with caveats, to be explored in what follows)…(More)”

An Open Source Python Library for Anonymizing Sensitive Data


Paper by Judith Sáinz-Pardo Díaz & Álvaro López García: “Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied on the given dataset, including the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for integration and continuous development, as well as the use of workflows to test code coverage based on unit and functional tests…(More)”.
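For readers unfamiliar with the terminology in this abstract, the following sketch shows what a k-anonymity check combined with one level of a generalization hierarchy looks like in practice. This is a toy illustration of the underlying concepts, not the library's actual API; all names here are ours:

```python
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """True if every combination of quasi-identifier values
    appears at least k times in the dataset."""
    return df.groupby(quasi_identifiers).size().min() >= k

def generalize_age(age):
    # One level of a generalization hierarchy: exact age -> decade band.
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

df = pd.DataFrame({
    "age": [23, 27, 23, 45, 41, 47],          # quasi-identifier
    "zip": ["10001", "10001", "10001",
            "20002", "20002", "20002"],        # quasi-identifier
    "diagnosis": ["A", "B", "A", "C", "A", "B"],  # sensitive attribute
})
df["age"] = df["age"].map(generalize_age)
k2 = is_k_anonymous(df, ["age", "zip"], k=2)   # True after generalization
```

If a required k cannot be reached by generalization alone, such frameworks typically fall back on suppressing a bounded fraction of records, which is the "allowed level of suppression" the abstract mentions.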

Garden city: A synthetic dataset and sandbox environment for analysis of pre-processing algorithms for GPS human mobility data


Paper by Thomas H. Li and Francisco Barreras: “Human mobility datasets have seen increasing adoption in the past decade, enabling diverse applications that leverage the high precision of measured trajectories relative to other human mobility datasets. However, there are concerns about whether the high sparsity in some commercial datasets can introduce errors due to lack of robustness in processing algorithms, which could compromise the validity of downstream results. The scarcity of “ground-truth” data makes it particularly challenging to evaluate and calibrate these algorithms. To overcome these limitations and allow for an intermediate form of validation of common processing algorithms, we propose a synthetic trajectory simulator and sandbox environment meant to replicate the features of commercial datasets that could cause errors in such algorithms, and which can be used to compare algorithm outputs with “ground-truth” synthetic trajectories and mobility diaries. Our code is open-source and is publicly available alongside tutorial notebooks and sample datasets generated with it…(More)”
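The central idea, pairing sparse and noisy observations with the dense ground-truth trajectory they were derived from, can be sketched as follows. This is a toy illustration of the general approach, not the authors' simulator; the function, parameters, and values are ours:

```python
import numpy as np

def sparse_pings(true_path, drop_prob, noise_m, rng=None):
    """Downsample a dense 'ground-truth' trajectory and jitter the
    surviving points, mimicking the sparsity and GPS noise of
    commercial datasets. true_path: (n, 2) positions in metres."""
    rng = np.random.default_rng(rng)
    keep = rng.random(len(true_path)) >= drop_prob   # which pings survive
    observed = true_path[keep] + rng.normal(0.0, noise_m,
                                            size=(keep.sum(), 2))
    return keep, observed

# Dense path along a straight 1 km street, sampled at 100 points.
path = np.column_stack([np.linspace(0, 1000, 100), np.zeros(100)])
keep, pings = sparse_pings(path, drop_prob=0.8, noise_m=15.0, rng=42)
```

A pre-processing algorithm (stop detection, trip segmentation, home inference) can then be run on `pings` and its output scored against `path`, which is exactly the kind of intermediate validation the abstract proposes.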

Generative Agent Simulations of 1,000 People


Paper by Joon Sung Park: “The promise of human behavioral simulation–general-purpose computational agents that replicate human behavior across domains–could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals–applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants’ responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior…(More)”.
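The evaluation idea here, scoring an agent against a participant's first-wave survey answers and normalizing by that participant's own two-week retest consistency, can be sketched in a few lines. The data below is made up for illustration; this is not the paper's code:

```python
def normalized_accuracy(agent_answers, t1_answers, t2_answers):
    """Agent accuracy against wave-1 answers, divided by the
    participant's own wave-2 test-retest consistency."""
    n = len(t1_answers)
    agent_acc = sum(a == b for a, b in zip(agent_answers, t1_answers)) / n
    retest_acc = sum(a == b for a, b in zip(t2_answers, t1_answers)) / n
    return agent_acc / retest_acc

# Hypothetical GSS-style categorical responses for one participant.
wave1 = ["agree", "disagree", "agree", "neutral", "agree"]
wave2 = ["agree", "disagree", "neutral", "neutral", "agree"]  # 80% consistent
agent = ["agree", "disagree", "agree", "agree", "agree"]      # 80% vs. wave 1
print(normalized_accuracy(agent, wave1, wave2))  # → 1.0
```

Normalizing this way treats the participant's own inconsistency as the ceiling: an agent that matches wave 1 as often as the participant's own retest scores 100%, which is how the abstract's "85% as accurately" figure should be read.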