Paper by Stefan Baack et al: “Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in places like the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to several high-profile copyright lawsuits, and the threat of litigation is commonly cited as a reason for the recent trend towards minimizing the information shared about training datasets by both corporate and public interest actors. This trend harms the broader ecosystem by hindering transparency, accountability, and innovation, denying researchers, auditors, and impacted individuals access to the information needed to understand AI models.
While this could be mitigated by training language models on open access and public domain data, at the time of writing, there are no such models (trained at a meaningful scale) due to the substantial technical and sociological challenges in assembling the necessary corpus. These challenges include incomplete and unreliable metadata, the cost and complexity of digitizing physical records, and the diverse set of legal and technical skills required to ensure relevance and responsibility in a quickly changing landscape. Building towards a future where AI systems can be trained on openly licensed data that is responsibly curated and governed requires collaboration across legal, technical, and policy domains, along with investments in metadata standards, digitization, and fostering a culture of openness…(More)”.
Beware the Intention Economy: Collection and Commodification of Intent via Large Language Models
Article by Yaqub Chaudhary and Jonnie Penn: “The rapid proliferation of large language models (LLMs) invites the possibility of a new marketplace for behavioral and psychological data that signals intent. This brief article introduces some initial features of that emerging marketplace. We survey recent efforts by tech executives to position the capture, manipulation, and commodification of human intentionality as a lucrative parallel to—and viable extension of—the now-dominant attention economy, which has bent consumer, civic, and media norms around users’ finite attention spans since the 1990s. We call this follow-on the intention economy. We characterize it in two ways. First, as a competition, initially, between established tech players armed with the infrastructural and data capacities needed to vie for first-mover advantage on a new frontier of persuasive technologies. Second, as a commodification of hitherto unreachable levels of explicit and implicit data that signal intent, namely those signals borne of combining (a) hyper-personalized manipulation via LLM-based sycophancy, ingratiation, and emotional infiltration and (b) increasingly detailed categorization of online activity elicited through natural language.
This new dimension of automated persuasion draws on the unique capabilities of LLMs and generative AI more broadly, which intervene not only on what users want, but also, to cite Williams, “what they want to want” (Williams, 2018, p. 122). We demonstrate through a close reading of recent technical and critical literature (including preprints from arXiv) that such tools are already being explored to elicit, infer, collect, record, understand, forecast, and ultimately manipulate, modulate, and commodify human plans and purposes, both mundane (e.g., selecting a hotel) and profound (e.g., selecting a political candidate)…(More)”.
Governing artificial intelligence means governing data: (re)setting the agenda for data justice
Paper by Linnet Taylor, Siddharth Peter de Souza, Aaron Martin, and Joan López Solano: “The field of data justice has been evolving to take into account the role of data in powering the field of artificial intelligence (AI). In this paper we review the main conceptual bases for governing data and AI: the market-based approach, the personal–non-personal data distinction and strategic sovereignty. We then analyse how these are being operationalised into practical models for governance, including public data trusts, data cooperatives, personal data sovereignty, data collaboratives, data commons approaches and indigenous data sovereignty. We interrogate these models’ potential for just governance based on four benchmarks which we propose as a reformulation of the Data Justice governance agenda identified by Taylor in her 2017 framework. Re-situating data justice at the intersection of data and AI, these benchmarks focus on preserving and strengthening public infrastructures and public goods; inclusiveness; contestability and accountability; and global responsibility. We demonstrate how they can be used to test whether a governance approach will succeed in redistributing power, engaging with public concerns and creating a plural politics of AI…(More)”.
Boosting: Empowering Citizens with Behavioral Science
Paper by Stefan M. Herzog and Ralph Hertwig: “…Behavioral public policy came to the fore with the introduction of nudging, which aims to steer behavior while maintaining freedom of choice. Responding to critiques of nudging (e.g., that it does not promote agency and relies on benevolent choice architects), other behavioral policy approaches focus on empowering citizens. Here we review boosting, a behavioral policy approach that aims to foster people’s agency, self-control, and ability to make informed decisions. It is grounded in evidence from behavioral science showing that human decision making is not as flawed as the nudging approach assumes. We argue that addressing the challenges of our time—such as climate change, pandemics, and the threats to liberal democracies and human autonomy posed by digital technologies and choice architectures—calls for fostering capable and engaged citizens as a first line of response to complement slower, systemic approaches…(More)”.
Data sharing restrictions are hampering precision health in the European Union
Paper by Cristina Legido-Quigley et al: “Contemporary healthcare is undergoing a transition, shifting from a population-based approach to personalized medicine on an individual level. In October 2023, the European Partnership for Personalized Medicine was officially launched to communicate the benefits of this approach to citizens and healthcare systems in member countries. The main debate revolves around inconsistent regulation of access to personal data and its potential commercialization. Moreover, the lack of unified consensus among European Union (EU) countries is leading to problems with data sharing that impede progress in personalized medicine. Here we discuss the integration of biological data with personal information on a European scale for the advancement of personalized medicine, raising legal considerations of data protection under the EU General Data Protection Regulation (GDPR)…(More)”.
Governance of Indigenous data in open earth systems science
Paper by Lydia Jennings et al: “In the age of big data and open science, what processes are needed to follow open science protocols while upholding Indigenous Peoples’ rights? The Earth Data Relations Working Group (EDRWG) convened to address this question and envision a research landscape that acknowledges the legacy of extractive practices and embraces new norms across Earth science institutions and open science research. Using the National Ecological Observatory Network (NEON) as an example, the EDRWG recommends actions, applicable across all phases of the data lifecycle, that recognize the sovereign rights of Indigenous Peoples and support better research across all Earth Sciences…(More)”.
Facing & mitigating common challenges when working with real-world data: The Data Learning Paradigm
Paper by Jake Lever et al: “The rapid growth of data-driven applications is ubiquitous across virtually all scientific domains, and has led to an increasing demand for effective methods to handle data deficiencies and mitigate the effects of imperfect data. This paper presents a guide for researchers encountering real-world data-driven applications and the challenges associated with them. This article proposes the concept of the Data Learning Paradigm, combining the principles of machine learning, data science and data assimilation to tackle real-world challenges in data-driven applications. Models are a product of the data upon which they are trained, and no data collected from real-world scenarios is perfect, owing to the natural limitations of sensing and collection. Thus, computational modelling of real-world systems is intrinsically limited by the various deficiencies encountered in real data. The Data Learning Paradigm aims to leverage the strengths of data improvement to enhance the accuracy, reliability, and interpretability of data-driven models. We outline a range of methods currently being implemented in the field of Data Learning involving machine learning and data science methods, and discuss how these mitigate the various problems associated with data-driven models, illustrating improved results in a multitude of real-world applications. We highlight examples where these methods have led to significant advancements in fields such as environmental monitoring, planetary exploration, healthcare analytics, linguistic analysis, social networks, and smart manufacturing. We offer a guide to how these methods may be implemented to deal with general types of limitations in data, alongside their current and potential applications…(More)”.
Sortition: Past and Present
Introduction to the Journal of Sortition: “Since ancient times sortition (random selection by lot) has been used both to distribute political office and as a general prophylactic against factionalism and corruption in societies as diverse as classical-era Athens and the Most Serene Republic of Venice. Lotteries have also been employed for the allocation of scarce goods such as social housing and school places to eliminate bias and ensure just distribution, along with drawing lots in circumstances where unpopular tasks or tragic choices are involved (as some situations are beyond rational human decision-making). More recently, developments in public opinion polling using random sampling have led to the proliferation of citizens’ assemblies selected by lot. Some activists have even proposed such bodies as an alternative to elected representatives. The Journal of Sortition benefits from an editorial board with a wide range of expertise and perspectives in this area. In this introduction to the first issue, we have invited our editors to explain why they are interested in sortition, and to outline the benefits (and pitfalls) of the recent explosion of interest in the topic…(More)”.
Digitalizing sewage: The politics of producing, sharing, and operationalizing data from wastewater-based surveillance
Paper by Josie Wittmer, Carolyn Prouse, and Mohammed Rafi Arefin: “Expanded during the COVID-19 pandemic, Wastewater-Based Surveillance (WBS) is now heralded by scientists and policy makers alike as the future of monitoring and governing urban health. The expansion of WBS reflects larger neoliberal governance trends whereby digitalizing states increasingly rely on producing big data as a ‘best practice’ to surveil various aspects of everyday life. With a focus on three South Asian cities, our paper investigates the transnational pathways through which WBS data is produced, made known, and operationalized in ‘evidence-based’ decision-making in a time of crisis. We argue that in South Asia, wastewater surveillance data is actively produced through fragile but power-laden networks of transnational and local knowledge, funding, and practices. Using mixed qualitative methods, we found these networks produced artifacts like dashboards to communicate data to the public in ways that enabled claims to objectivity, ethical interventions, and transparency. Interrogating these representations, we demonstrate how these artifacts open up messy spaces of translation that trouble linear notions of objective data informing accountable, transparent, and evidence-based decision-making for diverse urban actors. By thinking through the production of precarious biosurveillance infrastructures, we respond to calls for more robust ethical and legal frameworks for the field and suggest that the fragility of WBS infrastructures has important implications for the long-term trajectories of urban public health governance in the global South…(More)”.
Theorizing the functions and patterns of agency in the policymaking process
Paper by Giliberto Capano, et al: “Theories of the policy process understand the dynamics of policymaking as the result of the interaction of structural and agency variables. While these theories tend to conceptualize structural variables in a careful manner, agency (i.e. the actions of individual agents, like policy entrepreneurs, policy leaders, policy brokers, and policy experts) is left as a residual piece in the puzzle of the causality of change and stability. This treatment of agency leaves room for conceptual overlaps, analytical confusion and empirical shortcomings that can complicate the life of the empirical researcher and, most importantly, hinder the ability of theories of the policy process to fully address the drivers of variation in policy dynamics. Drawing on Merton’s concept of function, this article presents a novel theorisation of agency in the policy process. We start from the assumption that agency functions are a necessary component through which policy dynamics evolve. We then theorise that agency can fulfil four main functions – steering, innovation, intermediation and intelligence – that need to be performed, by individual agents, in any policy process through four patterns of action – leadership, entrepreneurship, brokerage and knowledge accumulation – and we provide a roadmap for operationalising and measuring these concepts. We then demonstrate what can be achieved in terms of analytical clarity and potential theoretical leverage by applying this novel conceptualisation to two major policy process theories: the Multiple Streams Framework (MSF) and the Advocacy Coalition Framework (ACF)…(More)”.