Superbloom: How Technologies of Connection Tear Us Apart


Book by Nicholas Carr: “From the telegraph and telephone in the 1800s to the internet and social media in our own day, the public has welcomed new communication systems. Whenever people gain more power to share information, the assumption goes, society prospers. Superbloom tells a startlingly different story. As communication becomes more mechanized and efficient, it breeds confusion more than understanding, strife more than harmony. Media technologies all too often bring out the worst in us.

A celebrated writer on the human consequences of technology, Nicholas Carr reorients the conversation around modern communication, challenging some of our most cherished beliefs about self-expression, free speech, and media democratization. He reveals how messaging apps strip nuance from conversation, how “digital crowding” erodes empathy and triggers aggression, how online political debates narrow our minds and distort our perceptions, and how advances in AI are further blurring the already hazy line between fantasy and reality.

Even as Carr shows how tech companies and their tools of connection have failed us, he forces us to confront inconvenient truths about our own nature. The human psyche, it turns out, is profoundly ill-suited to the “superbloom” of information that technology has unleashed.

With rich psychological insights and vivid examples drawn from history and science, Superbloom provides both a panoramic view of how media shapes society and an intimate examination of the fate of the self in a time of radical dislocation. It may be too late to change the system, Carr counsels, but it’s not too late to change ourselves…(More)”.

Towards Best Practices for Open Datasets for LLM Training


Paper by Stefan Baack et al: “Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in places like the EU and Japan, it is allowed under certain restrictions, while in the United States the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to several high-profile copyright lawsuits, and the threat of litigation is commonly cited as a reason why both corporate and public-interest actors have recently been minimizing the information they share about training datasets. This trend harms transparency, accountability, and innovation across the broader ecosystem by denying researchers, auditors, and impacted individuals access to the information needed to understand AI models.
While this could be mitigated by training language models on open access and public domain data, at the time of writing, there are no such models (trained at a meaningful scale) due to the substantial technical and sociological challenges in assembling the necessary corpus. These challenges include incomplete and unreliable metadata, the cost and complexity of digitizing physical records, and the diverse set of legal and technical skills required to ensure relevance and responsibility in a quickly changing landscape. Building towards a future where AI systems can be trained on openly licensed data that is responsibly curated and governed requires collaboration across legal, technical, and policy domains, along with investments in metadata standards, digitization, and fostering a culture of openness…(More)”.
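
To make the metadata challenge concrete, here is a minimal, purely illustrative sketch (not from the paper) of filtering candidate training documents by license metadata; the field names and license list are assumptions for illustration, and records with missing or unrecognized licenses are set aside rather than silently included.

```python
# Illustrative sketch only: filtering candidate training documents by license
# metadata. The "license" field and the OPEN_LICENSES set are hypothetical,
# not taken from the paper.

OPEN_LICENSES = {"cc0", "cc-by", "cc-by-sa", "public-domain"}

def filter_openly_licensed(documents):
    """Keep documents whose license metadata is present and openly licensed.

    Documents with missing or unrecognized license tags are flagged for
    manual review rather than silently included -- incomplete and unreliable
    metadata is one of the challenges the paper highlights.
    """
    kept, needs_review = [], []
    for doc in documents:
        license_tag = (doc.get("license") or "").strip().lower()
        if license_tag in OPEN_LICENSES:
            kept.append(doc)
        else:
            needs_review.append(doc)
    return kept, needs_review

corpus = [
    {"id": 1, "license": "CC-BY", "text": "..."},
    {"id": 2, "license": None, "text": "..."},                 # missing metadata
    {"id": 3, "license": "all-rights-reserved", "text": "..."},
]
kept, needs_review = filter_openly_licensed(corpus)
print(len(kept), "kept;", len(needs_review), "flagged for review")
```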

Beware the Intention Economy: Collection and Commodification of Intent via Large Language Models


Article by Yaqub Chaudhary and Jonnie Penn: “The rapid proliferation of large language models (LLMs) invites the possibility of a new marketplace for behavioral and psychological data that signals intent. This brief article introduces some initial features of that emerging marketplace. We survey recent efforts by tech executives to position the capture, manipulation, and commodification of human intentionality as a lucrative parallel to—and viable extension of—the now-dominant attention economy, which has bent consumer, civic, and media norms around users’ finite attention spans since the 1990s. We call this follow-on the intention economy. We characterize it in two ways. First, as a competition, initially, between established tech players armed with the infrastructural and data capacities needed to vie for first-mover advantage on a new frontier of persuasive technologies. Second, as a commodification of hitherto unreachable levels of explicit and implicit data that signal intent, namely those signals borne of combining (a) hyper-personalized manipulation via LLM-based sycophancy, ingratiation, and emotional infiltration and (b) increasingly detailed categorization of online activity elicited through natural language.

This new dimension of automated persuasion draws on the unique capabilities of LLMs and generative AI more broadly, which intervene not only on what users want, but also, to cite Williams, “what they want to want” (Williams, 2018, p. 122). We demonstrate through a close reading of recent technical and critical literature (including unpublished papers from arXiv) that such tools are already being explored to elicit, infer, collect, record, understand, forecast, and ultimately manipulate, modulate, and commodify human plans and purposes, both mundane (e.g., selecting a hotel) and profound (e.g., selecting a political candidate)…(More)”.

How and When to Involve Crowds in Scientific Research


Book by Marion K. Poetz and Henry Sauermann: “This book explores how millions of people can significantly contribute to scientific research with their effort and experience, even if they are not working at scientific institutions and may not have formal scientific training. 

Drawing on a strong foundation of scholarship on crowd involvement, this book helps researchers recognize and understand the benefits and challenges of crowd involvement across key stages of the scientific process. Designed as a practical toolkit, it enables scientists to critically assess the potential of crowd participation, determine when it can be most effective, and implement it to achieve meaningful scientific and societal outcomes.

The book also discusses how recent developments in artificial intelligence (AI) shape the role of crowds in scientific research and can enhance the effectiveness of crowd science projects…(More)”

Governing artificial intelligence means governing data: (re)setting the agenda for data justice


Paper by Linnet Taylor, Siddharth Peter de Souza, Aaron Martin, and Joan López Solano: “The field of data justice has been evolving to take into account the role of data in powering the field of artificial intelligence (AI). In this paper we review the main conceptual bases for governing data and AI: the market-based approach, the personal–non-personal data distinction and strategic sovereignty. We then analyse how these are being operationalised into practical models for governance, including public data trusts, data cooperatives, personal data sovereignty, data collaboratives, data commons approaches and indigenous data sovereignty. We interrogate these models’ potential for just governance based on four benchmarks which we propose as a reformulation of the Data Justice governance agenda identified by Taylor in her 2017 framework. Re-situating data justice at the intersection of data and AI, these benchmarks focus on preserving and strengthening public infrastructures and public goods; inclusiveness; contestability and accountability; and global responsibility. We demonstrate how they can be used to test whether a governance approach will succeed in redistributing power, engaging with public concerns and creating a plural politics of AI…(More)”.

Artificial Intelligence Narratives


A Global Voices Report: “…Framing AI systems as intelligent is further complicated and intertwined with neighboring narratives. In the US, AI narratives often revolve around opposing themes such as hope and fear, often bridging two strong emotions: existential fears and economic aspirations. In either case, they propose that the technology is powerful. These narratives contribute to the hype surrounding AI tools and their potential impact on society. Some examples include:

Many of these framings present AI as an unstoppable and accelerating force. While this narrative can generate excitement and investment in AI research, it can also contribute to a sense of technological determinism and a lack of critical engagement with the consequences of widespread AI adoption. Counter-narratives are plentiful, expanding on motifs of surveillance, erosion of trust, bias, job impacts, exploitation of labor, high-risk uses, concentration of power, and environmental impacts, among others.

These narrative frames, combined with the metaphorical language and imagery used to describe AI, contribute to the confusion and lack of public knowledge about the technology. By positioning AI as a transformative, inevitable, and necessary tool for national success, these narratives can shape public opinion and policy decisions, often in ways that prioritize rapid adoption and commercialization…(More)”

Information Ecosystems and Troubled Democracy


Report by the Observatory on Information and Democracy: “This inaugural meta-analysis provides a critical assessment of the role of information ecosystems in the Global North and Global Majority World, focusing on their relationship with information integrity (the quality of public discourse), the fairness of political processes, the protection of media freedoms, and the resilience of public institutions.

The report addresses three thematic areas with a cross-cutting theme of mis- and disinformation:

  • Media, Politics and Trust;
  • Artificial Intelligence, Information Ecosystems and Democracy;
  • Data Governance and Democracy.

The analysis is based mainly on academic publications, supplemented by reports and other materials from different disciplines and regions (1,664 citations selected from a total corpus of more than 2,700 aggregated resources). The report showcases what we can learn from landmark research on often intractable challenges posed by rapid changes in information and communication spaces…(More)”.

What’s a Fact, Anyway?


Essay by Fergus McIntosh: “…For journalists, as for anyone, there are certain shortcuts to trustworthiness, including reputation, expertise, and transparency—the sharing of sources, for example, or the prompt correction of errors. Some of these shortcuts are more perilous than others. Various outfits, positioning themselves as neutral guides to the marketplace of ideas, now tout evaluations of news organizations’ trustworthiness, but relying on these requires trusting in the quality and objectivity of the evaluation. Official data is often taken at face value, but numbers can conceal motives: think of the dispute over how to count casualties in recent conflicts. Governments, meanwhile, may use their powers over information to suppress unfavorable narratives: laws originally aimed at misinformation, many enacted during the COVID-19 pandemic, can hinder free expression. The spectre of this phenomenon is fuelling a growing backlash in America and elsewhere.

Although some categories of information may come to be considered inherently trustworthy, these, too, are in flux. For decades, the technical difficulty of editing photographs and videos allowed them to be treated, by most people, as essentially incontrovertible. With the advent of A.I.-based editing software, footage and imagery have swiftly become much harder to credit. Similar tools are already used to spoof voices based on only seconds of recorded audio. For anyone, this might manifest in scams (your grandmother calls, but it’s not Grandma on the other end), but for a journalist it also puts source calls into question. Technologies of deception tend to be accompanied by ones of detection or verification—a battery of companies, for example, already promise that they can spot A.I.-manipulated imagery—but they’re often locked in an arms race, and they never achieve total accuracy. Though chatbots and A.I.-enabled search engines promise to help us with research (when a colleague “interviewed” ChatGPT, it told him, “I aim to provide information that is as neutral and unbiased as possible”), their inability to provide sourcing, and their tendency to hallucinate, look more like a shortcut to nowhere, at least for now. The resulting problems extend far beyond media: election campaigns, in which subtle impressions can lead to big differences in voting behavior, feel increasingly vulnerable to deepfakes and other manipulations by inscrutable algorithms. Like everyone else, journalists have only just begun to grapple with the implications.

In such circumstances, it becomes difficult to know what is true, and, consequently, to make decisions. Good journalism offers a way through, but only if readers are willing to follow: trust and naïveté can feel uncomfortably close. Gaining and holding that trust is hard. But failure—the end point of the story of generational decay, of gold exchanged for dross—is not inevitable. Fact checking of the sort practiced at The New Yorker is highly specific and resource-intensive, and it’s only one potential solution. But any solution must acknowledge the messiness of truth, the requirements of attention, the way we squint to see more clearly. It must tell you to say what you mean, and know that you mean it…(More)”.

Governance of Indigenous data in open earth systems science


Paper by Lydia Jennings et al: “In the age of big data and open science, what processes are needed to follow open science protocols while upholding Indigenous Peoples’ rights? The Earth Data Relations Working Group (EDRWG) convened to address this question and to envision a research landscape that acknowledges the legacy of extractive practices and embraces new norms across Earth science institutions and open science research. Using the National Ecological Observatory Network (NEON) as an example, the EDRWG recommends actions, applicable across all phases of the data lifecycle, that recognize the sovereign rights of Indigenous Peoples and support better research across all Earth Sciences…(More)”

Facing & mitigating common challenges when working with real-world data: The Data Learning Paradigm


Paper by Jake Lever et al: “The rapid growth of data-driven applications is ubiquitous across virtually all scientific domains and has led to an increasing demand for effective methods to handle data deficiencies and mitigate the effects of imperfect data. This paper presents a guide for researchers encountering real-world data-driven applications and the challenges associated with them. It proposes the concept of the Data Learning Paradigm, combining the principles of machine learning, data science, and data assimilation to tackle real-world challenges in data-driven applications. Models are a product of the data on which they are trained, and no data collected from real-world scenarios is perfect, owing to natural limitations of sensing and collection. Computational modelling of real-world systems is therefore intrinsically limited by the various deficiencies encountered in real data. The Data Learning Paradigm aims to leverage the strengths of data improvement to enhance the accuracy, reliability, and interpretability of data-driven models. We outline a range of machine learning and data science methods currently being implemented in the field of Data Learning, and discuss how they mitigate the various problems associated with data-driven models, illustrating improved results in a multitude of real-world applications. We highlight examples where these methods have led to significant advancements in fields such as environmental monitoring, planetary exploration, healthcare analytics, linguistic analysis, social networks, and smart manufacturing. We offer a guide to how these methods may be implemented to deal with general types of limitations in data, alongside their current and potential applications…(More)”.
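
As a minimal sketch of the general idea (not the authors' implementation), one "data improvement" step in this spirit is repairing gaps in an imperfect sensor series before fitting a downstream model; the synthetic data, the interpolation strategy, and the simple trend model below are illustrative assumptions only.

```python
# Minimal illustrative sketch (not the paper's method): repair gaps in an
# imperfect sensor series, then fit a simple downstream model to the
# improved data.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
signal = 0.5 * t + 3.0 + rng.normal(scale=2.0, size=t.size)  # noisy linear trend

# Simulate an imperfect real-world collection: roughly 20% of readings missing.
observed = signal.copy()
missing = rng.random(t.size) < 0.2
observed[missing] = np.nan

# Data improvement step: fill the gaps by linear interpolation over valid samples.
valid = ~np.isnan(observed)
repaired = observed.copy()
repaired[~valid] = np.interp(t[~valid], t[valid], observed[valid])

# Downstream model: least-squares trend fitted to the repaired series.
slope, intercept = np.polyfit(t, repaired, deg=1)
print(f"recovered trend: y ~ {slope:.2f} * t + {intercept:.2f}")
```

In practice the improvement step might instead be a data-assimilation scheme or a learned imputation model, but the shape of the workflow is the same: improve the data first, then train and interpret the model on the improved data.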