DATA – Page 2 – The Living Library

Further Reflections on the Journey Towards an International Framework for Data Governance

Curated on June 25, 2025June 26, 2025 by Stefaan Verhulst

Paper by Steve MacFeely, Angela Me, Rachael Beaven, Joseph Costanzo, David Passarelli, Carolina Rossini, Friederike Schueuer, Malarvizhi Veerappan, and Stefaan Verhulst: “The use of data is paramount both to inform individual decisions and to address major global challenges. Data are the lifeblood of the digital economy, feeding algorithms, currencies, artificial intelligence, and driving international services trade, improving the way we respond to crises, informing logistics, shaping markets, communications and politics. But data are not just an economic commodity, to be traded and harvested, they are a personal and social artifact. They contain our most personal and sensitive information – our financial and health records, our networks, our memories, and our most intimate secrets and aspirations. With the advent of digitalization and the internet, our data are ubiquitous – we are the sum of our data. Consequently, this powerful treasure trove needs to be protected carefully. This paper presents arguments for an international data governance framework, the barriers to achieving such a framework and some of the costs of failure. It also articulates why the United Nations is uniquely positioned to host such a framework, and learning from history, the opportunity available to solve a global problem…(More)”.

AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums

Curated on June 25, 2025June 26, 2025 by Stefaan Verhulst

Article by Emanuel Maiberg: “The report, titled “Are AI Bots Knocking Cultural Heritage Offline?” was written by Weinberg of the GLAM-E Lab, a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law, which works with smaller cultural institutions and community organizations to build open access capacity and expertise. GLAM is an acronym for galleries, libraries, archives, and museums. The report is based on a survey of 43 institutions with open online resources and collections in Europe, North America, and Oceania. Respondents also shared data and analytics, and some followed up with individual interviews. The data is anonymized so institutions could share information more freely, and to prevent AI bot operators from undermining their counter measures.

Of the 43 respondents, 39 said they had experienced a recent increase in traffic. Twenty-seven of those 39 attributed the increase in traffic to AI training data bots, with an additional seven saying the AI bots could be contributing to the increase.

“Multiple respondents compared the behavior of the swarming bots to more traditional online behavior such as Distributed Denial of Service (DDoS) attacks designed to maliciously drive unsustainable levels of traffic to a server, effectively taking it offline,” the report said. “Like a DDoS incident, the swarms quickly overwhelm the collections, knocking servers offline and forcing administrators to scramble to implement countermeasures. As one respondent noted, ‘If they wanted us dead, we’d be dead.’”…(More)”

AI and Social Media: A Political Economy Perspective

Curated on June 24, 2025June 24, 2025 by Stefaan Verhulst

Paper by Daron Acemoglu, Asuman Ozdaglar & James Siderius: “We consider the political consequences of the use of artificial intelligence (AI) by online platforms engaged in social media content dissemination, entertainment, or electronic commerce. We identify two distinct but complementary mechanisms, the social media channel and the digital ads channel, which together and separately contribute to the polarization of voters and consequently the polarization of parties. First, AI-driven recommendations aimed at maximizing user engagement on platforms create echo chambers (or “filter bubbles”) that increase the likelihood that individuals are not confronted with counter-attitudinal content. Consequently, social media engagement makes voters more polarized, and then parties respond by becoming more polarized themselves. Second, we show that party competition can encourage platforms to rely more on targeted digital ads for monetization (as opposed to a subscription-based business model), and such ads in turn make the electorate more polarized, further contributing to the polarization of parties. These effects do not arise when one party is dominant, in which case the profit-maximizing business model of the platform is subscription-based. We discuss the impact regulations can have on the polarizing effects of AI-powered online platforms…(More)”.

The Data-Informed City: A Conceptual Framework for Advancing Research and Practice

Curated on June 24, 2025June 25, 2025 by Stefaan Verhulst

Paper by Jorrit de Jong, Fernando Fernandez-Monge et al: “Over the last decades, scholars and practitioners have focused their attention on the use of data for improving public action, with a renewed interest in the emergence of big data and artificial intelligence. The potential of data is particularly salient in cities, where vast amounts of data are being generated from traditional and novel sources. Despite this growing interest, there is a need for a conceptual and operational understanding of the beneficial uses of data. This article presents a comprehensive and precise account of how cities can use data to address problems more effectively, efficiently, equitably, and in a more accountable manner. It does so by synthesizing and augmenting current research with empirical evidence derived from original research and learnings from a program designed to strengthen city governments’ data capacity. The framework can be used to support longitudinal and comparative analyses as well as explore questions such as how different uses of data employed at various levels of maturity can yield disparate outcomes. Practitioners can use the framework to identify and prioritize areas in which building data capacity might further the goals of their teams and organizations…(More)“

Introducing CC Signals: A New Social Contract for the Age of AI

Curated on June 24, 2025June 26, 2025 by Stefaan Verhulst

Creative Commons: “Creative Commons (CC) today announces the public kickoff of the CC signals project, a new preference signals framework designed to increase reciprocity and sustain a creative commons in the age of AI. The development of CC signals represents a major step forward in building a more equitable, sustainable AI ecosystem rooted in shared benefits. This step is the culmination of years of consultation and analysis. As we enter this new phase of work, we are actively seeking input from the public.

As artificial intelligence (AI) transforms how knowledge is created, shared, and reused, we are at a fork in the road that will define the future of access to knowledge and shared creativity. One path leads to data extraction and the erosion of openness; the other leads to a walled-off internet guarded by paywalls. CC signals offer another way, grounded in the nuanced values of the commons expressed by the collective.

Based on the same principles that gave rise to the CC licenses and tens of billions of works openly licensed online, CC signals will allow dataset holders to signal their preferences for how their content can be reused by machines based on a set of limited but meaningful options shaped in the public interest. They are both a technical and legal tool and a social proposition: a call for a new pact between those who share data and those who use it to train AI models.

“CC signals are designed to sustain the commons in the age of AI,” said Anna Tumadóttir, CEO, Creative Commons. “Just as the CC licenses helped build the open web, we believe CC signals will help shape an open AI ecosystem grounded in reciprocity.”

CC signals recognize that change requires systems-level coordination. They are tools that will be built for machine and human readability, and are flexible across legal, technical, and normative contexts. However, at their core CC signals are anchored in mobilizing the power of the collective. While CC signals may range in enforceability, legally binding in some cases and normative in others, their application will always carry ethical weight that says we give, we take, we give again, and we are all in this together.…

Now Ready for Feedback

More information about CC signals and early design decisions are available on the CC website. We are committed to developing CC signals transparently and alongside our partners and community. We are actively seeking public feedback and input over the next few months as we work toward an alpha launch in November 2025….(More)”

Robodebt: When automation fails

Curated on June 23, 2025June 23, 2025 by Stefaan Verhulst

Article by Don Moynihan: “From 2016 to 2020, the Australian government operated an automated debt assessment and recovery system, known as “Robodebt,” to recover fraudulent or overpaid welfare benefits. The goal was to save $4.77 billion through debt recovery and reduced public service costs. However, the algorithm and policies at the heart of Robodebt caused wildly inaccurate assessments, and administrative burdens that disproportionately impacted those with the least resources. After a federal court ruled the policy unlawful, the government was forced to terminate Robodebt and agree to a $1.8 billion settlement.

Robodebt is important because it is an example of a costly failure with automation. By automation, I mean the use of data to create digital defaults for decisions. This could involve the use of AI, or it could mean the use of algorithms reading administrative data. Cases like Robodebt serve as canaries in the coalmine for policymakers interested in using AI or algorithms as an means to downsize public services on the hazy notion that automation will pick up the slack. But I think they are missing the very real risks involved.

To be clear, the lesson is not “all automation is bad.” Indeed, it offer real benefits in potentially reducing administrative costs and hassles and increasing access to public services (e.g. the use of automated or “ex parte” renewals for Medicaid, for example, which Republicans are considering limiting in their new budget bill). It is this promise that makes automation so attractive to policymakers. But it is also the case that automation can be used to deny access to services, and to put people into digital cages that are burdensome to escape from. This is why we need to learn from cases where it has been deployed.

The experience of Robodebt underlines the dangers of using citizens as lab rats to adopt AI on a broad scale before it is has been proven to work. Alongside the parallel collapse of the Dutch government childcare system, Robodebt provides an extraordinarily rich text to understand how automated decision processes can go wrong.

I recently wrote about Robodebt (with co-authors Morten Hybschmann, Kathryn Gimborys, Scott Loudin, Will McClellan), both in the journal of Perspectives on Public Management and Governance and as a teaching case study at the Better Government Lab...(More)”.

Practitioner perspectives on informing decisions in One Health sectors with predictive models

Curated on June 23, 2025June 25, 2025 by Stefaan Verhulst

Paper by Kim M. Pepin: “Every decision a person makes is based on a model. A model is an idea about how a process works based on previous experience, observation, or other data. Models may not be explicit or stated (Johnson-Laird, 2010), but they serve to simplify a complex world. Models vary dramatically from conceptual (idea) to statistical (mathematical expression relating observed data to an assumed process and/or other data) or analytical/computational (quantitative algorithm describing a process). Predictive models of complex systems describe an understanding of how systems work, often in mathematical or statistical terms, using data, knowledge, and/or expert opinion. They provide means for predicting outcomes of interest, studying different management decision impacts, and quantifying decision risk and uncertainty (Berger et al. 2021; Li et al. 2017). They can help decision-makers assimilate how multiple pieces of information determine an outcome of interest about a complex system (Berger et al. 2021; Hemming et al. 2022).

People rely daily on system-level models to reach objectives. Choosing the fastest route to a destination is one example. Such a decision may be based on either a mental model of the road system developed from previous experience or a traffic prediction mapping application based on mathematical algorithms and current data. Either way, a system-level model has been applied and there is some uncertainty. In contrast, predicting outcomes for new and complex phenomena, such as emerging disease spread, a biological invasion risk (Chen et al. 2023; Elderd et al. 2006; Pepin et al. 2022), or climatic impacts on ecosystems is more uncertain. Here public service decision-makers may turn to mathematical models when expert opinion and experience do not resolve enough uncertainty about decision outcomes. But using models to guide decisions also relies on expert opinion and experience. Also, even technical experts need to make modeling choices regarding model structure and data inputs that have uncertainty (Elderd et al. 2006) and these might not be completely objective decisions (Bedson et al. 2021). Thus, using models for guiding decisions has subjectivity from both the developer and end-user, which can lead to apprehension or lack of trust about using models to inform decisions.

Models may be particularly advantageous to decision-making in One Health sectors, including health of humans, agriculture, wildlife, and the environment (hereafter called One Health sectors) and their interconnectedness (Adisasmito et al. 2022)…(More)”.

The Global A.I. Divide

Curated on June 23, 2025June 23, 2025 by Stefaan Verhulst

Article by Adam Satariano and Paul Mozur: “Last month, Sam Altman, the chief executive of the artificial intelligence company OpenAI, donned a helmet, work boots and a luminescent high-visibility vest to visit the construction site of the company’s new data center project in Texas.

Bigger than New York’s Central Park, the estimated $60 billion project, which has its own natural gas plant, will be one of the most powerful computing hubs ever created when completed as soon as next year.

Around the same time as Mr. Altman’s visit to Texas, Nicolás Wolovick, a computer science professor at the National University of Córdoba in Argentina, was running what counts as one of his country’s most advanced A.I. computing hubs. It was in a converted room at the university, where wires snaked between aging A.I. chips and server computers.

“Everything is becoming more split,” Dr. Wolovick said. “We are losing.”

Artificial intelligence has created a new digital divide, fracturing the world between nations with the computing power for building cutting-edge A.I. systems and those without. The split is influencing geopolitics and global economics, creating new dependencies and prompting a desperate rush to not be excluded from a technology race that could reorder economies, drive scientific discovery and change the way that people live and work.

The biggest beneficiaries by far are the United States, China and the European Union. Those regions host more than half of the world’s most powerful data centers, which are used for developing the most complex A.I. systems, according to data compiled by Oxford University researchers. Only 32 countries, or about 16 percent of nations, have these large facilities filled with microchips and computers, giving them what is known in industry parlance as “compute power.”..(More)”.

Library Catalogues as Data: Research, Practice and Usage

Curated on June 23, 2025June 23, 2025 by Stefaan Verhulst

Book by Paul Gooding, Melissa Terras, and Sarah Ames: “Through the web of library catalogues, library management systems and myriad digital resources, libraries have become repositories not only for physical and digital information resources but also for enormous amounts of data about the interactions between these resources and their users. Bringing together leading practitioners and academic voices, this book considers library catalogue data as a vital research resource.

Divided into four sections, each approaches library catalogues, collections and records from a different angle, from exploring methods for examining such data; to the politics of catalogues and library data; their interdisciplinary potential; and practical uses and applications of catalogues as data. Other topics the volume discusses include:

Practical routes to preparing library catalogue data for researchers
The ethics of library metadata privacy and reuse
Data-driven decision making
Data quality and collections bias
Preserving, resurrecting and restoring data
The uses and potential of historical library data
The intersection of catalogue data, AI and Large Language Models (LLMs)

This comprehensive book will be an essential read for practitioners in the GLAM sector, particularly those dealing with collections and catalogue data, and LIS academics and students…(More)”

Misinformation by Omission: The Need for More Environmental Transparency in AI

Curated on June 22, 2025June 22, 2025 by Stefaan Verhulst

Paper by Sasha Luccioni, Boris Gamazaychikov, Theo Alves da Costa, and Emma Strubell: “In recent years, Artificial Intelligence (AI) models have grown in size and complexity, driving greater demand for computational power and natural resources. In parallel to this trend, transparency around the costs and impacts of these models has decreased, meaning that the users of these technologies have little to no information about their resource demands and subsequent impacts on the environment. Despite this dearth of adequate data, escalating demand for figures quantifying AI’s environmental impacts has led to numerous instances of misinformation evolving from inaccurate or de-contextualized best-effort estimates of greenhouse gas emissions. In this article, we explore pervasive myths and misconceptions shaping public understanding of AI’s environmental impacts, tracing their origins and their spread in both the media and scientific publications. We discuss the importance of data transparency in clarifying misconceptions and mitigating these harms, and conclude with a set of recommendations for how AI developers and policymakers can leverage this information to mitigate negative impacts in the future…(More)”.