AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums


Article by Emanuel Maiberg: “The report, titled ‘Are AI Bots Knocking Cultural Heritage Offline?’, was written by Weinberg of the GLAM-E Lab, a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law, which works with smaller cultural institutions and community organizations to build open access capacity and expertise. GLAM is an acronym for galleries, libraries, archives, and museums. The report is based on a survey of 43 institutions with open online resources and collections in Europe, North America, and Oceania. Respondents also shared data and analytics, and some followed up with individual interviews. The data was anonymized so that institutions could share information more freely, and to prevent AI bot operators from undermining their countermeasures.

Of the 43 respondents, 39 said they had experienced a recent increase in traffic. Twenty-seven of those 39 attributed the increase in traffic to AI training data bots, with an additional seven saying the AI bots could be contributing to the increase. 

“Multiple respondents compared the behavior of the swarming bots to more traditional online behavior such as Distributed Denial of Service (DDoS) attacks designed to maliciously drive unsustainable levels of traffic to a server, effectively taking it offline,” the report said. “Like a DDoS incident, the swarms quickly overwhelm the collections, knocking servers offline and forcing administrators to scramble to implement countermeasures. As one respondent noted, ‘If they wanted us dead, we’d be dead.’”…(More)”
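
The report discusses countermeasures only in general terms. As a minimal sketch of the kind of first-line defense collection administrators typically reach for, the snippet below refuses requests from self-identified AI crawlers and throttles any single address that floods a server; the crawler tokens and thresholds are illustrative assumptions, not recommendations from the report.

```python
# Minimal sketch of a first-line countermeasure of the kind collection
# administrators often try: refuse requests from declared AI crawlers and
# throttle any single client that floods the server. The bot tokens and
# thresholds below are illustrative assumptions, not from the report.
import time
from collections import defaultdict, deque

BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot", "Bytespider")  # assumed example crawler tokens
MAX_REQUESTS = 60        # requests allowed per client IP per window (assumed)
WINDOW_SECONDS = 60.0

_recent: dict[str, deque] = defaultdict(deque)  # client IP -> timestamps of recent requests

def allow_request(client_ip: str, user_agent: str, now: float | None = None) -> bool:
    """Return True if the request should be served, False if refused."""
    agent = user_agent.lower()
    if any(token.lower() in agent for token in BLOCKED_AGENT_TOKENS):
        return False                      # declared AI crawler: refuse outright
    now = time.time() if now is None else now
    window = _recent[client_ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()                  # forget requests outside the window
    if len(window) >= MAX_REQUESTS:
        return False                      # swarm-like burst from one address: throttle
    window.append(now)
    return True

# Example: a burst of 100 simultaneous requests from one address is cut off at 60.
served = sum(allow_request("203.0.113.7", "Mozilla/5.0", now=1000.0) for _ in range(100))
print(served)  # 60
```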

AI and Social Media: A Political Economy Perspective


Paper by Daron Acemoglu, Asuman Ozdaglar & James Siderius: “We consider the political consequences of the use of artificial intelligence (AI) by online platforms engaged in social media content dissemination, entertainment, or electronic commerce. We identify two distinct but complementary mechanisms, the social media channel and the digital ads channel, which together and separately contribute to the polarization of voters and consequently the polarization of parties. First, AI-driven recommendations aimed at maximizing user engagement on platforms create echo chambers (or “filter bubbles”) that increase the likelihood that individuals are not confronted with counter-attitudinal content. Consequently, social media engagement makes voters more polarized, and then parties respond by becoming more polarized themselves. Second, we show that party competition can encourage platforms to rely more on targeted digital ads for monetization (as opposed to a subscription-based business model), and such ads in turn make the electorate more polarized, further contributing to the polarization of parties. These effects do not arise when one party is dominant, in which case the profit-maximizing business model of the platform is subscription-based. We discuss the impact regulations can have on the polarizing effects of AI-powered online platforms…(More)”.
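
The paper’s formal model is not reproduced above, but the first mechanism can be illustrated with a toy simulation (not the authors’ model): a platform that always recommends the item closest to a user’s current opinion keeps opinions spread apart, whereas random recommendations expose users to counter-attitudinal content that pulls opinions toward the center. All parameters below are invented for illustration.

```python
# Toy illustration (not the authors' model): engagement-maximizing
# recommendations vs. random recommendations, and the resulting opinion spread.
import random

def simulate(engagement_maximizing: bool, steps: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    users = [rng.uniform(-1, 1) for _ in range(500)]   # opinions in [-1, 1]
    items = [i / 50 - 1 for i in range(101)]           # content spanning [-1, 1]
    for _ in range(steps):
        for i, opinion in enumerate(users):
            if engagement_maximizing:
                # Recommend the item closest to the user's opinion ("filter bubble").
                item = min(items, key=lambda x: abs(x - opinion))
            else:
                item = rng.choice(items)               # counter-attitudinal content possible
            users[i] = opinion + 0.05 * (item - opinion)   # opinions drift toward content seen
    mean = sum(users) / len(users)
    return (sum((u - mean) ** 2 for u in users) / len(users)) ** 0.5  # opinion dispersion

print("random feed dispersion:    ", round(simulate(False), 3))
print("engagement feed dispersion:", round(simulate(True), 3))
```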

Introducing CC Signals: A New Social Contract for the Age of AI


Creative Commons: “Creative Commons (CC) today announces the public kickoff of the CC signals project, a new preference signals framework designed to increase reciprocity and sustain a creative commons in the age of AI. The development of CC signals represents a major step forward in building a more equitable, sustainable AI ecosystem rooted in shared benefits. This step is the culmination of years of consultation and analysis. As we enter this new phase of work, we are actively seeking input from the public. 

As artificial intelligence (AI) transforms how knowledge is created, shared, and reused, we are at a fork in the road that will define the future of access to knowledge and shared creativity. One path leads to data extraction and the erosion of openness; the other leads to a walled-off internet guarded by paywalls. CC signals offer another way, grounded in the nuanced values of the commons expressed by the collective.

Based on the same principles that gave rise to the CC licenses and tens of billions of works openly licensed online, CC signals will allow dataset holders to signal their preferences for how their content can be reused by machines based on a set of limited but meaningful options shaped in the public interest. They are both a technical and legal tool and a social proposition: a call for a new pact between those who share data and those who use it to train AI models.

“CC signals are designed to sustain the commons in the age of AI,” said Anna Tumadóttir, CEO, Creative Commons. “Just as the CC licenses helped build the open web, we believe CC signals will help shape an open AI ecosystem grounded in reciprocity.”

CC signals recognize that change requires systems-level coordination. They are tools that will be built for machine and human readability, and are flexible across legal, technical, and normative contexts. However, at their core, CC signals are anchored in mobilizing the power of the collective. While CC signals may range in enforceability (legally binding in some cases, normative in others), their application will always carry ethical weight that says we give, we take, we give again, and we are all in this together.
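
The concrete syntax of CC signals is still under design and is not specified in the announcement. Purely as a hypothetical illustration of what machine readability could mean in practice, a reuser’s crawler might parse a small declaration published alongside a collection and check a planned use against the stated preference; every field name and value below is invented, not part of the CC signals framework.

```python
# Purely hypothetical illustration of a machine-readable preference signal.
# None of these field names or values come from the CC signals specification,
# which is still under design; they only show what "machine readability" means.
import json

example_declaration = json.loads("""
{
  "collection": "https://example.org/open-collection",
  "signal": "reuse-with-reciprocity",
  "expected_reciprocity": ["attribution", "contribution-to-commons"]
}
""")

def planned_use_complies(declaration: dict, offered_reciprocity: set[str]) -> bool:
    """Check a planned machine reuse against the holder's declared preference."""
    if declaration.get("signal") == "reuse-with-reciprocity":
        return set(declaration.get("expected_reciprocity", [])) <= offered_reciprocity
    return True  # no restriction signaled

print(planned_use_complies(example_declaration, {"attribution"}))                             # False
print(planned_use_complies(example_declaration, {"attribution", "contribution-to-commons"}))  # True
```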

Now Ready for Feedback 

More information about CC signals and early design decisions is available on the CC website. We are committed to developing CC signals transparently and alongside our partners and community. We are actively seeking public feedback and input over the next few months as we work toward an alpha launch in November 2025…(More)”.

Robodebt: When automation fails


Article by Don Moynihan: “From 2016 to 2020, the Australian government operated an automated debt assessment and recovery system, known as “Robodebt,” to recover fraudulent or overpaid welfare benefits. The goal was to save $4.77 billion through debt recovery and reduced public service costs. However, the algorithm and policies at the heart of Robodebt caused wildly inaccurate assessments, and administrative burdens that disproportionately impacted those with the least resources. After a federal court ruled the policy unlawful, the government was forced to terminate Robodebt and agree to a $1.8 billion settlement.

Robodebt is important because it is an example of a costly failure with automation. By automation, I mean the use of data to create digital defaults for decisions. This could involve the use of AI, or it could mean the use of algorithms reading administrative data. Cases like Robodebt serve as canaries in the coalmine for policymakers interested in using AI or algorithms as a means to downsize public services on the hazy notion that automation will pick up the slack. But I think they are missing the very real risks involved.

To be clear, the lesson is not “all automation is bad.” Indeed, it offers real benefits in potentially reducing administrative costs and hassles and increasing access to public services (e.g. the use of automated or “ex parte” renewals for Medicaid, which Republicans are considering limiting in their new budget bill). It is this promise that makes automation so attractive to policymakers. But it is also the case that automation can be used to deny access to services, and to put people into digital cages that are burdensome to escape from. This is why we need to learn from cases where it has been deployed.

The experience of Robodebt underlines the dangers of using citizens as lab rats to adopt AI on a broad scale before it has been proven to work. Alongside the parallel Dutch childcare benefits scandal, Robodebt provides an extraordinarily rich text to understand how automated decision processes can go wrong.

I recently wrote about Robodebt (with co-authors Morten Hybschmann, Kathryn Gimborys, Scott Loudin, and Will McClellan), both in the journal Perspectives on Public Management and Governance and as a teaching case study at the Better Government Lab...(More)”.
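
The excerpt does not spell out the mechanism behind the inaccurate assessments, but the core flaw of Robodebt is well documented: annual income reported to the tax office was averaged evenly across all 26 fortnights and compared with the income people had reported each fortnight, so anyone whose earnings were concentrated in part of the year could be flagged with a debt they never owed. A simplified illustration, with all figures invented:

```python
# Simplified illustration of Robodebt's income-averaging flaw (all figures invented).
# Annual income reported to the tax office is smeared evenly across 26 fortnights,
# so someone who earned everything in a few fortnights, and correctly reported zero
# income while claiming benefits the rest of the year, appears to have been overpaid.

FORTNIGHTS = 26
FREE_AREA = 300.0   # fortnightly income below which the benefit is unaffected (illustrative)
TAPER = 0.5         # benefit reduction per dollar earned above the free area (illustrative)

def benefit_reduction(fortnightly_income: float) -> float:
    return max(0.0, fortnightly_income - FREE_AREA) * TAPER

# Worked 6 fortnights at $2,000, claimed benefits only in the 20 fortnights with no income.
working = [2000.0] * 6
on_benefits = [0.0] * 20
annual_income = sum(working) + sum(on_benefits)       # what the tax office reports: $12,000
averaged = annual_income / FORTNIGHTS                 # ~$461.54 assumed for every fortnight

actual_overpayment = sum(benefit_reduction(x) for x in on_benefits)    # $0.00: nothing owed
robodebt_overpayment = benefit_reduction(averaged) * len(on_benefits)  # phantom debt

print(f"overpayment using actual fortnightly income: ${actual_overpayment:8.2f}")
print(f"overpayment using averaged annual income:    ${robodebt_overpayment:8.2f}")
```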

Practitioner perspectives on informing decisions in One Health sectors with predictive models


Paper by Kim M. Pepin: “Every decision a person makes is based on a model. A model is an idea about how a process works based on previous experience, observation, or other data. Models may not be explicit or stated (Johnson-Laird, 2010), but they serve to simplify a complex world. Models vary dramatically from conceptual (idea) to statistical (mathematical expression relating observed data to an assumed process and/or other data) or analytical/computational (quantitative algorithm describing a process). Predictive models of complex systems describe an understanding of how systems work, often in mathematical or statistical terms, using data, knowledge, and/or expert opinion. They provide means for predicting outcomes of interest, studying different management decision impacts, and quantifying decision risk and uncertainty (Berger et al. 2021; Li et al. 2017). They can help decision-makers assimilate how multiple pieces of information determine an outcome of interest about a complex system (Berger et al. 2021; Hemming et al. 2022).

People rely daily on system-level models to reach objectives. Choosing the fastest route to a destination is one example. Such a decision may be based on either a mental model of the road system developed from previous experience or a traffic prediction mapping application based on mathematical algorithms and current data. Either way, a system-level model has been applied, and there is some uncertainty. In contrast, predicting outcomes for new and complex phenomena, such as emerging disease spread, biological invasion risk (Chen et al. 2023; Elderd et al. 2006; Pepin et al. 2022), or climatic impacts on ecosystems, is more uncertain. Here, public service decision-makers may turn to mathematical models when expert opinion and experience do not resolve enough uncertainty about decision outcomes. But using models to guide decisions also relies on expert opinion and experience. Also, even technical experts need to make modeling choices regarding model structure and data inputs that have uncertainty (Elderd et al. 2006), and these might not be completely objective decisions (Bedson et al. 2021). Thus, using models for guiding decisions has subjectivity from both the developer and end-user, which can lead to apprehension or lack of trust about using models to inform decisions.
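
As a minimal sketch of the kind of model the author describes (not an example from the paper), consider a simple growth model of an emerging disease in which the transmission parameter is uncertain: propagating that uncertainty through the model yields a distribution of outcomes under each candidate management action rather than a single point prediction. All numbers are invented for illustration.

```python
# Minimal sketch (not from the paper): propagating parameter uncertainty
# through a simple outbreak-growth model to compare two management options.
# All parameter values are invented for illustration.
import random

def final_cases(r_eff: float, generations: int = 8, initial: float = 10.0) -> float:
    """Cumulative cases after simple geometric growth with effective reproduction number r_eff."""
    cases, total = initial, initial
    for _ in range(generations):
        cases *= r_eff
        total += cases
    return total

def outcome_distribution(control_effect: float, n_draws: int = 10_000, seed: int = 1) -> list[float]:
    """Draw the uncertain reproduction number, apply a control measure, simulate."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        r0 = rng.gauss(1.6, 0.3)                       # uncertain transmission (assumed prior)
        r_eff = max(0.0, r0 * (1 - control_effect))    # management reduces transmission
        draws.append(final_cases(r_eff))
    return sorted(draws)

for label, effect in [("do nothing", 0.0), ("control measure", 0.4)]:
    d = outcome_distribution(effect)
    median, p95 = d[len(d) // 2], d[int(0.95 * len(d))]
    print(f"{label:15s} median={median:10.0f}  95th percentile={p95:10.0f}")
```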

Models may be particularly advantageous to decision-making in One Health sectors, including health of humans, agriculture, wildlife, and the environment (hereafter called One Health sectors) and their interconnectedness (Adisasmito et al. 2022)…(More)”.

The Global A.I. Divide


Article by Adam Satariano and Paul Mozur: “Last month, Sam Altman, the chief executive of the artificial intelligence company OpenAI, donned a helmet, work boots and a luminescent high-visibility vest to visit the construction site of the company’s new data center project in Texas.

Bigger than New York’s Central Park, the estimated $60 billion project, which has its own natural gas plant, will be one of the most powerful computing hubs ever created when completed as soon as next year.

Around the same time as Mr. Altman’s visit to Texas, Nicolás Wolovick, a computer science professor at the National University of Córdoba in Argentina, was running what counts as one of his country’s most advanced A.I. computing hubs. It was in a converted room at the university, where wires snaked between aging A.I. chips and server computers.

“Everything is becoming more split,” Dr. Wolovick said. “We are losing.”

Artificial intelligence has created a new digital divide, fracturing the world between nations with the computing power for building cutting-edge A.I. systems and those without. The split is influencing geopolitics and global economics, creating new dependencies and prompting a desperate rush to not be excluded from a technology race that could reorder economies, drive scientific discovery and change the way that people live and work.

The biggest beneficiaries by far are the United States, China and the European Union. Those regions host more than half of the world’s most powerful data centers, which are used for developing the most complex A.I. systems, according to data compiled by Oxford University researchers. Only 32 countries, or about 16 percent of nations, have these large facilities filled with microchips and computers, giving them what is known in industry parlance as “compute power.”…(More)”.

Misinformation by Omission: The Need for More Environmental Transparency in AI


Paper by Sasha Luccioni, Boris Gamazaychikov, Theo Alves da Costa, and Emma Strubell: “In recent years, Artificial Intelligence (AI) models have grown in size and complexity, driving greater demand for computational power and natural resources. In parallel to this trend, transparency around the costs and impacts of these models has decreased, meaning that the users of these technologies have little to no information about their resource demands and subsequent impacts on the environment. Despite this dearth of adequate data, escalating demand for figures quantifying AI’s environmental impacts has led to numerous instances of misinformation evolving from inaccurate or de-contextualized best-effort estimates of greenhouse gas emissions. In this article, we explore pervasive myths and misconceptions shaping public understanding of AI’s environmental impacts, tracing their origins and their spread in both the media and scientific publications. We discuss the importance of data transparency in clarifying misconceptions and mitigating these harms, and conclude with a set of recommendations for how AI developers and policymakers can leverage this information to mitigate negative impacts in the future…(More)”.
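
The paper argues for better disclosure rather than prescribing a calculation, but the arithmetic behind most best-effort operational emissions estimates is simple, which is exactly why missing inputs (hardware power draw, runtime, datacenter overhead, grid carbon intensity) translate directly into wide error bars. The figures below are placeholders, not measurements from the paper.

```python
# Back-of-the-envelope operational emissions estimate of the kind the paper
# says circulates with too little context. Every input below is a placeholder.

def training_emissions_kg_co2e(
    gpu_count: int,
    gpu_power_kw: float,               # average draw per accelerator, kW
    hours: float,                      # total wall-clock training time
    pue: float,                        # datacenter power usage effectiveness (overhead factor)
    grid_intensity_kg_per_kwh: float,  # carbon intensity of the local grid
) -> float:
    energy_kwh = gpu_count * gpu_power_kw * hours * pue
    return energy_kwh * grid_intensity_kg_per_kwh

# Same hypothetical training run, two grids: the answer swings by roughly 5x,
# which is why estimates detached from their assumptions can mislead.
run = dict(gpu_count=1_000, gpu_power_kw=0.4, hours=720, pue=1.2)
print(training_emissions_kg_co2e(**run, grid_intensity_kg_per_kwh=0.08))  # low-carbon grid
print(training_emissions_kg_co2e(**run, grid_intensity_kg_per_kwh=0.40))  # carbon-intensive grid
```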

ChatGPT Has Already Polluted the Internet So Badly That It’s Hobbling Future AI Development


Article by Frank Landymore: “The rapid rise of ChatGPT — and the cavalcade of competitors’ generative models that followed suit — has polluted the internet with so much useless slop that it’s already kneecapping the development of future AI models.

As the AI-generated data clouds the human creations that these models are so heavily dependent on amalgamating, it becomes inevitable that a greater share of what these so-called intelligences learn from and imitate is itself an ersatz AI creation. 

Repeat this process enough, and AI development begins to resemble a maximalist game of telephone in which not only is the quality of the content being produced diminished, resembling less and less what it’s originally supposed to be replacing, but in which the participants actively become stupider. The industry likes to describe this scenario as AI “model collapse.”
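
A minimal numerical sketch (not from the article) shows the mechanism: fit a simple model to data, sample synthetic data from the fit, refit on the sample, and repeat. Because anything missing from one generation’s sample can never reappear in the next, diversity only ever shrinks.

```python
# Minimal sketch (not from the article) of "model collapse": fit a unigram
# model to token counts, sample synthetic text from it, refit on the sample,
# and repeat. Tokens missing from any generation's sample get probability
# zero and can never return, so diversity only ever shrinks.
import random
from collections import Counter

rng = random.Random(0)

# "Human" corpus: 50 token types with a long-tailed (Zipf-like) frequency profile.
vocab = [f"tok{i}" for i in range(1, 51)]
weights = [1 / i**1.1 for i in range(1, 51)]

def sample_corpus(tokens: list[str], probs: list[float], size: int = 200) -> list[str]:
    return rng.choices(tokens, weights=probs, k=size)

def fit_unigram(corpus: list[str]) -> tuple[list[str], list[float]]:
    counts = Counter(corpus)
    tokens = sorted(counts)
    total = sum(counts.values())
    return tokens, [counts[t] / total for t in tokens]

corpus = sample_corpus(vocab, weights)          # generation 0: trained on human data
for generation in range(11):
    tokens, probs = fit_unigram(corpus)
    if generation % 2 == 0:
        print(f"generation {generation:2d}: {len(tokens)} distinct tokens survive")
    corpus = sample_corpus(tokens, probs)       # next model trains on synthetic output only
```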

As a consequence, the finite amount of data predating ChatGPT’s rise becomes extremely valuable. In a new feature, The Register likens this to the demand for “low-background steel,” or steel that was produced before the detonation of the first nuclear bombs, starting in July 1945 with the US’s Trinity test.

Just as the explosion of AI chatbots has irreversibly polluted the internet, so did the detonation of the atom bomb release radionuclides and other particulates that have seeped into virtually all steel produced thereafter. That makes modern metals unsuitable for use in some highly sensitive scientific and medical equipment. And so, what’s old is new: a major source of low-background steel, even today, is WW1- and WW2-era battleships, including a huge naval fleet that was scuttled by German Admiral Ludwig von Reuter in 1919…(More)”.

DeepSeek Inside: Origins, Technology, and Impact


Article by Michael A. Cusumano: “The release of DeepSeek V3 and R1 in January 2025 caused steep declines in the stock prices of companies that provide generative artificial intelligence (GenAI) infrastructure technology and datacenter services. These two large language models (LLMs) came from a little-known Chinese startup with approximately 200 employees versus at least 3,500 for industry leader OpenAI. DeepSeek seemed to have developed this powerful technology much more cheaply than previously thought possible. If true, DeepSeek had the potential to disrupt the economics of the entire GenAI ecosystem and the dominance of U.S. companies ranging from OpenAI to Nvidia.

DeepSeek-R1 defines itself as “an artificial intelligence language model developed by OpenAI, specifically based on the generative pre-trained transformer (GPT) architecture.” Here, DeepSeek acknowledges that the transformer researchers (who published their landmark paper while at Google in 2017) and OpenAI developed its basic technology. Nonetheless, V3 and R1 display impressive skills in neural-network system design, engineering, and optimization, and DeepSeek’s publications provide rare insights into how the technology actually works. This column reviews, for the non-expert reader, what we know about DeepSeek’s origins, technology, and impact so far…(More)”.

AI is supercharging war. Could it also help broker peace?


Article by Tina Amirtha: “Can we measure what is in our hearts and minds, and could it help us end wars any sooner? These are the questions that consume entrepreneur Shawn Guttman, a Canadian émigré who recently gave up his yearslong teaching position in Israel to accelerate a path to peace—using an algorithm.

Living some 75 miles north of Tel Aviv, Guttman is no stranger to the uncertainties of conflict. Over the past few months, miscalculated drone strikes and imprecise missile targets—some intended for larger cities—have occasionally landed dangerously close to his town, sending him to bomb shelters more than once.

“When something big happens, we can point to it and say, ‘Right, that happened because five years ago we did A, B, and C, and look at its effect,’” he says over Google Meet from his office, following a recent trip to the shelter. Behind him, souvenirs from the 1979 Egypt-Israel and 1994 Israel-Jordan peace treaties are visible. “I’m tired of that perspective.”

The startup he cofounded, Didi, is taking a different approach. Its aim is to analyze data across news outlets, political discourse, and social media to identify opportune moments to broker peace. Inspired by political scientist I. William Zartman’s “ripeness” theory, the algorithm—called the Ripeness Index—is designed to tell negotiators, organizers, diplomats, and nongovernmental organizations (NGOs) exactly when conditions are “ripe” to initiate peace negotiations, build coalitions, or launch grassroots campaigns.
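
Didi has not published the internals of the Ripeness Index, so the following is only a guess at the general shape of such a tool: combine proxies for Zartman’s ripeness conditions (a mutually hurting stalemate and a perceived way out) into a weighted score and flag when it crosses a threshold. Indicator names, weights, and the cutoff are all invented.

```python
# Hypothetical sketch only: Didi's actual Ripeness Index is not public.
# Zartman's "ripeness" theory emphasizes a mutually hurting stalemate and a
# perceived way out; a crude index might weight proxies for those conditions.

RIPENESS_WEIGHTS = {
    "mutually_hurting_stalemate": 0.4,   # e.g. both sides' war-weariness in media coverage
    "perceived_way_out": 0.3,            # e.g. leaders signaling openness to talks
    "third_party_pressure": 0.2,         # e.g. allies' public frustration
    "public_support_for_talks": 0.1,     # e.g. polling / social media sentiment
}
RIPENESS_THRESHOLD = 0.6                 # invented cutoff

def ripeness_score(indicators: dict[str, float]) -> float:
    """Weighted average of indicator values, each normalized to [0, 1]."""
    return sum(RIPENESS_WEIGHTS[name] * indicators.get(name, 0.0) for name in RIPENESS_WEIGHTS)

def is_ripe(indicators: dict[str, float]) -> bool:
    return ripeness_score(indicators) >= RIPENESS_THRESHOLD

snapshot = {
    "mutually_hurting_stalemate": 0.8,
    "perceived_way_out": 0.4,
    "third_party_pressure": 0.7,
    "public_support_for_talks": 0.5,
}
print(round(ripeness_score(snapshot), 2), is_ripe(snapshot))  # 0.63 True
```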

During ongoing U.S.-led negotiations over the war in Gaza, both Israel and Hamas have entrenched themselves in opposing bargaining positions. Meanwhile, Israel’s traditional allies, including the U.S., have expressed growing frustration over the war and the dire humanitarian conditions in the enclave, where the threat of famine looms.

In Israel, Didi’s data is already informing grassroots organizations as they strategize which media outlets to target and how to time public actions, such as protests, in coordination with coalition partners. Guttman and his collaborators hope that eventually negotiators will use the model’s insights to help broker lasting peace.

Guttman’s project is part of a rising wave of so-called PeaceTech—a movement using technology to make negotiations more inclusive and data-driven. This includes AI from Hala Systems, which uses satellite imagery and data fusion to monitor ceasefires in Yemen and Ukraine. Another AI startup, Remesh, has been active across the Middle East, helping organizations of all sizes canvass key stakeholders. Its algorithm clusters similar opinions, giving policymakers and mediators a clearer view of public sentiment and division.

A range of NGOs and academic researchers have also developed digital tools for peacebuilding. The nonprofit Computational Democracy Project created Pol.is, an open-source platform that enables citizens to crowdsource outcomes to public debates. Meanwhile, the Futures Lab at the Center for Strategic and International Studies built a peace agreement simulator, complete with a chart to track how well each stakeholder’s needs are met.

Guttman knows it’s an uphill battle. In addition to the ethical and privacy concerns of using AI to interpret public sentiment, PeaceTech also faces financial hurdles. These companies must find ways to sustain themselves amid shrinking public funding and a transatlantic surge in defense spending, which has pulled resources away from peacebuilding initiatives.

Still, Guttman and his investors remain undeterred. One way to view the opportunity for PeaceTech is by looking at the economic toll of war. In its Global Peace Index 2024, the Institute for Economics and Peace’s Vision of Humanity platform estimated that economic disruption due to violence and the fear of violence cost the world $19.1 trillion in 2023, or about 13 percent of global GDP. Guttman sees plenty of commercial potential in times of peace as well.

“Can we make billions of dollars,” Guttman asks, “and save the world—and create peace?”…(More)”. See also the Kluz Prize for PeaceTech (Applications Open).