AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums


Article by Emanuel Maiberg: “The report, titled ‘Are AI Bots Knocking Cultural Heritage Offline?’, was written by Michael Weinberg of the GLAM-E Lab, a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law, which works with smaller cultural institutions and community organizations to build open access capacity and expertise. GLAM is an acronym for galleries, libraries, archives, and museums. The report is based on a survey of 43 institutions with open online resources and collections in Europe, North America, and Oceania. Respondents also shared data and analytics, and some followed up with individual interviews. The data was anonymized so institutions could share information more freely, and to prevent AI bot operators from undermining their countermeasures.

Of the 43 respondents, 39 said they had experienced a recent increase in traffic. Twenty-seven of those 39 attributed the increase to AI training data bots, with an additional seven saying AI bots could be contributing to it.

“Multiple respondents compared the behavior of the swarming bots to more traditional online behavior such as Distributed Denial of Service (DDoS) attacks designed to maliciously drive unsustainable levels of traffic to a server, effectively taking it offline,” the report said. “Like a DDoS incident, the swarms quickly overwhelm the collections, knocking servers offline and forcing administrators to scramble to implement countermeasures. As one respondent noted, ‘If they wanted us dead, we’d be dead.’”…(More)”.
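
A purely illustrative aside on what “countermeasures” can mean at the simplest level: the Python sketch below implements a per-client token-bucket rate limiter of the kind administrators often reach for first. The thresholds and the client_id scheme are assumptions for the example; nothing here comes from the report itself.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow a burst of `capacity` requests per client, refilled at `rate` per second."""

    def __init__(self, rate: float = 1.0, capacity: int = 10):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)  # per-client remaining budget
        self.last_seen = {}                          # per-client last refill time

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_seen.get(client_id, now)
        self.last_seen[client_id] = now
        # Refill in proportion to elapsed time, capped at the bucket's capacity.
        self.tokens[client_id] = min(self.capacity,
                                     self.tokens[client_id] + elapsed * self.rate)
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False  # budget exhausted; the server would answer HTTP 429

limiter = TokenBucket(rate=0.5, capacity=5)   # hypothetical thresholds
for i in range(8):
    print(i, limiter.allow("203.0.113.7"))    # first 5 pass, the rest are throttled
```

A limiter like this protects a collection server from any single aggressive client, but, as the report’s DDoS comparison suggests, swarms that spread requests across many addresses can slip under any per-client budget, which is why administrators end up scrambling for heavier countermeasures.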

The Global A.I. Divide


Article by Adam Satariano and Paul Mozur: “Last month, Sam Altman, the chief executive of the artificial intelligence company OpenAI, donned a helmet, work boots and a luminescent high-visibility vest to visit the construction site of the company’s new data center project in Texas.

Bigger than New York’s Central Park, the estimated $60 billion project, which has its own natural gas plant, will be one of the most powerful computing hubs ever created when completed, as soon as next year.

Around the same time as Mr. Altman’s visit to Texas, Nicolás Wolovick, a computer science professor at the National University of Córdoba in Argentina, was running what counts as one of his country’s most advanced A.I. computing hubs. It was in a converted room at the university, where wires snaked between aging A.I. chips and server computers.

“Everything is becoming more split,” Dr. Wolovick said. “We are losing.”

Artificial intelligence has created a new digital divide, fracturing the world between nations with the computing power for building cutting-edge A.I. systems and those without. The split is influencing geopolitics and global economics, creating new dependencies and prompting a desperate rush to not be excluded from a technology race that could reorder economies, drive scientific discovery and change the way that people live and work.

The biggest beneficiaries by far are the United States, China and the European Union. Those regions host more than half of the world’s most powerful data centers, which are used for developing the most complex A.I. systems, according to data compiled by Oxford University researchers. Only 32 countries, or about 16 percent of nations, have these large facilities filled with microchips and computers, giving them what is known in industry parlance as “compute power”…(More)”.

The war over the peace business


Article by Tekendra Parmar: “At the second annual AI+ Expo in Washington, DC, in early June, war is the word of the day.

As a mix of Beltway bureaucrats, military personnel, and Washington’s consultant class peruse the expansive Walter E. Washington Convention Center, a Palantir booth showcases its latest in data-collection suites for “warfighters.” Lockheed Martin touts the many ways it is implementing AI throughout its weaponry systems. On the soundstage, the defense tech darling Mach Industries is selling its newest uncrewed aerial vehicles. “We’re living in a world with great-power competition,” the presenter says. “We can’t rule out the possibility of war — but the best way to prevent a war is deterrence,” he says, flanked by videos of drones flying through what looked like the rugged mountains and valleys of Kandahar.

Hosted by the Special Competitive Studies Project, a think tank led by former Google CEO Eric Schmidt, the expo says it seeks to bridge the gap between Silicon Valley entrepreneurs and Washington policymakers to “strengthen” America and its allies’ “competitiveness in critical technologies.”

One floor below, a startup called Anadyr Horizon is making a very different sales pitch, for software that seeks to prevent war rather than fight it: “Peace tech,” as the company’s cofounder Arvid Bell calls it. Dressed in white khakis and a black pinstripe suit jacket with a dove and olive branch pinned to his lapel (a gift from his husband), the former Harvard political scientist begins by noting that Russia’s all-out invasion of Ukraine had come as a surprise to many political scientists. But his AI software, he says, could predict it.

Long the domain of fantasy and science fiction, the idea of forecasting conflict has now become a serious pursuit. In Isaac Asimov’s 1950s “Foundation” series, the main character develops an algorithm that allows him to predict the decline of the Galactic Empire, angering its rulers and forcing him into exile. During the coronavirus pandemic, the US State Department experimented with AI fed with Twitter data to predict “COVID cases” and “violent events.” In its AI audit two years ago, the State Department revealed that it started training AI on “open-source political, social, and economic datasets” to predict “mass civilian killings.” The UN is also said to have experimented with AI to model the war in Gaza…(More)”. See also Kluz Prize for PeaceTech (Applications Open).

Fixing the US statistical infrastructure


Article by Nancy Potok and Erica L. Groshen: “Official government statistics are critical infrastructure for the information age. Reliable, relevant statistical information helps businesses to invest and flourish; governments at the local, state, and national levels to make critical decisions on policy and public services; and individuals and families to invest in their futures. Yet surrounded by all manner of digitized data, one can still feel inadequately informed. A major driver of this disconnect in the US context is delayed modernization of the federal statistical system. The disconnect will likely worsen in coming months as the administration shrinks statistical agencies’ staffing, terminates programs (notably for health and education statistics), and eliminates unpaid external advisory groups. Amid this upheaval, might the administration’s appetite for disruption be harnessed to modernize federal statistics?

Federal statistics, one of the United States’ premier public goods, differ from privately provided data because they are privacy protected, aggregated to address relevant questions for decision-makers, constructed transparently, and widely available without a subscription. The private sector cannot be expected to adequately supply such statistical infrastructure. Yes, some companies collect and aggregate some economic data, such as credit card purchases and payroll information. But without strong underpinnings of a modern, federal information infrastructure, there would be large gaps in nationally consistent, transparent, trustworthy data. Furthermore, most private providers rely on public statistics for their internal analytics, to improve their products. They are among the many data users asking for more from statistical agencies…(More)”.

A New Paradigm for Fueling AI for the Public Good


Article by Kevin T. Frazier: “Imagine receiving this email in the near future: “Thank you for sharing data with the American Data Collective on May 22, 2025. After first sharing your workout data with SprintAI, a local startup focused on designing shoes for differently abled athletes, your data donation was also sent to an artificial intelligence research cluster hosted by a regional university. Your donation is on its way to accelerate artificial intelligence innovation and support researchers and innovators addressing pressing public needs!”

That is exactly the sort of message you could expect to receive if we made donations of personal data akin to blood donations—a pro-social behavior that may not immediately serve a donor’s individual needs but may nevertheless benefit the whole of the community. This vision of a future where data flow toward the public good is not science fiction—it is a tangible possibility if we address a critical bottleneck faced by innovators today.

Creating the data equivalent of blood banks may not seem like a pressing need or something that people should voluntarily contribute to, given widespread concerns about a few large artificial intelligence (AI) companies using data for profit-driven and, arguably, socially harmful ends. This narrow conception of the AI ecosystem fails to consider the hundreds of AI research initiatives and startups that have a desperate need for high-quality data. I was fortunate enough to meet leaders of those nascent AI efforts at Meta’s Open Source AI Summit in Austin, Texas. For example, I met with Matt Schwartz, who leads a startup that leans on AI to glean more diagnostic information from colonoscopies. I also connected with Edward Chang, a professor of neurological surgery at the University of California, San Francisco Weill Institute for Neurosciences, who relies on AI tools to discover new information on how and why our brains work. I also got to know Corin Wagen, whose startup is helping companies “find better molecules faster.” This is a small sample of the people leveraging AI for objectively good outcomes. They need your help. More specifically, they need your data.

A tragic irony shapes our current data infrastructure. Most of us share mountains of data with massive and profitable private parties—smartwatch companies, diet apps, game developers, and social media companies. Yet, AI labs, academic researchers, and public interest organizations best positioned to leverage our data for the common good are often those facing the most formidable barriers to acquiring the necessary quantity, quality, and diversity of data. Unlike OpenAI, they are not going to use bots to scrape the internet for data. Unlike Google and Meta, they cannot rely on their own social media platforms and search engines to act as perpetual data generators. And, unlike Anthropic, they lack the funds to license data from media outlets. So, while commercial entities amass vast datasets, frequently as a byproduct of consumer services and proprietary data acquisition strategies, mission-driven AI initiatives dedicated to public problems find themselves in a state of chronic data scarcity. This is not merely a hurdle—it is a systemic bottleneck choking off innovation where society needs it most, delaying or even preventing the development of AI tools that could significantly improve lives.

Individuals are, quite rightly, increasingly hesitant to share their personal information, with concerns about privacy, security, and potential misuse being both rampant and frequently justified by past breaches and opaque practices. Yet, in a striking contradiction, troves of deeply personal data are continuously siphoned by app developers, by tech platforms, and, often opaquely, by an extensive network of data brokers. This practice often occurs with minimal transparency and without informed consent concerning the full lifecycle and downstream uses of that data. This lack of transparency extends to how algorithms trained on this data make decisions that can impact individuals’ lives—from loan applications to job prospects—often without clear avenues for recourse or understanding, potentially perpetuating existing societal biases embedded in historical data…(More)”.

Sentinel Cities for Public Health


Article by Jesse Rothman, Paromita Hore & Andrew McCartor: “In 2017, a New York City health inspector visited the home of a 5-year-old child with an elevated blood lead level. With no sign of lead paint—the usual suspect in such cases—the inspector discovered dangerous levels of lead in a bright yellow container of “Georgian Saffron,” a spice obtained in the family’s home country. It was not the first case associated with the use of lead-containing Georgian spices—the NYC Health Department shared their findings with authorities in Georgia, which catalyzed a survey of children’s blood lead levels in Georgia, and led to increased regulatory enforcement and education. Significant declines in spice lead levels in the country have had ripple effects in NYC also: not only a drop in spice samples from Georgia containing detectable lead but also a significant reduction in blood lead levels among NYC children of Georgian ancestry.

This wasn’t a lucky break—it was the result of a systematic approach to transform local detection into global impact. Findings from local NYC surveillance are, of course, not limited to Georgian spices. Surveillance activities have identified a variety of lead-containing consumer products from around the world, from cosmetics and medicines to ceramics and other goods. Routinely surveying local stores for lead-containing products has resulted in the removal of over 30,000 hazardous consumer products from NYC store shelves since 2010.

How can we replicate and scale up NYC’s model to address the global crisis of lead poisoning?…(More)”.

The Loyalty Trap


Book by Jaime Lee Kucinskas: “…explores how civil servants navigated competing pressures and duties amid the chaos of the Trump administration, drawing on in-depth interviews with senior officials in the most contested agencies over the course of a tumultuous term. Jaime Lee Kucinskas argues that the professional culture and ethical obligations of the civil service stabilize the state in normal times but insufficiently prepare bureaucrats to cope with a president like Trump. Instead, federal employees became ensnared in intractable ethical traps, caught between their commitment to nonpartisan public service and the expectation of compliance with political directives. Kucinskas shares their quandaries, recounting attempts to preserve the integrity of government agencies, covert resistance, and a few bold acts of moral courage in the face of organizational decline and politicized leadership. A nuanced sociological account of the lessons of the Trump administration for democratic governance, The Loyalty Trap offers a timely and bracing portrait of the fragility of the American state…(More)”.

Participatory Approaches to Responsible Data Reuse and Establishing a Social License


Chapter by Stefaan Verhulst, Andrew J. Zahuranec & Adam Zable in Global Public Goods Communication (edited by Sónia Pedro Sebastião and Anne-Marie Cotton): “… examines innovative participatory processes for establishing a social license for reusing data as a global public good. While data reuse creates societal value, it can raise concerns and reinforce power imbalances when individuals and communities lack agency over how their data is reused. To address this, the chapter explores participatory approaches that go beyond traditional consent mechanisms. By engaging data subjects and stakeholders, these approaches aim to build trust and ensure data reuse benefits all parties involved.

The chapter presents case studies of participatory approaches to data reuse from various sectors. This includes The GovLab’s New York City “Data Assembly,” which engaged citizens to set conditions for reusing cell phone data during the COVID-19 response. These examples highlight both the potential and challenges of citizen engagement, such as the need to invest in data literacy and other resources to support meaningful public input. The chapter concludes by considering whether participatory processes for data reuse can foster digital self-determination…(More)”.

Data Integration, Sharing, and Management for Transportation Planning and Traffic Operations


Report by the National Academies of Sciences, Engineering, and Medicine: “Planning and operating transportation systems involves the exchange of large volumes of data that must be shared between partnering transportation agencies, private-sector interests, travelers, and intelligent devices such as traffic signals, ramp meters, and connected vehicles.

NCHRP Research Report 1121: Data Integration, Sharing, and Management for Transportation Planning and Traffic Operations, from TRB’s National Cooperative Highway Research Program, presents tools, methods, and guidelines for improving data integration, sharing, and management practices through case studies, proof-of-concept product developments, and deployment assistance…(More)”.

Can AI Agents Be Trusted?


Article by Blair Levin and Larry Downes: “Agentic AI has quickly become one of the most active areas of artificial intelligence development. AI agents are a level of programming on top of large language models (LLMs) that allow them to work towards specific goals. This extra layer of software can collect data, make decisions, take action, and adapt its behavior based on results. Agents can interact with other systems, apply reasoning, and work according to priorities and rules set by you as the principal.
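
As a rough sketch of that “extra layer of software,” the Python loop below wires a stand-in decision function (where a deployed agent would call a model API) to two toy tools, enforcing principal-set rules at every step. All names, tools, and rules are illustrative assumptions, not any vendor’s actual interface.

```python
from typing import Callable

# Toy tools the agent may invoke on the principal's behalf (illustrative stubs).
def check_calendar(day: str) -> str:
    return f"{day}: 2pm-3pm is free"

def send_email(to: str, body: str) -> str:
    return f"email to {to} queued"

TOOLS: dict[str, Callable[[str], str]] = {
    "check_calendar": check_calendar,
    "send_email": lambda body: send_email("alex@example.com", body),
}

# Rules set by the principal: a step budget and actions the agent may never take.
RULES = {"max_steps": 5, "forbidden": {"purchase"}}

def llm_decide(goal: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM call: choose the next tool and its argument.
    A real agent would send `goal` and `history` to a language model here."""
    if not any("free" in step for step in history):
        return "check_calendar", "Tuesday"
    return "send_email", "Proposing Tuesday 2pm, per your calendar."

def run_agent(goal: str) -> list[str]:
    history: list[str] = []
    for _ in range(RULES["max_steps"]):          # the agent cannot loop forever
        action, arg = llm_decide(goal, history)
        if action in RULES["forbidden"]:         # enforce the principal's rules
            history.append(f"blocked: {action}")
            continue
        result = TOOLS[action](arg)              # take action in the world
        history.append(f"{action} -> {result}")  # feed the result back: adaptation
        if action == "send_email":               # goal reached for this toy task
            break
    return history

print(run_agent("schedule a meeting with Alex next week"))
```

The essential pattern is the feedback loop: each action’s result is appended to a history that conditions the next decision, which is what lets an agent adapt its behavior based on results rather than answer once and stop.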

Companies such as Salesforce, for example, have already deployed agents that can independently handle customer queries in a wide range of industries and applications, and recognize when human intervention is required.

But perhaps the most exciting future for agentic AI will come in the form of personal agents, which can take self-directed action on your behalf. These agents will act as your personal assistant: handling calendar management; performing directed research and analysis; finding, negotiating for, and purchasing goods and services; and curating content and taking over basic communications, learning and optimizing themselves along the way.

The idea of personal AI agents goes back decades, but the technology finally appears ready for prime time. Already, leading companies are offering prototype personal AI agents to their customers, suppliers, and other stakeholders, raising challenging business and technical questions. Most pointedly: Can AI agents be trusted to act in our best interests? Will they work exclusively for us, or will their loyalty be split between users, developers, advertisers, and service providers? And how will we know?

The answers to these questions will determine whether and how quickly users embrace personal AI agents, and whether their widespread deployment will enhance or damage business relationships and brand value…(More)”.