AI crawler wars threaten to make the web more closed for everyone


Article by Shayne Longpre: “We often take the internet for granted. It’s an ocean of information at our fingertips—and it simply works. But this system relies on swarms of “crawlers”—bots that roam the web, visit millions of websites every day, and report what they see. This is how Google powers its search engines, how Amazon sets competitive prices, and how Kayak aggregates travel listings. Beyond the world of commerce, crawlers are essential for monitoring web security, enabling accessibility tools, and preserving historical archives. Academics, journalists, and civil societies also rely on them to conduct crucial investigative research.  

Crawlers are endemic. Now representing half of all internet traffic, they will soon outpace human traffic. This unseen subway of the web ferries information from site to site, day and night. And as of late, they serve one more purpose: Companies such as OpenAI use web-crawled data to train their artificial intelligence systems, like ChatGPT. 

Understandably, websites are now fighting back for fear that this invasive species—AI crawlers—will help displace them. But there’s a problem: This pushback is also threatening the transparency and open borders of the web that allow non-AI applications to flourish. Unless we are thoughtful about how we fix this, the web will increasingly be fortified with logins, paywalls, and access tolls that inhibit not just AI but the biodiversity of real users and useful crawlers…(More)”.
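
One way to picture the narrower kind of pushback the excerpt hints at is a robots.txt file that turns away a single AI crawler while leaving the site open to everything else. The excerpt does not name this mechanism, so treat the following as an illustrative sketch rather than the author's proposal. It uses Python's standard `urllib.robotparser`; the rules, the `example.org` URL, and the `ExampleArchiveBot` name are hypothetical, and `GPTBot` is assumed here as the user-agent token of OpenAI's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow every other bot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.org/articles/1"  # placeholder URL

# The AI crawler is refused; a hypothetical archival crawler is still welcome.
ai_allowed = parser.can_fetch("GPTBot", page)
archive_allowed = parser.can_fetch("ExampleArchiveBot", page)

print(f"GPTBot allowed: {ai_allowed}")
print(f"ExampleArchiveBot allowed: {archive_allowed}")
```

Rules like these only work if a crawler chooses to honor them, which is part of why sites reach for the blunter tools the excerpt lists: logins, paywalls, and access tolls that shut out useful crawlers as well.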

How Philanthropy Built, Lost, and Could Reclaim the A.I. Race


Article by Sara Herschander: “How do we know you won’t pull an OpenAI?”

It’s the question Stella Biderman has gotten used to answering when she seeks funding from major foundations for EleutherAI, her two-year-old nonprofit A.I. lab that has developed open-source artificial intelligence models.

The irony isn’t lost on her. Not long ago, she declined a deal dangled by one of Silicon Valley’s most prominent venture capitalists who, with the snap of his fingers, promised to raise $100 million for the fledgling nonprofit lab — over 30 times EleutherAI’s current annual budget — if only the lab’s leaders would agree to drop its 501(c)(3) status.

In today’s A.I. gold rush, where tech giants spend billions on increasingly powerful models and top researchers command seven-figure salaries, to be a nonprofit A.I. lab is to be caught in a Catch-22: defend your mission to increasingly wary philanthropic funders or give in to temptation and become a for-profit company.

Philanthropy once played an outsize role in building major A.I. research centers and nurturing influential theorists — by donating hundreds of millions of dollars, largely to university labs — yet today those dollars are dwarfed by the billions flowing from corporations and venture capitalists. For tech nonprofits and their philanthropic backers, this has meant embracing a new role: pioneering the research and safeguards the corporate world won’t touch.

“If making a lot of money was my goal, that would be easy,” said Biderman, whose employees have seen their pay packages triple or quadruple after being poached by companies like OpenAI, Anthropic, and Google.

But EleutherAI doesn’t want to join the race to build ever-larger models. Instead, backed by grants from Open Philanthropy, Omidyar Network, and A.I. companies Hugging Face and StabilityAI, the group has carved out a different niche: researching how A.I. systems make decisions, maintaining widely used training datasets, and shaping global policy around A.I. safety and transparency…(More)”.

Google-backed public interest AI partnership launches with $400M+ for open ecosystem building


Article by Natasha Lomas: “Make room for yet another partnership on AI. Current AI, a “public interest” initiative focused on fostering and steering development of artificial intelligence in societally beneficial directions, was announced at the French AI Action summit on Monday. It’s kicking off with an initial $400 million in pledges from backers and a plan to pull in $2.5 billion more over the next five years.

Such figures are small beer when it comes to AI investment, with the French president fresh from trumpeting a private support package worth around $112 billion (which itself pales beside U.S. investments of $500 billion aiming to accelerate the tech). But the partnership is not focused on compute, so its backers believe such relatively modest sums will still be able to produce an impact in key areas where AI could make a critical difference to advancing the public interest in areas like healthcare and supporting climate goals.

The initial details are high level. Under the top-line focus on “the enabling environment for public interest AI,” the initiative has a number of stated aims — including pushing to widen access to “high quality” public and private datasets for AI training; support for open source infrastructure and tooling to boost AI transparency and security; and support for developing systems to measure AI’s social and environmental impact. 

Its founder, Martin Tisné, said the goal is to create a financial vehicle “to provide a North Star for public financing of critical efforts,” such as bringing AI to bear on combating cancers or coming up with treatments for long COVID.

“I think what’s happening is you’ve got a data bottleneck coming in artificial intelligence, because we’re running out of road with data on the web, effectively … and here, what we need is to really unlock innovations in how to make data accessible and available,” he told TechCrunch….(More)”

Trump’s shocking purge of public health data, explained


Article by Dylan Scott: “In the initial days of the Trump administration, officials scoured federal websites for any mention of what they deemed “DEI” keywords — terms as generic as “diverse” and “historically” and even “women.” They soon identified reams of some of the country’s most valuable public health data containing some of the targeted words, including language about LGBTQ+ people, and quickly took down much of it — from surveys on obesity and suicide rates to real-time reports on immediate infectious disease threats like bird flu.

The removal elicited a swift response from public health experts who warned that without this data, the country risked being in the dark about important health trends that shape life-and-death public health decisions made in communities across the country.

Some of this data was restored in a matter of days, but much of it was incomplete. In some cases, the raw data sheets were posted again, but the reference documents that would allow most people to decipher them were not. Meanwhile, health data continues to be taken down: The New York Times reported last week that data from the Centers for Disease Control and Prevention on bird flu transmission between humans and cats had been posted and then promptly removed…

It is difficult to capture the sheer breadth and importance of the public health data that has been affected. Here are a few illustrative examples of reports that have either been tampered with or removed completely, as compiled by KFF.

The Behavioral Risk Factor Surveillance System (BRFSS), which is “one of the most widely used national health surveys and has been ongoing for about 40 years,” per KFF, is an annual survey that contacts 400,000 Americans to ask people about everything from their own perception of their general health to exercise, diet, sexual activity, and alcohol and drug use.

That in turn allows experts to track important health trends, like the fluctuations in teen vaping use. One recent study that relied on BRFSS data warned that a recent ban on flavored e-cigarettes (also known as vapes) may be driving more young people to conventional smoking, five years after an earlier Yale study based on the same survey led to the ban being proposed in the first place. The Supreme Court and the Trump administration are currently revisiting the flavored vape ban, and the Yale study was cited in at least one amicus brief for the case.

This survey has also been of particular use in identifying health disparities among LGBTQ+ people, such as higher rates of uninsurance and reported poor health compared to the general population. Those findings have motivated policymakers at the federal, state and local levels to launch new initiatives aimed specifically at that at-risk population.

As of now, most of the BRFSS data has been restored, but the supplemental materials that make it legible to lay people still have not…(More)”.

How the System Works


Article by Charles C. Mann: “…We, too, do not have the luxury of ignorance. Our systems serve us well for the most part. But they will need to be revamped for and by the next generation — the generation of the young people at the rehearsal dinner — to accommodate our rising population, technological progress, increasing affluence, and climate change.

The great European cathedrals were built over generations by thousands of people and sustained entire communities. Similarly, the electric grid, the public-water supply, the food-distribution network, and the public-health system took the collective labor of thousands of people over many decades. They are the cathedrals of our secular era. They are high among the great accomplishments of our civilization. But they don’t inspire bestselling novels or blockbuster films. No poets celebrate the sewage treatment plants that prevent them from dying of dysentery. Like almost everyone else, they rarely note the existence of the systems around them, let alone understand how they work…(More)”.

Call to make tech firms report data centre energy use as AI booms


Article by Sandra Laville: “Tech companies should be required by law to report the energy and water consumption for their data centres, as the boom in AI risks causing irreparable damage to the environment, experts have said.

AI is growing at a rate unparalleled by other energy systems, bringing heightened environmental risk, a report by the National Engineering Policy Centre (NEPC) said.

The report calls for the UK government to make tech companies submit mandatory reports on their energy and water consumption and carbon emissions in order to set conditions in which data centres are designed to use fewer vital resources…(More)”.

Tech tycoons have got the economics of AI wrong


The Economist: “…The Jevons paradox—the idea that efficiency leads to more use of a resource, not less—has in recent days provided comfort to Silicon Valley titans worried about the impact of DeepSeek, the maker of a cheap and efficient Chinese chatbot, which threatens the more powerful but energy-guzzling American varieties. Satya Nadella, the boss of Microsoft, posted on X, a social-media platform, that “Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of,” along with a link to the Wikipedia page for the economic principle. Under this logic, DeepSeek’s progress will mean more demand for data centres, Nvidia chips and even the nuclear reactors that the hyperscalers were, prior to the unveiling of DeepSeek, paying to restart. Nothing to worry about: if the price falls, Microsoft can make it up on volume.

The logic, however self-serving, has a ring of truth to it. Jevons’s paradox is real and observable in a range of other markets. Consider the example of lighting. William Nordhaus, a Nobel-prizewinning economist, has calculated that a Babylonian oil lamp, powered by sesame oil, produced about 0.06 lumens of light per watt of energy. That compares with up to 110 lumens for a modern light-emitting diode. The world has not responded to this dramatic improvement in energy efficiency by enjoying the same amount of light as a Babylonian at lower cost. Instead, it has banished darkness completely, whether through more bedroom lamps than could have been imagined in ancient Mesopotamia or the Las Vegas sphere, which provides passersby with the chance to see a 112-metre-tall incandescent emoji. Urban light is now so cheap and so abundant that many consider it to be a pollutant.

Likewise, more efficient chatbots could mean that AI finds new uses (some no doubt similarly obnoxious). The ability of DeepSeek’s model to perform about as well as more compute-hungry American AI shows that data centres are more productive than previously thought, rather than less. Expect, the logic goes, more investment in data centres and so on than you did before.

Although this idea should provide tech tycoons with some solace, they still ought to worry. The Jevons paradox is a form of a broader phenomenon known as “rebound effects”. These are typically not large enough to fully offset savings from improved efficiency… Basing the bull case for AI on the Jevons paradox is, therefore, a bet not on the efficiency of the technology but on the level of demand. If adoption is being held back by price then efficiency gains will indeed lead to greater use. If technological progress raises expectations rather than reduces costs, as is typical in health care, then chatbots will make up an ever larger proportion of spending. At the moment, that looks unlikely. America’s Census Bureau finds that only 5% of American firms currently use AI and 7% have plans to adopt it in the future. Many others find the tech difficult to use or irrelevant to their line of business…(More)”.
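
The distinction the excerpt draws between an ordinary rebound effect and a full Jevons paradox can be put into numbers. The sketch below assumes a constant-elasticity demand curve, a textbook simplification that the article itself does not use. The function names and example values are illustrative only. Under that assumption, making a service `efficiency_gain` times more efficient cuts its effective price by the same factor, demand rises by `efficiency_gain ** price_elasticity`, and total resource use goes up only when the elasticity exceeds 1.

```python
def resource_use_ratio(efficiency_gain: float, price_elasticity: float) -> float:
    """New resource use divided by old, under constant-elasticity demand (an assumption).

    efficiency_gain: factor by which output per unit of resource improves (2.0 = twice as efficient).
    price_elasticity: magnitude of the price elasticity of demand for the service.
    """
    if efficiency_gain <= 1:
        raise ValueError("efficiency_gain must be greater than 1")
    # The effective price falls by 1/efficiency_gain, so demand rises by gain ** elasticity.
    demand_ratio = efficiency_gain ** price_elasticity
    # Resource needed = service demanded / efficiency.
    return demand_ratio / efficiency_gain


def rebound_fraction(efficiency_gain: float, price_elasticity: float) -> float:
    """Share of the expected resource saving that extra demand eats up.

    0 means no rebound, 1 means the saving is fully offset,
    and anything above 1 is the Jevons paradox (use rises overall).
    """
    expected_saving = 1 - 1 / efficiency_gain
    actual_saving = 1 - resource_use_ratio(efficiency_gain, price_elasticity)
    return 1 - actual_saving / expected_saving


# Illustrative elasticities only; none of these figures come from the article.
for elasticity in (0.0, 0.5, 1.0, 1.5):
    ratio = resource_use_ratio(2.0, elasticity)
    rebound = rebound_fraction(2.0, elasticity)
    print(f"elasticity={elasticity}: resource use x{ratio:.2f}, rebound={rebound:.0%}")
```

Read this way, the bull case in the excerpt amounts to a claim that demand for AI is elastic enough to push the rebound above 100%, which is a statement about customers rather than about the technology.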

Unlocking AI’s potential for the public sector


Article by Ruth Kelly: “…Government needs to work on its digital foundations. The extent of legacy IT systems across government is huge. Many were designed and built for a previous business era, and still rely on paper-based processes. Historic neglect and a lack of asset maintenance have added to the difficulty. Because many systems are not compatible, sharing data across systems requires manual extraction which is risky and costly. All this adds to problems with data quality. Government suffers from data which is incomplete, inconsistent, inaccessible, difficult to process and not easily shareable. A lack of common data models, both between and within government departments, makes it difficult and costly to combine different sources of data, and significant manual effort is required to make data usable. Some departments have told us that they spend 60% to 80% of their time on cleaning data when carrying out analysis.

Why is this an issue for AI? Large volumes of good-quality data are important for training, testing and deploying AI models. Poor data leads to poor outcomes, especially where it involves personal data. Access to good-quality data was identified as a barrier to implementing AI by 62% of the 87 government bodies responding to our survey. Simple productivity improvements that provide integration with routine administration (for example, summarising documents) are already possible, but integration with big, established legacy IT is a whole other long-term endeavour. Layering new technology on top of existing systems, and reusing poor-quality and aging data, carries the risk of magnifying problems and further embedding reliance on legacy systems…(More)”

The 2026 Aid Transparency Index is canceled. Here’s what it means


Article by Gary Forster: “As things stand, we will not be running the 2026 Aid Transparency Index. Not because it isn’t needed. Not because it isn’t effective. But because, in spite of our best efforts, we haven’t been able to secure the funding for it.

This is not a trivial loss. The Aid Transparency Index has been the single most powerful mechanism driving improvements in the quantity and quality of aid data that is published to the International Aid Transparency Initiative, or IATI, Standard. Since 2012, every two years, it has independently assessed and ranked the transparency of the world’s 50 largest aid agencies — organizations responsible for 92% of all spending published in IATI, amounting to $237 billion in 2023 alone.

The index works because it shapes agency behavior. It has encouraged reluctant agencies to start publishing data; motivated those already engaged to improve data quantity and quality; and provided a crucial, independent check on the state of global aid transparency…(More)”.

Is This How Reddit Ends?


Article by Matteo Wong: “The internet is growing more hostile to humans. Google results are stuffed with search-optimized spam, unhelpful advertisements, and AI slop. Amazon has become littered with undifferentiated junk. The state of social media, meanwhile—fractured, disorienting, and prone to boosting all manner of misinformation—can be succinctly described as a cesspool.

It’s with some irony, then, that Reddit has become a reservoir of humanity. The platform has itself been called a cesspool, rife with hateful rhetoric and falsehoods. But it is also known for quirky discussions and impassioned debates on any topic among its users. Does charging your brother rent, telling your mom she’s an unwanted guest, or giving your wife a performance review make you an asshole? (Redditors voted no, yes, and “everyone sucks,” respectively.) The site is where fans hash out the best rap album ever and plumbers weigh in on how to unclog a drain. As Google has begun to offer more and more vacuous SEO sites and ads in response to queries, many people have started adding reddit to their searches to find thoughtful, human-written answers: find mosquito in bedroom reddit, fix musty sponge reddit.

But now even Reddit is becoming more artificial. The platform has quietly started beta-testing Reddit Answers, what it calls an “AI-powered conversational interface.” In function and design, the feature—which is so far available only for some users in the U.S.—is basically an AI chatbot. On a new search screen accessible from the homepage, Reddit Answers takes anyone’s queries, trawls the site for relevant discussions and debates, and composes them into a response. In other words, a site that sells itself as a home for “authentic human connection” is now giving humans the option to interact with an algorithm instead…(More)”.