AI Summer, Data Winter: What the AI Index Reveals — and What It Doesn’t Yet Measure

Article by Stefaan Verhulst: “The AI Index Report 2026, released this week by Stanford HAI, offers a compelling portrait of what can only be described as an ongoing AI Summer. The indicators are striking: rapid adoption reaching more than half the population within three years, surging investment, near-human performance across multiple domains, and widespread deployment in science, medicine, and the economy. By nearly every conventional metric — capability, capital, and diffusion — AI is accelerating.

Yet, embedded within the report is a quieter but more consequential story: the deepening of a data winter. Nowhere is this more clearly articulated than in the report’s own section on the potential exhaustion of training data (page 25).

The report notes growing concern among leading researchers that we may be approaching “peak data”—a point at which access to high-quality human-generated text and web data is effectively exhausted. Some projections suggest that this depletion could occur as early as sometime between 2026 and 2032. This is not a marginal issue. Data exhaustion directly challenges the scaling paradigm that has underpinned AI’s recent breakthroughs. What appears as exponential growth in capability may, in fact, be approaching a structural ceiling–not due to limits in compute or model design, but due to constraints in data availability. In other words, the AI summer may be running on finite fuel.

The report further underscores that synthetic data — often proposed as a solution to data scarcity — has not yet proven to be a full substitute for real-world data, particularly in pre-training contexts. While hybrid approaches combining real and synthetic data can accelerate training, they do not surpass the performance of models trained on high-quality real data. Purely synthetic training, meanwhile, remains effective only in narrower or smaller-scale settings (e.g., for specialized RAG applications or sector-specific models). The implication is clear: the quality and diversity of real-world data remain irreplaceable at the frontier…(More)”.

How to contribute:

Did you come across – or create – a compelling project/report/book/app at the leading edge of innovation in governance?

Share it with us at info@thelivinglib.org so that we can add it to the Collection!

About the Curator

Stefaan Verhulst

Get the latest news right in your inbox

Subscribe to curated findings and actionable knowledge from The Living Library, delivered to your inbox every Friday

Artificial Intelligence

DATA

Explore our articles

AI Summer, Data Winter: What the AI Index Reveals — and What It Doesn’t Yet Measure

Share

How to contribute:

About the Curator

Stefaan Verhulst

Get the latest news right in your inbox

Related articles

Attribution in the Age of AI

Google Was a Lifeline for Publishers. Now Some Are Thinking of Cutting It Off.

Governing Well in the Algorithmic Age: The Foundations of Digital Statecraft