
Stefaan Verhulst

Paper by Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar: “Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles.
We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths, limitations, and ultimately raising crucial questions about their true reasoning capabilities…(More)”

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Article by Keongmin Yoon, Olivier Dupriez, Bryan Cahill, and Katie Bannon: “The World Bank has long championed data transparency. Open data platforms, global indicators, and reproducible research have become pillars of the Bank’s knowledge work. But in many operational contexts, access to raw data alone is not enough. Turning data into insight requires tools—software to structure metadata, run models, update systems, and integrate outputs into national platforms.

With this in mind, the World Bank has released its first Open Source Software (OSS) tool under a new institutional licensing framework. The Metadata Editor—a lightweight application for structuring and publishing statistical metadata—is now publicly available on the Bank’s GitHub repository, under the widely used MIT License, supplemented by Bank-specific legal provisions.

This release marks more than a technical milestone. It reflects a structural shift in how the Bank shares its data and knowledge. For the first time, there is a clear institutional framework for making Bank-developed software open, reusable, and legally shareable—advancing the Bank’s commitment to public goods, transparency, Open Science, and long-term development impact, as emphasized in The Knowledge Compact for Action…(More)”.

Opening code, opening access: The World Bank’s first open source software release

Editorial in Nature: “Coinciding with US President Donald Trump’s tour of Gulf states last week, Saudi Arabia announced that it is embarking on a large-scale artificial intelligence (AI) initiative. The proposed venture will have state backing and considerable involvement from US technology firms. It is the latest move in a global expansion of AI ambitions beyond the existing heartlands of the United States, China and Europe. However, as Nature India, Nature Africa and Nature Middle East report in a series of articles on AI in low- and middle-income countries (LMICs) published on 21 May (see go.nature.com/45jy3qq), the path to home-grown AI doesn’t need to be paved with billions, or even hundreds of millions, of dollars, or depend exclusively on partners in Western nations or China…, as a News Feature that appears in the series makes plain (see go.nature.com/3yrd3u2), many initiatives in LMICs aren’t focusing on scaling up, but on ‘scaling right’. They are “building models that work for local users, in their languages, and within their social and economic realities”.

More such local initiatives are needed. Some of the most popular AI applications, such as OpenAI’s ChatGPT and Google Gemini, are trained mainly on data in European languages. As a result, these models are less effective for users who speak Hindi, Arabic, Swahili, Xhosa and countless other languages. Countries are boosting home-grown apps by funding start-up companies, establishing AI education programmes, building AI research and regulatory capacity and through public engagement.

Those LMICs that have started investing in AI began by establishing an AI strategy, including policies for AI research. However, as things stand, most of the 55 member states of the African Union and of the 22 members of the League of Arab States have not produced an AI strategy. That must change…(More)”.

The path for AI in poor nations does not need to be paved with billions

Paper by Yusuf Bozkurt, Alexander Rossmann, Zeeshan Pervez, and Naeem Ramzan: “Smart cities aim to improve residents’ quality of life by implementing effective services, infrastructure, and processes through information and communication technologies. However, without robust smart city data governance, much of the urban data potential remains underexploited, resulting in inefficiencies and missed opportunities for city administrations. This study addresses these challenges by establishing specific, actionable requirements for smart city data governance models, derived from expert interviews with representatives of 27 European cities. From these interviews, recurring themes emerged, such as the need for standardized data formats, clear data access guidelines, and stronger cross-departmental collaboration mechanisms. These requirements emphasize technology independence, flexibility to adapt across different urban contexts, and promoting a data-driven culture. By benchmarking existing data governance models against these newly established urban requirements, the study uncovers significant variations in their ability to address the complex, dynamic nature of smart city data systems. This study thus enhances the theoretical understanding of data governance in smart cities and provides municipal decision-makers with actionable insights for improving data governance strategies. In doing so, it directly supports the broader goals of sustainable urban development by helping improve the efficiency and effectiveness of smart city initiatives…(More)”.

Assessing data governance models for smart cities: Benchmarking data governance models on the basis of European urban requirements

Report by Stefaan Verhulst, Andrew J. Zahuranec, and Oscar Romero: “Trust is foundational to effective governance, yet its inherently abstract nature has made it difficult to measure and operationalize, especially in urban contexts. This report proposes a practical framework for city officials to diagnose and strengthen civic trust through observable indicators and actionable interventions.

Rather than attempting to quantify trust as an abstract concept, the framework distinguishes between the drivers of trust—direct experiences and institutional interventions—and its manifestations, both emotional and behavioral. Drawing on literature reviews, expert workshops, and field engagement with the New York City Civic Engagement Commission (CEC), we present a three-phase approach: (1) baseline assessment of trust indicators, (2) analysis of causal drivers, and (3) design and continuous evaluation of targeted interventions. The report illustrates the framework’s applicability through a hypothetical case involving the NYC Parks Department and a real-world case study of the citywide participatory budgeting initiative, The People’s Money. By providing a structured, context-sensitive, and iterative model for measuring civic trust, this report seeks to equip public institutions and city officials with a framework for meaningful measurement of civic trust…(More)”.

Making Civic Trust Less Abstract: A Framework for Measuring Trust Within Cities

Playbook by AI Policymaker Network & Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH: “Rather than talking about AI ethics in abstract terms, it focuses on building policies that work right away in emerging economies and respond to immediate development priorities. The Playbook emphasises that a one-size-fits-all solution doesn’t work. Rather, it illustrates shared challenges—like limited research capacity, fragmented data ecosystems, and compounding AI risks—while spotlighting national innovations and success stories. From drafting AI strategies to engaging communities and safeguarding rights, it lays out a roadmap grounded in local realities….What can you expect to find in the AI Policy Playbook:

  1. Policymaker Interviews
    Real-world insights from policymakers to understand their challenges and best practices.
  2. Policy Process Analysis
    Key elements from existing policies to extract effective strategies for AI governance, as well as policy mapping.
  3. Case Studies
    Examples of successes and lessons learnt from various countries to provide practical guidance.
  4. Recommendations
    Concrete solutions and recommendations from actors in the field to improve the policy development process, including quick tips for implementation and handling challenges.

What distinguishes this initiative is its commitment to peer learning and co-creation. The Africa-Asia AI Policymaker Network comprises over 30 high-level government partners who anchor the Playbook in real-world policy contexts. This ensures that the frameworks are not only theoretically sound but politically and socially implementable…(More)”

The AI Policy Playbook

Article by Pieter Haeck and Mathieu Pollet: “…As the U.S. continues to up the ante in questioning transatlantic ties, calls are growing in Europe to reduce the continent’s reliance on U.S. technology in critical areas such as cloud services, artificial intelligence and microchips, and to opt for European alternatives instead.

But the European Commission is preparing on Thursday to acknowledge publicly what many have said in private: Europe is nowhere near being able to wean itself off U.S. Big Tech.

In a new International Digital Strategy the EU will instead promote collaboration with the U.S., according to a draft seen by POLITICO, as well as with other tech players including China, Japan, India and South Korea. “Decoupling is unrealistic and cooperation will remain significant across the technological value chain,” the draft reads. 

It’s a reality check after a year that has seen calls for a technologically sovereign Europe gain significant traction. In December the Commission appointed Finland’s Henna Virkkunen as the first-ever commissioner in charge of tech sovereignty. After a few months in office, European Parliament lawmakers embarked on an effort to draft a blueprint for tech sovereignty.

Even more consequential has been the rapid rise of the so-called Eurostack movement, which advocates building out a European tech infrastructure and has brought together effective voices including competition economist Cristina Caffarra and Kai Zenner, an assistant to key European lawmaker Axel Voss.

There’s wide agreement on the problem: U.S. cloud giants capture over two-thirds of the European market, the U.S. outpaces the EU in nurturing companies for artificial intelligence, and Europe’s stake in the global microchips market has crumbled to around 10 percent. Thursday’s strategy will acknowledge the U.S.’s “superior ability to innovate” and “Europe’s failure to capitalise on the digital revolution.”

What’s missing are viable solutions to the complex problem of unwinding deep-rooted dependencies….(More)”

Europe’s dream to wean off US tech gets reality check

Declaration by the United Nations Development Programme (UNDP), in partnership with the German Federal Ministry for Economic Cooperation and Development (BMZ): “We are at a crossroads. Despite the progress made in recent years, we need renewed commitment and engagement to advance toward and achieve the Sustainable Development Goals (SDGs). Digital technologies, such as Artificial Intelligence (AI), can play a significant role in this regard. AI presents opportunities and risks in a world of rapid social, political, economic, ecological, and technological shifts. If developed and deployed responsibly, AI can drive sustainable development and benefit society, the economy, and the planet. Yet, without safeguards throughout the AI value chain, it may widen inequalities within and between countries and contribute to direct harm through inappropriate, illegal, or deliberate misuse. It can also contribute to human rights violations, fuel disinformation, homogenize creative and cultural expression, and harm the environment. These risks are likely to disproportionately affect low-income countries, vulnerable groups, and future generations. Geopolitical competition and market dependencies further amplify these risks…(More)”.

Hamburg Declaration on Responsible AI

Paper by Ajinkya Kulkarni, et al: “Children are one of the most under-represented groups in speech technologies, as well as one of the most vulnerable in terms of privacy. Despite this, anonymization techniques targeting this population have received little attention. In this study, we seek to bridge this gap, and establish a baseline for the use of voice anonymization techniques designed for adult speech when applied to children’s voices. Such an evaluation is essential, as children’s speech presents a distinct set of challenges when compared to that of adults. This study comprises three children’s datasets, six anonymization methods, and objective and subjective utility metrics for evaluation. Our results show that existing systems for adults are still able to protect children’s voice privacy, but suffer from much higher utility degradation. In addition, our subjective study displays the challenges of automatic evaluation methods for speech quality in children’s speech, highlighting the need for further research…(More)”. See also: Responsible Data for Children.

Children’s Voice Privacy: First Steps And Emerging Challenges

Blog by Seemay Chou: “In Abundance, Ezra Klein and Derek Thompson make the case that the biggest barriers to progress today are institutional. They’re not because of physical limitations or intellectual scarcity. They’re the product of legacy systems — systems that were built with one logic in mind, but now operate under another. And until we go back and address them at the root, we won’t get the future we say we want.

I’m a scientist. Over the past five years, I’ve experimented with science outside traditional institutes. From this vantage point, one truth has become inescapable. The journal publishing system — the core of how science is currently shared, evaluated, and rewarded — is fundamentally broken. And I believe it’s one of the legacy systems that prevents science from meeting its true potential for society.

It’s an unpopular moment to critique the scientific enterprise given all the volatility around its funding. But we do have a public trust problem. The best way to increase trust and protect science’s future is for scientists to have the hard conversations about what needs improvement. And to do this transparently. In all my discussions with scientists across every sector, exactly zero think the journal system works well. Yet we all feel trapped in a system that is, by definition, us.

I no longer believe that incremental fixes are enough. Science publishing must be built anew. I help oversee billions of dollars in funding across several science and technology organizations. We are expanding our requirement that all scientific work we fund will not go towards traditional journal publications. Instead, research we support should be released and reviewed more openly, comprehensively, and frequently than the status quo.

This policy is already in effect at Arcadia Science and Astera Institute, and we’re actively funding efforts to build journal alternatives through both Astera and The Navigation Fund. We hope others cross this line with us, and below I explain why every scientist and science funder should strongly consider it…(More)”.

Scientific Publishing: Enough is Enough
