Paper by Benjamin S. Manning & John J. Horton: “Useful social science theories predict behavior across settings. However, applying a theory to make predictions in new settings is challenging: rarely can it be done without ad hoc modifications to account for setting-specific factors. We argue that AI agents put in simulations of those novel settings offer an alternative for applying theory, requiring minimal or no modifications. We present an approach for building such “general” agents that use theory-grounded natural language instructions, existing empirical data, and knowledge acquired by the underlying AI during training. To demonstrate the approach in settings where no data from that data-generating process exists–as is often the case in applied prediction problems–we design a heterogeneous population of 883,320 novel games. AI agents are constructed using human data from a small set of conceptually related but structurally distinct “seed” games. In preregistered experiments, on average, agents predict initial human play in a random sample of 1,500 games from the population better than (i) a cognitive hierarchy model, (ii) game-theoretic equilibria, and (iii) out-of-the-box agents. For a small set of separate novel games, these simulations predict responses from a new sample of human subjects better even than the most plausibly relevant published human data…(More)”.
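One way to picture the approach, as a minimal sketch rather than the authors' implementation: a "general" agent prompt combines theory-grounded natural language instructions with observed human play from a few seed games, then asks the model to predict first-round behaviour in a structurally new game. The instructions, example games and the `call_llm` helper below are illustrative assumptions, not material from the paper.

```python
# A hedged sketch of theory-grounded agent construction; names and data are
# illustrative, not taken from the paper.

THEORY_INSTRUCTIONS = (
    "You are simulating a human player choosing a first-round action. "
    "Assume people reason only a limited number of steps about what others "
    "will do and are drawn to salient, payoff-relevant options."
)

def build_agent_prompt(seed_games, new_game):
    """seed_games: list of (game_description, summary_of_observed_human_play)."""
    examples = "\n\n".join(
        f"Game: {desc}\nObserved human play: {play}" for desc, play in seed_games
    )
    return (
        f"{THEORY_INSTRUCTIONS}\n\n"
        f"Human data from related games:\n{examples}\n\n"
        f"New game: {new_game}\n"
        "Predict the distribution of first-round human choices."
    )

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for whatever underlying model API is used.
    raise NotImplementedError("plug in a model call here")

prompt = build_agent_prompt(
    seed_games=[(
        "Players pick a number from 0 to 100; the guess closest to 2/3 of the "
        "average wins.",
        "most first-round choices fall well above the equilibrium of 0",
    )],
    new_game="Players pick a number from 0 to 50; the guess closest to half of "
             "the maximum chosen number wins.",
)
# prediction = call_llm(prompt)
```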

General Social Agents

Article by Stefaan Verhulst: “The world has become more complex, more dynamic and more interconnected than ever before. The challenges we face – from health to climate, from democratic resilience to economic transformation – are deeply intertwined. And we need new ideas to meet these challenges.  

Europe has never lacked intellectual ambition, but ideas alone aren’t enough. To make real progress, we need breakthrough discoveries. We need evidence of what works. And we need the institutional capacity to test, validate and scale solutions across borders and disciplines. 

That’s where science comes in. Yet good science depends on data. And if we want AI to supercharge discovery and transform science, then data becomes even more important. 

The ‘datafication’ of society

Digitalisation has led to an unprecedented datafication of society. When citizens engage with government services, visit a doctor, use a mobility platform, shop online or measure their steps and/or sleep through wearable devices, data are generated. 

But this datafication doesn’t stop with individual behaviour. It extends deep into the productive fabric of our economies. Manufacturing systems, industrial supply chains, logistics networks, energy grids and robotic production lines are now embedded with sensors, connected devices and intelligent control systems. The implication is profound – data is no longer a by-product of digital services alone. It’s a structural feature of both our digital and physical infrastructures. 

The remarkable feature of digital data isn’t merely its volume. It’s its reusability. When done responsibly, data created for one purpose can often be reused for entirely different objectives – including scientific research. 

But there’s a fundamental constraint: access. Much of today’s most valuable data remains locked away in institutional stovepipes – within government agencies, universities and private companies. Despite its public value potential, it often remains inaccessible to scientists and public interest actors. 

Europe has taken important steps to address this data asymmetry. Open data policies have expanded transparency. The Data Governance Act and the Data Act seek to facilitate data sharing and rebalance power in data markets. Article 40 of the Digital Services Act creates pathways for vetted researchers to access platform data. The European Open Science Cloud seeks to enable the sharing of scientific data. Sectoral data spaces – including those envisioned under the European Health Data Space – and Data Labs aim to provide structured, interoperable infrastructures for data access and use. 

Yet instead of a steady expansion of access, we’re now witnessing a ‘data winter.’ Access to private sector data for research has declined in several domains. Open government data initiatives have slowed or been rolled back. Scientific datasets have become restricted or have disappeared. Open science has struggled to scale beyond pilot projects. And broader political retrenchment risks weakening some of the very infrastructures designed to enable responsible reuse. 

Generative AI’s rapid expansion has also triggered backlash. Large-scale data scraping for AI training has blurred the line between openness and extraction. Consequently, institutions and content creators have become more protective, sometimes closing access altogether. And without reliable access to diverse, high-quality data, scientific progress risks stagnation. 

What should Europe do? Three priorities stand out. 

Access shouldn’t be only supply-driven

For too long, data policy has focused on releasing datasets without clearly articulating the questions they’re meant to answer. But the value of data – and increasingly the value of AI – depends directly on the value of the question. 

In short, better questions define better discovery. 

If we want to unlock meaningful access, we must invest in what might be called ‘question science’ – the systematic identification of high-priority societal questions; the structuring of those questions so they are researchable and actionable; the mapping of those questions to existing or potential data sources; and the embedding of those questions into funding frameworks, governance mandates, and institutional strategies. 

When demand is vague, access debates remain abstract. When questions are clear, access becomes purposeful. Researchers, policymakers and data holders can align around concrete objectives. This requires structured, participatory processes that bring scientists, communities, funders and regulators together to define and prioritise the questions that matter most…(More)”.

Legitimate data access will determine whether European science has a bright or bleak future

Paper by Jana Leonie Peters and Marc Ziegele: “Users’ low willingness to participate in discussions in comment sections and the often-poor quality of their contributions have been identified as key challenges in online participation. To address these issues, previous research has proposed various strategies, including moderation. We argue that a less well-researched intervention, namely aggregation in the form of discussion summaries, reduces users’ information overload and enhances their objective knowledge and subjective knowledge, which in turn are positively associated with their willingness to participate and the deliberative quality of their comments. Results from an online experiment (n = 643) support most of our hypotheses, though objective knowledge does not directly impact willingness to comment. Differences between aggregation criteria were minimal, but fact-based aggregation was superior in improving objective knowledge compared to opinion- or argument-based approaches. These findings suggest that platform designers and moderators can utilize aggregation techniques to encourage participation and foster higher-quality online discourse…(More)”.

Through Aggregation to Deliberation? An Experimental Study on the Effects of Discussion Summaries on Users’ Willingness to Comment and the Deliberative Quality of Their Contributions

About: “AI systems, digital transformation, biodiversity markets, biotechnology, and finance for nature increasingly rely on data originating from indigenous and local territories.

However, the governance of these data flows remains largely undefined.

Version 1.0 of the Sovereign Data Supply Chain: Functional and Operational Framework seeks to address these important topics.

This document is a structured framework designed to evolve through feedback, territorial validation, pilot implementation, and collective iteration—toward a valuable framework.

With Kinray Hub as the lead author, catalytic funder and co-strategist from NaturaTech LAC, and strategic support from Climate Collective, this version serves as an initial architecture for transitioning from extractive models to sovereign data chains based on Collective Rights in Latin America and the Caribbean…(More)“.

Sovereign Data Supply Chain: Functional and Operational Framework

Report by Neil Kleiman, Eric Gordon, and Mai-Ling Garcia: “As governments and communities across the United States struggle to make sense of artificial intelligence, one of the most capable—and underutilized—partners is often hiding in plain sight: local colleges and universities. Much of the public conversation about AI focuses on big tech companies or federal regulation. Meanwhile, far less attention has been paid to how higher education institutions can help cities and nonprofits deploy AI to serve residents and strengthen public trust.

Across the United States, higher education institutions are already governing AI internally, experimenting with operational use cases, and absorbing unprecedented investment to build technical capacity. And as the appetite for an AI-trained workforce blossoms, local colleges are now a prime pipeline for talent. At the same time, local governments and nonprofits are just beginning to respond to and translate AI’s promise into public value.

This asymmetry presents a clear gap: Colleges and universities are increasingly adept at deploying AI, but the connection between local communities and higher ed remains underdeveloped.

This brief argues that AI has created a rare institutional opening to bridge the divide. Colleges are seeking clearer public relevance, governments require technical capacity, and communities are demanding institutions that are more responsive and trustworthy. Local leaders from governors to nonprofit executives who recognize this alignment—and act on it—can shape how AI strengthens democratic infrastructure rather than allowing it to evolve according to purely academic or commercial priorities…(More)”.

The AI Lab Next Door

Article by Michelle Holko, John Wilbanks, and Sam Howell: “…Compute, talent, and capital are necessary for AI-enabled biotechnology, but biodata is the binding constraint. Without large, representative, and interoperable biological datasets, AI models cannot generalize, scale, or translate into real-world impact.

The application of AI to biotechnology carries profound promise for national power. From stronger, bio-based armor for U.S. warfighters to patching supply chain vulnerabilities with domestic biomanufacturing, the potential is as vast as biology itself. The country that leads in AI-enabled biology will set the pace not only in health and medical discovery but also in agriculture, industrial production, and potentially even future deterrence. Seizing this potential, however, will hinge on improving America’s access to high-quality, secure biodata that is designed specifically for AI.

Biodata holds the blueprints of life and has become a new form of strategic power in the age of AI. These data, including DNA, RNA, proteins, and metabolites, are foundational to innovation in bio-based materials, fuels, agriculture, and medicine.

The National Security Commission on Emerging Biotechnology’s 2025 final report concludes that dominance in biotechnology will “hinge on who controls the most complete, accurate, and secure biological datasets.” Biodata is a strategic asset for national power in the twenty-first century, analogous to advanced semiconductors or critical minerals. U.S. competitors, namely China, are moving fast to establish AI-bio leadership.

China’s Biotech Edge

China’s advantage in AI-enabled biotechnology is not simply scale, but also coordination. Beijing’s national strategies explicitly link biotechnology, big data, and artificial intelligence under directed planning, aiming to align data generation, compute resources, and industrial translation across sectors. One example is China’s non-invasive prenatal testing ecosystem: The domestic non-invasive prenatal testing market was valued at roughly $608 million in 2023 and is projected to exceed $1 billion by the end of the decade, reflecting widespread integration of genomic sequencing, hospital networks, and commercial bioinformatics services. Firms such as BGI Group operate large-scale sequencing and testing platforms (including the noninvasive fetal trisomy test) that generate and process substantial volumes of genomic data within an integrated ecosystem that spans clinical care, research, and industry. China has also rapidly expanded its domestic cell- and gene-therapy ecosystem, including multiple Chimeric Antigen Receptor T-cell therapy approvals and a growing clinical biomanufacturing base, shortening the path from research to deployment. At the same time, China is building the data substrate that makes AI-bio compounding possible: massive longitudinal health cohorts and national-level biodata platforms designed for large-scale integration and analysis…(More)”.

AI-Ready Biodata Is America’s Next Strategic Infrastructure

Article by Alberto Rodriguez Alvarez: “The next wave of AI will be defined by agentic systems that can take actions: query databases, navigate portals, retrieve records, and increasingly interact with public digital infrastructure at scale.

That shift is already showing up: traffic hitting government sites and services is increasingly machine traffic. Some of it is benign (search and discovery). Some of it is ambiguous (scraping and automated browsing). And some of it could become actively harmful if AI agents can reserve scarce services, submit fraudulent requests, or generate volume that overwhelms public systems. 

The problem is that the government’s current interfaces were not designed for agent-to-government interactions, and the default state of the world has become improvisation: agents “figure it out” by scraping pages and guessing based on previous learning.

This is where Boston’s work becomes instructive. Rather than treating agents as something to block wholesale, or something to embrace without guardrails, Boston is experimenting with a middle path: build a governed, secure, and reliable layer that mediates how AI agent systems interact with government resources…(More)”.
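What such a mediating layer might look like in the abstract: a minimal sketch, under assumed names and policies rather than a description of Boston's actual system, of a gateway that authenticates registered agents, restricts them to an allowlist of actions, rate-limits requests and keeps an audit log before anything reaches the underlying service.

```python
import time

# Illustrative sketch of a governed agent-to-government gateway.
# All identifiers, actions and limits here are assumptions for the example.

ALLOWED_ACTIONS = {"lookup_trash_schedule", "check_permit_status"}
RATE_LIMIT_PER_MINUTE = 30

class AgentGateway:
    def __init__(self, registered_agents):
        self.registered_agents = registered_agents   # agent_id -> api_key
        self.request_log = []                         # audit trail
        self.recent_requests = {}                     # agent_id -> timestamps

    def handle(self, agent_id, api_key, action, params):
        # 1. Authenticate the agent against the registry
        if self.registered_agents.get(agent_id) != api_key:
            return {"status": 401, "error": "unknown or unauthenticated agent"}
        # 2. Only allowlisted actions may reach government systems
        if action not in ALLOWED_ACTIONS:
            return {"status": 403, "error": f"action '{action}' not permitted"}
        # 3. Rate-limit to protect scarce services from overwhelming volume
        now = time.time()
        window = [t for t in self.recent_requests.get(agent_id, []) if now - t < 60]
        if len(window) >= RATE_LIMIT_PER_MINUTE:
            return {"status": 429, "error": "rate limit exceeded"}
        self.recent_requests[agent_id] = window + [now]
        # 4. Log, then hand off to the underlying service (stubbed here)
        self.request_log.append((now, agent_id, action, params))
        return {"status": 200, "result": f"performed {action}"}

gateway = AgentGateway({"assistant-123": "secret-key"})
print(gateway.handle("assistant-123", "secret-key", "check_permit_status", {"id": "A1"}))
```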

AI agents are coming for government. How one big city is letting them in

Book by Nick Chater and George Loewenstein on “How Corporations and Behavioral Scientists Have Convinced Us That We’re to Blame for Society’s Deepest Problems”…: “Two decades ago, behavioral economics burst from academia to the halls of power, on both sides of the Atlantic, with the promise that correcting individual biases could help transform society. The hope was that governments could deploy a new approach to addressing society’s deepest challenges, from inadequate retirement planning to climate change—gently, but cleverly, nudging people to make choices for their own good and the good of the planet.

It was all very convenient, and false. As behavioral scientists Nick Chater and George Loewenstein show in It’s on You, nudges rarely work, and divert us from policies that do. For example, being nudged to switch to green energy doesn’t cut carbon, and it distracts from the real challenge of building a low-carbon economy.

It’s on You shows how the rich and powerful have repeatedly used a clever sleight of hand: blaming individuals for social problems, with behavioral economics an unwitting accomplice, while lobbying against the systemic changes that could actually help. Rather than trying to “fix” the victims of bad policies, real progress requires rewriting the social and economic rulebook for the common good…(More)”.

It’s on You

Blog by Santi Ruiz: “…Below are 10 lessons I’ve learned about handling government data:

  1. Administrative data has major gaps. It’s not just that we don’t collect things we should; it’s also that information a system like SEVIS should collect just isn’t in that system. While some data gaps result from human error, others are the product of data collection systems that are leaky, or that just don’t exist. We simply cannot know things one might assume we do, like which visa-holders are currently in the country, or the employer of every working international student, because the departure dates and employer addresses of working international students are only present a fraction of the time in SEVIS. The federal government doesn’t know these things either. Failing to adequately maintain records and leaving non-mandatory fields blank both result in inconsistent record-keeping. These gaps occur on every level as we decline to write down valuable information, neglect to write down everything we’re supposed to, and fail to hold on to everything we once wrote down.
  2. When something seems off, it often is. Government datasets often have a small number of users; often a handful of civil servants in this or that agency. This means that inaccuracies can persist unnoticed for a surprisingly long time. If you encounter what seems like a major error in government data, it’s less likely to be a failure of your understanding than you might expect. In 2024, the US undercounted the number of international students by 200,000. The error went unnoticed for months until one diligent user contacted the agency responsible. The frequency of and methodology for data collection also change periodically, which leads to results that are technically correct, but also unintuitive and potentially misleading. Most quantitative disciplines rightly train students not to assume that the data is wrong until they’ve scrutinized their own work or their understanding of the data first. But if you’re working with certain kinds of government data, you should probably leap more quickly to suspect underlying data issues.
  3. If it’s a question on a form, you can find data on it. Government administrative data is commonly just collated responses to the same questionnaire. Reading the forms which feed into it can tell you what it might contain, and where to find it. Since information isn’t always collected where you might expect, learning an agency’s paperwork can save you time, too. While investigating how many H-1B visas go to former international students, and how much they earn, my colleague Jeremy happened to realize that US Citizenship and Immigration Services collects information on someone’s wages and current immigration status when they file an I-129 Petition for a Nonimmigrant Worker. He learned this by talking to someone who knows USCIS paperwork like the back of their hand: an experienced immigration lawyer. Without that realization, his analysis wouldn’t have been nearly as rich.
  4. We’re not actually counting. Lots of government data is based on representative samples, and uses statistical methods to reach conclusions about the population at large. But that data is not produced by literally counting the population at large. This introduces various assumptions that can easily invalidate your findings if you forget to account for them. The “irreversible demographic fact” claimed by politicians last year, that two million more Americans were employed than in the year prior, was the result of using data in ways the statistical agencies explicitly tell users not to. Jed Kolko describes how this statistic was actually a zero-sum accounting artifact, resulting in part from the fact that the population totals are pre-determined by the census, while nativity is not. Since the Current Population Survey measures variable immigrant and non-immigrant populations but is always scaled to match Census totals, any reduction in the reported foreign-born population will necessarily appear as an increase in the native-born population, even if it’s driven by changes in response rates rather than real departures…(More)”
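A toy numerical illustration of the scaling artifact described in that last point, using assumed figures rather than actual CPS data: hold the true population and the census control total fixed, lower only the response rate of foreign-born households, and rescale the survey to the control total; the foreign-born estimate falls and the native-born estimate rises by exactly the same amount.

```python
# Hypothetical illustration of the zero-sum scaling artifact; all numbers are
# assumed, not CPS data. The true population is identical in both waves; only
# the foreign-born response rate drops in wave 2.

CENSUS_TOTAL = 330_000_000      # fixed control total (assumed value)
TRUE_FOREIGN_SHARE = 0.14       # assumed, constant across both waves

def survey_estimate(foreign_response_rate, native_response_rate=1.0):
    # Respondent counts, proportional to population share * response rate
    foreign_resp = TRUE_FOREIGN_SHARE * foreign_response_rate
    native_resp = (1 - TRUE_FOREIGN_SHARE) * native_response_rate
    # Weights are rescaled so the survey always sums to the census control total
    scale = CENSUS_TOTAL / (foreign_resp + native_resp)
    return foreign_resp * scale, native_resp * scale

for rate in (1.0, 0.8):  # wave 1: full response; wave 2: foreign-born response drops
    foreign, native = survey_estimate(rate)
    print(f"foreign-born response rate {rate:.0%}: "
          f"foreign-born {foreign / 1e6:.1f}M, native-born {native / 1e6:.1f}M")
```

In this toy run the native-born estimate grows by roughly eight million between the two waves even though nothing about the underlying population changed.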

Ten Thoughts on Government Data

Book by Vittorio Loreto, Vito D P Servedio, and Francesca Tria: “This book offers a unified quantitative framework for understanding the dynamics of novelty and innovation across biological, technological, and societal systems. It explores how first-time occurrences—ranging from everyday experiences to groundbreaking discoveries—can lead to subsequent breakthroughs. The content is organized into three main parts. The first part introduces essential theoretical tools for investigating the emergence of new ideas. The second part examines both classical and modern models that capture the evolution, interaction, and competition of innovations within complex systems. This section emphasizes the importance of models based on the concept of the ‘Adjacent Possible’, i.e., all those things—ideas, molecules, technologies— that are one step away from what actually exists. The final section presents empirical case studies that utilize computational and data-driven methods to uncover hidden patterns in the diffusion of novelty. A postface summarizes the main findings and provides insight into future directions for research. By synthesizing insights from theoretical and computational physics, complexity science, and social sciences, this work challenges traditional views on predictability and control. It demonstrates that the forces driving innovation are both serendipitous and systematic, offering new perspectives on how progress unfolds. This comprehensive approach provides valuable methodologies for researchers, students, practitioners, and the general public, making it an essential resource for anyone looking to understand the complex processes that shape our ever-evolving world…(More)”.
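A minimal sketch, under simplified assumptions, of the kind of urn model with triggering associated with this line of work: drawing a colour reinforces it, and drawing a never-before-seen colour also adds balls of entirely new colours, so each novelty enlarges the adjacent possible. The parameter names and values below are illustrative.

```python
import random

# Sketch of an urn model with triggering: rho copies reinforce each drawn
# colour; the first draw of a colour injects nu + 1 balls of brand-new colours,
# expanding the adjacent possible.

def urn_with_triggering(steps=10_000, rho=4, nu=3, seed=0):
    rng = random.Random(seed)
    urn = [0]                 # start with a single colour, labelled 0
    next_colour = 1
    seen = set()
    novelties = []            # cumulative count of distinct colours drawn
    for _ in range(steps):
        ball = rng.choice(urn)
        urn.extend([ball] * rho)              # reinforcement of the drawn colour
        if ball not in seen:                  # a novelty: first draw of this colour
            seen.add(ball)
            new_colours = list(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
            urn.extend(new_colours)           # the adjacent possible expands
        novelties.append(len(seen))
    return novelties

history = urn_with_triggering()
print(history[99], history[999], history[9999])
```

The count of distinct colours grows sublinearly with the number of draws, the Heaps'-law-like pace of novelties that models of this family are designed to reproduce.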

The Science of the New 
