
Paper by John Wihbey and Samantha D’Alonzo: “…reviews and translates a broad array of academic research on “silicon sampling”—using Large Language Models (LLMs) to simulate public opinion—and offers guidance for practitioners, particularly those in communications and media industries, conducting message testing and exploratory audience-feedback research. Findings show LLMs are effective complements for preliminary tasks like refining surveys but are generally not reliable substitutes for human respondents, especially in policy settings. The models struggle to capture nuanced opinions and often stereotype groups due to training data bias and internal safety filters. Therefore, the most prudent approach is a hybrid pipeline that uses AI to improve research design while maintaining human samples as the gold standard for data. As the technology evolves, practitioners must remain vigilant about these core limitations. Responsible deployment requires transparency and robust validation of AI findings against human benchmarks. Based on the translational literature review we perform here, we offer a decision framework that can guide research integrity while leveraging the benefits of AI…(More)”
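
To make the paper's recommended validation step concrete, the sketch below (ours, not the paper's) checks whether an LLM-simulated "silicon" response distribution matches a human benchmark on a single survey item, using a chi-square goodness-of-fit test. The survey categories, counts, and significance threshold are illustrative assumptions.

# Hedged sketch: validate a "silicon sample" against a human benchmark on one
# survey item. A low p-value means the distributions diverge and the LLM data
# should complement, not replace, the human sample.
from collections import Counter
from scipy.stats import chisquare

def validate_silicon_sample(llm_responses, human_responses, categories):
    llm_counts = Counter(llm_responses)
    human_counts = Counter(human_responses)
    observed = [llm_counts.get(c, 0) for c in categories]
    # Scale the human proportions to the LLM sample size for expected counts.
    scale = len(llm_responses) / len(human_responses)
    expected = [human_counts.get(c, 0) * scale for c in categories]
    return chisquare(observed, expected).pvalue

categories = ["agree", "neutral", "disagree"]
p = validate_silicon_sample(
    ["agree"] * 70 + ["neutral"] * 20 + ["disagree"] * 10,  # LLM-simulated
    ["agree"] * 50 + ["neutral"] * 25 + ["disagree"] * 25,  # human benchmark
    categories,
)
if p < 0.05:
    print(f"p = {p:.4f}: distributions diverge; keep humans as the gold standard")
else:
    print(f"p = {p:.4f}: no detectable divergence on this item")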

AI Simulations of Audience Attitudes and Policy Preferences: “Silicon Sampling” Guidance for Communications Practitioners 

Article by Stefaan Verhulst, Roeland Beerten and Johannes Jutting: Declining survey responses, politically motivated dismissals, and accusations of “rigged” numbers point to a dangerous spiral in which official statistics — the bedrock of evidence-based policy — become just another casualty of distrust in government. Below, we suggest a different path: moving beyond averages and aggregates toward more citizen-centric statistics that reflect lived realities, invite participation, and help rebuild the fragile trust between governments and the governed.

“What gets measured gets managed,” the adage goes. But what if what gets measured fails to reflect how people actually live, how they feel, and, perhaps more importantly, what they care about? For too long, statistical agencies, the bedrock of evidence-based policymaking, have privileged averages over outliers, aggregates over anomalies, and the macro over the personal; in short, facts over feelings. The result? A statistical lens that often overlooks lived realities and held perceptions.

The strong emphasis on averages, national-level perspectives, and technocratic indicators always carried certain risks. In recent years the phrase “You can’t eat GDP” has popped up with increasing frequency: neatly constructed technical indicators often clash with lived reality, as citizens discovered during the post-COVID years of persistently high inflation for basic goods. Policies that failed to address citizen concerns have fueled discontent and anger in significant parts of the population, paving the way for a surge of populist and anti-democratic parties in both rich and poor countries. In today’s era of polycrisis, there is a growing imperative for reimagined policy processes that innovate and regain citizen trust. For that, we need to reinvent what and how we collect, interpret, use and communicate the evidence base for policies. In short, we need more trustworthy statistical foundations.

The challenge, it is important to emphasize, isn’t merely technical. It is epistemological and democratic. We face a potential crisis of inclusion and accountability, in which the question is not only how to measure, but also who gets to decide what counts as knowledge. If statistics remain too narrowly focused on averages and aggregates, they risk alienating the very citizens they are meant to serve. The legitimacy of official statistics will increasingly depend on their ability to reflect lived realities, incorporate diverse perspectives, and communicate findings in ways that resonate with public experience. In what follows, we therefore argue that, if official statistics are to remain legitimate and trusted, they must evolve to include lived experiences — an approach that we call citizen-centric statistics…(More)”.

From Averages to Agency: Rethinking Official Statistics for the 21st Century

Article by Marianne Dhenin: “Big tech companies have played an outsize role in the war on Gaza since October 2023—from social media companies, which have been accused of systemic censorship of Palestine-related content, to Microsoft, Google, Amazon, and Palantir signing lucrative contracts to provide artificial intelligence (AI) and other technologies to the Israeli military.

Concerned with the industry’s role in the attacks on Gaza, Paul Biggar, founder of Darklang and CircleCI, a startup turned billion-dollar technology company, founded Tech for Palestine in January 2024. The organization serves as an incubator, helping entrepreneurs whose projects support human rights for Palestinians develop and grow their businesses. “Our projects are, on the one hand, using tech for good, and on the other hand, addressing the systemic challenges around Israel in the tech industry,” Biggar says.

He got an insider’s look at how the technology industry equips the Israeli military over more than a decade as CEO and board member of the companies he founded. He was removed from the board of CircleCI after writing a blog post in December 2023 condemning industry bigwigs for “actively cheer[ing] on the genocide” in Palestine. At the time, the official death toll in the territory exceeded 18,600 people. It has since risen to over 60,000, and in August 2025 a United Nations-backed panel declared that famine is underway in the enclave.

Since its launch, Tech for Palestine has grown from a community of tech workers and other volunteers loosely organized on the communication platform Discord to a grant-making nonprofit that employs five full-time Palestinian engineers and supports 70 projects. It became a 501(c)(3) organization in December 2024, enabling it to solicit large private donations and source smaller donations through a donation link on its website, with the goal of scaling up to support 100 projects by the end of 2025.

Tech for Palestine’s most ambitious projects include Boycat, an app and browser extension that helps users identify products from companies that profit from human rights abuses in Palestine, and UpScrolled, an Instagram alternative that promises no shadow banning, no biased algorithms, and no favoritism. Meta, which owns Instagram and Facebook, has been found to censor content in support of Palestine on its platforms, according to an audit conducted by Human Rights Watch…(More)”.

Technology Against Genocide

An initiative of SDI and its partners: “Know Your City (KYC) is a groundbreaking global campaign for participatory, pro-poor, people-centered urban governance. KYC unites organised slum dwellers and local governments in partnerships anchored by community-led slum profiling, enumeration, and mapping. The KYC campaign builds on the rich history of grassroots informal settlement profiling by SDI, pioneered by Indian slum and pavement dwellers, to make informal settlements visible to city authorities.

Through SDI support for peer-to-peer exchanges, this process spread to 30 countries on three continents and ushered in a new era for slum community dialogue with governments. Spurred by a commitment from UCLG-Africa, a KYC campaign was born, formalizing this international partnership between slum dwellers and local governments. The KYC campaign serves as a powerful engine for community organisation, participatory local governance, partnership building, and collective action to enhance inclusive city planning and management.

Know Your City TV (KYC TV) ensures youth living in informal settlements are not excluded from this process by organising them to produce media and films about life in slums and creating space to engage in dialogue about city futures. KYC has the potential to guide not only local governments but also national and international policies, programmes, and investments on a large scale, and to significantly contribute to addressing the persistent social, economic, and political risks facing cities and nations.

Dive into our KYC portal, home to community-collected slum data from over 7,000 slums in more than 16 countries across the global south…(More)”.

Know Your City

Article by Amit Roy Choudhury: “The Singapore Digital Gateway (SGDG) encompasses Singapore’s digital and artificial intelligence (AI) strategies, blueprints, governance frameworks, guides, playbooks, and open-source tools. 

This was announced by Singapore’s Minister for Digital Development and Information, Josephine Teo, at the high-level multi-stakeholder informal meeting to launch the Global Dialogue on Artificial Intelligence (AI) Governance in New York on September 25, 2025…. 

The SGDG would also offer training courses and capacity-building initiatives, co-created by the Ministry of Digital Development and Information (MDDI) and international partners, such as the World Bank and the United Nations Development Programme (UNDP), she added.

In its initial phase, the SGDG will cover the AI and digital government domains.

The AI section would include Singapore’s National AI Strategy 2.0, the AI Verify testing framework for evaluating AI systems, and Project Moonshot, one of the world’s first comprehensive toolkits for testing Large Language Models (LLMs).  

Countries would also be able to access the AI Playbook for Small States, developed by Singapore jointly with Rwanda.  

The digital government domain would feature Singapore’s Digital Government Blueprint with its 14 key performance indicators, the Singpass digital identity system architecture, and open-source tools such as FormSG for creating secure digital forms and Isomer for building government websites.

MDDI would progressively expand the SGDG to cover more areas such as cybersecurity, online safety, smart cities, and the digital economy, in phases…(More)”.

Singapore launches gateway to provide countries access to digital resources

Blog by Divya Siddarth: “Evaluations are quietly shaping AI. Results can move billions in investment decisions, set regulation, and influence public trust. Yet most evals tell us little about how AI systems perform in and impact the real world. At CIP we are exploring ways that collective input (public, domain expert, and regional) can help solve this. Rough thoughts below.

1. Evaluation needs to be highly context specific, which is hard. Labs have built challenging benchmarks for reasoning and generalization (ARC-AGI, GPQA, etc.), but most still focus on decontextualized problems. What they miss is how models perform in situated use: sustaining multi-hour therapy conversations, tutoring children around the world across languages, mediating policy, and shaping political discourse in real time. These contexts redefine what ‘good performance’ means.

2. Technical details can swing results. Prompt phrasing, temperature settings, even enumeration style can cause substantial performance variations. Major investment and governance decisions are being made based on measurements that are especially sensitive to implementation details. We’ve previously written about some of these challenges and ways to address them.

3. Fruitful comparison is almost impossible. Model cards list hundreds of evaluations, but without standardized documentation in the form of prompts, parameters, and procedures, it’s scientifically questionable to compare across models. We can’t distinguish genuine differences from evaluation artifacts.

4. Evals are fragmented and no single entity is positioned to solve this. Labs run proprietary internal evals, and academic efforts are often static and buried in research papers and GitHub repos; neither can build evals for every possible context and domain worldwide. Third-party evaluations only measure what they’re hired to measure. Academic benchmarks often become outdated. In practice, we can think of evals in three categories:

  • Capability evals (reasoning, coding, math), which measure raw problem-solving.
  • Risk evals (jailbreaks, alignment, misuse), which probe safety and misuse potential.
  • Contextual evals (domain- or culture-specific), which test performance in particular settings…(More)”.
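
One way to act on points 2–4 is to standardize what gets recorded about every eval. The sketch below is our illustration, not a CIP proposal: a minimal Python record capturing the prompts, parameters, and procedures point 3 calls for, plus the category taxonomy above. All field names are assumptions.

# Illustrative schema for documenting an eval so results can be compared
# across models; every field name here is an assumption.
from dataclasses import dataclass

@dataclass
class EvalRecord:
    name: str               # e.g. "GPQA" or a bespoke contextual eval
    category: str           # "capability" | "risk" | "contextual"
    prompts: list[str]      # exact prompt templates used
    temperature: float      # sampling temperature (results are sensitive to it)
    enumeration_style: str  # e.g. "A/B/C/D" vs "1/2/3/4"
    procedure: str          # scoring rules, retries, aggregation
    context: str            # situated-use setting, if any

record = EvalRecord(
    name="multilingual-tutoring-v0",
    category="contextual",
    prompts=["You are tutoring a 10-year-old in {language}. {question}"],
    temperature=0.7,
    enumeration_style="1/2/3/4",
    procedure="3 runs per item; majority vote; rubric-scored",
    context="multi-turn tutoring across languages",
)
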
Notes on building collective intelligence into evals

World Bank Report: “The transformative potential of artificial intelligence (AI) in public governance is increasingly recognized across both developed and developing economies. Governments are exploring and adopting AI technologies to enhance service delivery, streamline administrative efficiency, and strengthen data-driven decision-making. However, the integration of AI into public systems also introduces ethical, technical, and institutional challenges – ranging from algorithmic bias and lack of transparency to data privacy concerns and regulatory fragmentation. These challenges are especially salient in public sector contexts, where trust, accountability, and equity are crucial. This paper addresses a central question: How can public institutions adopt AI responsibly while safeguarding privacy, promoting fairness, and ensuring accountability? In particular, it focuses on the readiness of government agencies to implement AI technologies in a trustworthy and responsible manner. This paper responds to that gap by providing both conceptual grounding and practical tools to support implementation. First, it synthesizes key ethical considerations and international frameworks that underpin trustworthy AI governance. Second, it introduces relevant technical solutions, including explainability models, privacy-enhancing technologies, and algorithmic fairness approaches, that can mitigate emerging risks in AI deployment. Third, it presents a self-assessment toolkit for public institutions: a decision flowchart for AI application and a data privacy readiness checklist. These tools are designed to help public sector actors evaluate their preparedness, identify institutional gaps, and inform internal coordination processes prior to AI adoption. By bridging theory and practice, this paper contributes to ongoing global efforts to build trustworthy AI that is lawful, ethical, inclusive, and institutionally grounded…(More)”.
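
As a rough illustration of the paper's self-assessment idea, the sketch below turns a data-privacy readiness checklist into a scored verdict. The questions and thresholds are our assumptions, not the paper's actual toolkit.

# Hypothetical readiness checklist; the real one is defined in the paper.
CHECKLIST = [
    "Is a lawful basis documented for each data source the AI system uses?",
    "Are privacy-enhancing technologies applied where feasible?",
    "Can the agency explain individual algorithmic decisions to affected citizens?",
    "Has the system been audited for disparate impact across groups?",
    "Is a named owner accountable for the system's outcomes?",
]

def readiness_verdict(answers: list[bool]) -> str:
    # Thresholds are illustrative, not from the paper.
    share = sum(answers) / len(answers)
    if share == 1.0:
        return "ready: pilot with ongoing monitoring"
    if share >= 0.6:
        return "partial: close identified gaps before deployment"
    return "not ready: institutional groundwork needed first"

print(readiness_verdict([True, True, False, True, False]))  # partial readiness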

Building Trustworthy Artificial Intelligence: Frameworks, Applications, and Self-Assessment for Readiness

Paper by Deininger, Klaus et al: “This paper explores whether satellite imagery can be used to derive a measure to estimate conflict-induced damage to agricultural production and compare the results to those obtained using media-based conflict indicators, which are widely used in the literature. The paper combines area for summer and winter crops from annual crop maps for 2019–24 with measures of conflict-related damage to agricultural land based on optical and thermal satellite sensors. These data are used to estimate a difference-in-differences model for close to 10,000 Ukrainian village councils. The results point to large and persistent negative effects that spill over to conflict-unaffected village councils. The predicted impact is three times larger, with a distinctly different distribution across key domains (for example, territory controlled by Ukraine and the Russian Federation) using the preferred image-based indicator as compared to a media-based indicator. Satellite imagery thus allows defining conflict incidence in ways that may be relevant to agricultural production and that may have implications for future research…(More)”.
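
For readers unfamiliar with the method, the sketch below estimates a difference-in-differences model of this kind on synthetic data. The real study uses satellite-derived damage indicators for close to 10,000 village councils; every variable name and magnitude here is invented.

# Synthetic difference-in-differences: crop area falls by ~15 units in
# damaged councils after 2022; the damaged:post coefficient recovers it.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for council in range(200):
    damaged = council < 80          # treated: satellite-detected conflict damage
    for year in range(2019, 2025):
        post = year >= 2022         # full-scale invasion period
        crop_area = 100 + rng.normal(0, 5) - (15 if damaged and post else 0)
        rows.append({"council": council, "damaged": int(damaged),
                     "post": int(post), "crop_area": crop_area})
df = pd.DataFrame(rows)

# Cluster standard errors at the council level, as panel DiD designs typically do.
model = smf.ols("crop_area ~ damaged + post + damaged:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["council"]})
print(round(model.params["damaged:post"], 2))  # ~ -15, the conflict effect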

Using Remotely Sensed Data to Assess War-Induced Damage to Agricultural Cultivation: Evidence from Ukraine

Press Release and blog by Mykhailo Fedorov: “Ukraine is betting on artificial intelligence — and this is not just a trend. It is our clear and defined mission: by 2030, we aim to become one of the world’s top three countries in terms of AI development and integration in the public sector.

This week, we took another major step toward that goal — we launched Diia.AI on the Diia portal. It is the world’s first national AI-agent that goes beyond answering questions — it actually provides government services directly within a chat. The AI assistant is now available in open beta, and users can already receive the first service through AI — an income certificate. New services will be rolled out gradually as the AI develops.

Our focus is to transform Diia from a digital services platform into a fully functional AI-agent that operates 24/7, without the need to manually fill out forms or fields. Diia is becoming a proactive assistant in the citizen–state relationship. The AI-agent doesn’t simply act as a chatbot that responds to queries — it takes action based on the user’s request. For example, you write to the assistant in the chat: “I need an income certificate”, and receive it directly in your personal account on the Diia portal, with an email notification once it’s ready.

AI agents represent the cutting edge of artificial intelligence, fundamentally changing the way services are accessed globally. The future lies with agentic states, and Ukraine is boldly advancing toward this format — where a single user request leads directly to results. AI agents act as personal digital assistants, independently building action plans, initiating service requests, and autonomously executing all stages of task completion…(More)”.
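
The pattern the press release describes, a single request that triggers an action rather than a text reply, can be sketched in a few lines. Everything below (function names, the handler registry) is hypothetical; the excerpt does not describe Diia's actual architecture.

# Hypothetical sketch of an intent-to-service agent loop.
def parse_intent(message: str) -> str:
    # Stand-in for an LLM intent classifier.
    if "income certificate" in message.lower():
        return "income_certificate"
    return "unknown"

def issue_income_certificate(user_id: str) -> str:
    # A real agent would call the registry, file the document in the user's
    # account, and trigger the email notification the post mentions.
    return f"income certificate delivered to account {user_id}"

SERVICE_HANDLERS = {"income_certificate": issue_income_certificate}

def handle_request(user_id: str, message: str) -> str:
    handler = SERVICE_HANDLERS.get(parse_intent(message))
    if handler is None:
        return "Sorry, I can't help with that yet."
    return handler(user_id)  # the agent acts instead of just answering

print(handle_request("u-123", "I need an income certificate"))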

Diia.AI: The World’s First National AI-Agent That Delivers Real Government Services

Article by Amer Sinha and Ryan McKenna: “As AI becomes more integrated into our lives, building it with privacy at its core is a critical frontier for the field. Differential privacy (DP) offers a mathematically sound solution by adding calibrated noise to prevent memorization. However, applying DP to LLMs introduces trade-offs. Understanding these trade-offs is crucial. Applying DP noise alters traditional scaling laws — rules describing performance dynamics — by reducing training stability (the model’s ability to learn consistently without experiencing catastrophic events like loss spikes or divergence) and significantly increasing batch size (a collection of training examples sent to the model simultaneously for processing) and computation costs.
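
The "calibrated noise" the post refers to can be illustrated with a minimal DP-SGD-style step: clip each example's gradient, average, then add Gaussian noise scaled to the clipping norm. This is a NumPy sketch under illustrative constants, not VaultGemma's training code; note how the noise on the averaged gradient shrinks as batch size grows, which is one reason DP training demands very large batches.

# Minimal DP-SGD-style update sketch (illustrative constants).
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]          # per-example clipping
    mean_grad = np.mean(clipped, axis=0)
    # Noise std on the mean is sigma * C / B, so larger batches mean less noise.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)

grads = [np.random.default_rng(i).normal(size=4) for i in range(8)]
print(dp_sgd_step(grads))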

Our new research, “Scaling Laws for Differentially Private Language Models”, conducted in partnership with Google DeepMind, establishes laws that accurately model these intricacies, providing a complete picture of the compute-privacy-utility trade-offs. Guided by this research, we’re excited to introduce VaultGemma, the largest (1B parameters) open model trained from scratch with differential privacy. We are releasing the weights on Hugging Face and Kaggle, alongside a technical report, to advance the development of the next generation of private AI…

Armed with our new scaling laws and advanced training algorithms, we built VaultGemma, to date the largest (1B parameters) open model fully pre-trained with differential privacy, using an approach that can yield high-utility models…(More)”.

VaultGemma: The world’s most capable differentially private LLM
