Stefaan Verhulst

Article by Hélène Landemore: “American democracy has a personality problem.

At its core, our political system is a popularity contest. Elections reward those who are comfortable performing in public and on social media, projecting confidence and dominating attention. This dynamic tends to select for so-called alpha types, the charismatic and the daring, but also the entitled, the arrogant and even the narcissistic.

This raises a basic but rarely asked question: Why are we filtering out the quiet voices? And at what cost?

Over the past two decades, my research on collective intelligence in politics, democratic theory and the design of our institutions shows that the system structurally excludes those I call, in my new book, “the shy.” By the shy I mean not just the natural introverts, but all the people who have internalized the idea that they lack power, that politics is not built for them, and who could never imagine running for office. That is, potentially, most of us, though predictable groups — women, the young and many minorities — are overrepresented in that category.

The early-20th-century British writer G.K. Chesterton once offered a striking and unusual metaphor for what democracy should look like. He wrote, “All real democracy is an attempt (like that of a jolly hostess) to bring the shy people out.” What would our democratic institutions look like if we took that metaphor seriously?

One answer — perhaps the most promising one we have at this time — can be found in citizens’ assemblies.

Citizens’ assemblies are large groups of ordinary people, selected by lottery, who come together to learn about a public issue, hear from experts and advocacy groups, deliberate with one another and make recommendations. Picture jury duty for politics. Through random selection, citizens’ assemblies reach deep into the body politic to bring even the initially unwilling to the table. Once seated, participants are given time, structure and support to find their voices and contribute to forming a thoughtful collective judgment…(More)”.

No Shy Person Left Behind

Paper by Kayla Schwoerer: “Despite widespread adoption of open government data (OGD) initiatives, actual use remains limited, raising questions about how these public digital platforms are designed and governed. Prior research highlights the importance of data quality and usability for encouraging OGD use, yet empirical evidence linking specific design choices to observed user behavior remains scarce. This study draws on affordance theory to examine how metadata design features embedded in open data platforms shape open data use. The analysis draws on primary data collected from 15 U.S. cities’ open data platforms (N = 5863), first to assess the extent to which government agencies actualize metadata affordances to promote data quality and usability, and then to test the relationship between affordance actualizations and two observed measures of use: dataset views and downloads. Results show that multiple dimensions of metadata practice are strongly and consistently associated with OGD use, with some practices linked to substantially higher levels of open data use. Even within a shared platform environment, variation in how publishers provide metadata corresponds to meaningful differences in how often datasets are accessed, highlighting that metadata governance is not merely a technical detail but a factor that materially shapes user engagement with open data…(More)”.

Same platform, different outcomes: Metadata practices and open data use
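
The paper’s affordance measures are richer than this, but the basic analytical move it describes (score whether publishers actualize key metadata fields, then relate that score to views and downloads) can be sketched in a few lines. Field names and numbers below are invented for illustration, not taken from the study:

```python
# Toy version of a metadata-actualization score: the fraction of a few
# common catalog fields that a dataset's publisher filled in.
FIELDS = ["description", "update_frequency", "contact", "data_dictionary", "license"]

def metadata_score(record):
    """Fraction of key metadata fields that are present and non-empty."""
    return sum(1 for f in FIELDS if record.get(f)) / len(FIELDS)

# Hypothetical catalog records with observed view counts.
datasets = [
    {"name": "crime_incidents", "description": "Reported incidents, 2015-present",
     "update_frequency": "daily", "contact": "opendata@city.gov",
     "data_dictionary": "https://example.org/dd", "license": "CC0", "views": 900},
    {"name": "permits", "description": "", "views": 120},
]

# A real analysis would regress views/downloads on these scores across
# thousands of datasets; here we just print score alongside use.
for d in datasets:
    print(d["name"], round(metadata_score(d), 2), d["views"])
```

The point of the sketch is only the shape of the comparison: two datasets on the same platform can differ sharply in metadata completeness, and the paper finds that such differences track observed use.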

Article by Amrita Sengupta and Shweta Mohandas: “The rapid integration of artificial intelligence in healthcare settings raises questions about the adequacy of existing data protection frameworks, particularly the reliance on informed consent as the primary mechanism for legitimising the collection and use of health data for AI model training. This paper examines whether informed consent, as operationalized under India’s Digital Personal Data Protection Act (DPDPA) 2023, can serve as a satisfactory legal and ethical basis for using health data in AI development.

Drawing on the historical evolution of consent from medical research contexts to contemporary digital data protection regimes, this paper demonstrates that consent-based frameworks face structural limitations when applied to AI systems. The analysis reveals a trifecta of consent challenges: patients must consent to medical procedures, to digital health record creation, and implicitly to future AI model training, often without comprehending the scope, purpose, or risks of data reuse.

This paper advances three broad analyses: first, the limitations of informed consent in data protection and its operationalisation challenges in healthcare; second, the dilution of patient consent and autonomy in AI model training; and third, the role of anonymisation in the use of data for AI. Recognizing these limitations, the paper proposes alternative governance frameworks that complement individual consent…(More)”.

The imaginary of informed consent: Rethinking approaches to data use for AI in healthcare

Report by the Federation of American Scientists: “Local government and universities are critical to our communities. How do they work together? How can they support each other? How can we think differently about their relationship to one another – moving beyond big employers and land users to the fruits and labors of what the research community can do for local policymaking?

The Civic Research Agenda is a multi-year, multi-partner study that is the first comprehensive reporting on the priority research needs of U.S. cities and counties. FAS has asked local governments directly about their research needs and pressing knowledge gaps that, if addressed, would help them meet their priority challenges and goals. It also provides an analysis of the supply-side barriers (and recommendations) that will connect research to impact.

This report provides…

  1. research questions that are in demand by local governments; and
  2. specific recommendations for local governments and universities to improve and grow the research-to-impact pipeline for one simple purpose: make research actionable, understandable, and accessible to communities across the country…(More)”.
The Civic Research Agenda

Article by Stefaan Verhulst: “As artificial intelligence systems rapidly evolve and start to impact nearly every sector of society, the conversation around governance has mainly focused on models (and their output): their transparency, fairness, accountability, and alignment. Yet this focus, while necessary, is incomplete. AI systems are only as reliable, equitable, and effective as the data (input) on which they are trained and operate.

Data governance is not peripheral to AI governance — it is its bedrock.

At the same time, the rise of AI is not simply placing new demands on data governance; it is fundamentally transforming it. What counts as data, how it is curated, who has a say in its use, and which institutional arrangements govern it are all being reimagined in response to AI’s capabilities and risks.

This essay examines 10 key areas or shifts where data governance is being reshaped—either to accommodate AI or as a direct consequence of it…(More)”.

Data Governance in the AI Era: 10 Shifts Redefining Data, Institutions, and Practice

Article by Stefaan Verhulst: “Despite decades of investment in statistical systems and open data initiatives, official data remains difficult to discover, interpret, and apply in practice. The challenge is no longer one of availability, but of (re)usability. This persistent gap underscores a broader paradox at the heart of contemporary data governance: data may be open, yet it remains functionally inaccessible for many intended users.

In this context, the International Monetary Fund has been a pioneer in exploring how artificial intelligence and open data can intersect to address this usability challenge. Its StatGPT: AI for Official Statistics report, by James Tebrake, Bachir Boukherouaa, Jeff Danforth, and Niva Harikrishnan, offers a timely and important contribution to this evolving conversation – pointing toward a future where AI can make official data more navigable, interpretable, and actionable.

The report provides a detailed account of the friction users face across the data lifecycle. Even highly motivated users must navigate fragmented portals, inconsistent terminology, and siloed datasets, often spending significant time assembling information that should be readily accessible. 

The result is a fragmented ecosystem in which metadata and data are distributed across institutions and platforms, forcing users to navigate multiple systems and standards—and to reconstruct context—before they can assess whether the data is re-usable. 

This resonates strongly with broader observations across the open data ecosystem: access alone does not guarantee impact. Without the ability to meaningfully engage with data, openness risks becoming performative rather than transformative…(More)”.

StatGPT and the Fourth Wave of Open Data

Article by Northwestern Innovation Institute: “Universities produce a vast number of scientific publications each year. Yet only a small share ultimately leads to patents, startups, or broader industry adoption. The challenge is not a shortage of ideas, but limited visibility into which discoveries — and the researchers behind them — are most likely to move toward commercialization.

A new platform developed at the Northwestern Innovation Institute, called InnovationInsights, is designed to make that hidden potential visible.

Using artificial intelligence and large-scale research data, the system helps technology transfer offices identify faculty, papers, and emerging research areas with strong commercial promise — including many discoveries that would otherwise remain outside the innovation pipeline.

At the core of the platform is a searchable interface built around two levels of insight: researchers and their individual publications.

Users can explore researcher profiles that bring together key signals related to translational activity, including publication history, recent high-impact work, invention disclosures and whether a researcher’s papers have been cited by company patents. These profiles allow innovation teams to quickly identify faculty whose work is influencing industry or to surface patterns associated with future commercialization.

At the publication level, InnovationInsights assigns each paper a commercial potential score based on machine-learning models trained on decades of historical data linking research outputs to downstream outcomes. Users can rank papers by this score to identify emerging discoveries that may be ready for translation, even before any patent activity occurs.

The platform also tracks citations from company patents, offering a direct view of where academic research is being used in industrial innovation. By comparing commercial potential scores with patent influence, institutions can see both future opportunity and current industry relevance…(More)”.

Finding the innovators hiding in plain sight

Article by Leif Weatherby and Benjamin Recht: “A recent Axios story on maternal health policy referenced “findings” that a majority of people trusted their doctors and nurses. On the surface, there’s nothing unusual about that. What wasn’t originally mentioned, however, was that these findings were made up.

Clicking through the links revealed (as did a subsequent editor’s note and clarification by Axios) that the public opinion poll was a computer simulation run by the artificial intelligence start-up Aaru. No people were involved in the creation of these opinions.

The practice Aaru used is called silicon sampling, and it’s suddenly everywhere. The idea behind silicon sampling is simple and tantalizing. Because large language models can generate responses that emulate human answers, polling companies see an opportunity to use A.I. agents to simulate survey responses at a small fraction of the cost and time required for traditional polling.

Phone polling has become exponentially harder. Web polling is too uncertain. Silicon sampling removes the messy, costly part of asking people what they think…(More)”.

This Is What Will Ruin Public Opinion Polling for Good
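
Aaru’s actual pipeline is proprietary, but the mechanics the article describes can be illustrated: a silicon-sampling loop assigns each synthetic respondent a demographic persona, prompts a language model in that persona, and tallies the answers as if they were poll results. In this sketch, `ask_model` is a hypothetical stand-in (a seeded random choice) for any real LLM call; the personas and the question are likewise invented:

```python
import random

random.seed(0)  # deterministic for the demo

def ask_model(prompt):
    # Hypothetical stand-in for an LLM API call. A real silicon-sampling
    # pipeline would send the persona prompt to a model and parse its reply.
    return random.choice(["trust", "distrust"])

# Synthetic respondents, typically drawn to match census demographics.
personas = [
    {"age": 34, "state": "Ohio", "education": "college-educated"},
    {"age": 61, "state": "Texas", "education": "high-school-educated"},
    {"age": 27, "state": "Georgia", "education": "college-educated"},
]

responses = []
for p in personas:
    prompt = (f"You are a {p['age']}-year-old {p['education']} voter from "
              f"{p['state']}. Do you trust your doctor? Answer trust or distrust.")
    responses.append(ask_model(prompt))

share_trust = responses.count("trust") / len(responses)
print(f"simulated share answering 'trust': {share_trust:.0%}")
```

The sketch also makes the authors’ worry concrete: nothing in this loop ever consults a human, yet its output has the familiar shape of a poll result.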

CRS Report: “Federal data can provide valuable information for various audiences—from farmers seeking to protect bats that eat crop-harming insects to local efforts determining where to rebuild to avoid coastal flooding. In 2013, the Office of Management and Budget (OMB) described openly available federal data and statistical information as “a valuable national resource and strategic asset” that, when made accessible, discoverable, and usable by the public, “can help fuel entrepreneurship, innovation, and scientific discovery.”

Efforts to make federal data more readily available have evolved over time. Such data may once have been stored and filed in paper copies, and later in software and electronic formats. Today, certain data may be retrieved through agency websites or on Data.gov. Data.gov itself is a case study for open data, intended to demonstrate that making federal data available can help agencies avoid duplicative internal research, enable the discovery of complementary datasets held by other agencies, and empower employees to make better-informed, data-driven decisions, among other benefits.

Throughout 2025, media reports have suggested that the availability of federal data has been reduced. Some observers are also tracking the removal of specific datasets, variables, and tools. In parallel, changing public perspectives on data availability may demand new levels of data access, such as making data available for predictable periods of time, in a variety of software-compatible formats, and with appropriate descriptive metadata for easing findability and usability of the information. While statute discusses when and how information is to be added to Data.gov, it does not explain whether and how information may be removed. Although researchers and the public may derive value from being able to trace data over time to determine changes in trends or collection methods, the statute does not explicitly consider versioning requirements for agency data. However, requiring these attributes for Data.gov may help address or clarify difficulties in measuring data availability. Congress may be interested in determining whether there are trends in when certain data become available, altered, or removed. Such trends may provide insight and direction for Congress to further examine agency activities or make decisions to support new data use cases.

Information availability, of which data availability is a type, can be considered the intersection of when and how information is released. Section 3552 of Title 44 of the U.S. Code defines information availability as “ensuring timely and reliable access to and use of information.” Generally, statute and associated OMB guidance contemplate two types of information availability in terms of timing: (1) proactive disclosure and information dissemination and (2) request-based disclosure. Certain types of data have specific requirements in terms of formatting and structure to ensure that the information can be made available and potentially archived.

This report examines the variables of federal data availability and its policy underpinnings. The report discusses the state and concept of federal data availability and explains the information life cycle framework. It explains how information may be made available proactively or upon request through existing mechanisms and also explains statutory requirements for information dissemination, preservation, and whether and when information can be removed. The report concludes with policy options for Congress, including a review of efforts to preserve federal data through web captures; examining controls to assess data versioning, sourcing, and modifications; and, finally, considerations for implementing data governance and transparency mechanisms throughout agency structures…(More)”.

Availability of Federal Data: Policy Considerations for Disclosure, Preservation, and Governance

Article by Carl Zimmer: “Scientists publish more than 10 million studies and other publications a year. Some of those findings will add to humanity’s storehouse of knowledge. But some will be wrong.

To assess a study, scientists can replicate it to see if they get the same result. But seven years ago, a team of hundreds of scientists set out to find a faster way to judge new scientific literature. They built artificial intelligence systems to predict whether studies would hold up to scrutiny.

The project, funded by the Defense Advanced Research Projects Agency, or DARPA, was called Systematizing Confidence in Open Research and Evidence — SCORE, for short. The idea came from Adam Russell, then a program manager for the agency. He envisioned generating a kind of credit score for science.

“People can say, ‘Hey, this is likely to be robust, we can premise a policy on it,’” said Dr. Russell, who is now at the University of Southern California. “‘But this? Nah, this might make for a book in the airport.’”

The SCORE team inspected hundreds of studies, running many of them again, to better understand what makes research hold up. Now it is publishing a raft of papers on those efforts.

For now, a scientific credit score remains a dream, the researchers say. Artificial intelligence cannot make reliable predictions…

For more than 15 years, some scientists have been trying to change the culture. They started by documenting the extent of the problem. In the early 2010s, Dr. Nosek and colleagues replicated 100 psychology papers — and matched the original results only 39 percent of the time.

In another project, Dr. Nosek teamed up with cancer biologists to replicate 50 experiments on animals and human cells. Fewer than half of the results withstood their scrutiny…(More)”.

Can Science Predict When a Study Won’t Hold Up?
