Stefaan Verhulst
Article by Robert Booth: “Experts have found weaknesses, some serious, in hundreds of tests used to check the safety and effectiveness of new artificial intelligence models being released into the world.
Computer scientists from the British government’s AI Security Institute, and experts at universities including Stanford, Berkeley and Oxford, examined more than 440 benchmarks that provide an important safety net.
They found flaws that “undermine the validity of the resulting claims”, that “almost all … have weaknesses in at least one area”, and that the resulting scores might be “irrelevant or even misleading”.
Many of the benchmarks are used to evaluate the latest AI models released by the big technology companies, said the study’s lead author, Andrew Bean, a researcher at the Oxford Internet Institute…(More)”
Article by Michael Stebbins & Eric Perakslis: “By shifting funding from small, underpowered randomized controlled trials to large field experiments in which many different treatments are tested synchronously in a large population using the same objective measure of success, so-called megastudies can start to drive people toward healthier lifestyles. Megastudies will allow us to more quickly determine what works, in whom, and when for health-related behavioral interventions, saving tremendous sums over traditional randomized controlled trial (RCT) approaches because of their scalability. But doing so requires the government to back the establishment of a research platform that sits on top of a large, diverse cohort of people with deep demographic data.
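To make the megastudy design concrete, here is a minimal sketch of the core comparison: many behavioral interventions evaluated at once against a single control arm on one shared, objective outcome. The arm names, effect sizes, and sample sizes are purely hypothetical and are not drawn from the article.

```python
# Minimal sketch of a megastudy analysis: many behavioral "nudges" tested
# at once against a shared, objective outcome (here, weekly gym visits).
# All arm names, effect sizes, and sample sizes are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
arms = {"control": 0.00, "planning_prompt": 0.10, "small_incentive": 0.25,
        "social_comparison": 0.15, "text_reminder": 0.05}
n_per_arm = 20_000  # megastudies rely on very large samples per arm

rows = []
for arm, lift in arms.items():
    visits = rng.poisson(lam=1.0 + lift, size=n_per_arm)  # simulated weekly visits
    rows.append(pd.DataFrame({"arm": arm, "visits": visits}))
df = pd.concat(rows, ignore_index=True)

# Every arm is scored on the same objective measure, so interventions are
# directly comparable within a single experiment.
summary = df.groupby("arm")["visits"].agg(["mean", "sem"])
summary["lift_vs_control"] = summary["mean"] - summary.loc["control", "mean"]
print(summary.sort_values("lift_vs_control", ascending=False))
```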
According to the National Research Council, almost half of premature deaths (< 86 years of age) are caused by behavioral factors. Poor diet, high blood pressure, sedentary lifestyle, obesity, and tobacco use are the primary causes of early death for most of these people. Yet, despite studying these factors for decades, we know surprisingly little about what can be done to turn these unhealthy behaviors into healthier ones. This has not been due to a lack of effort. Thousands of randomized controlled trials intended to uncover messaging and incentives that can be used to steer people towards healthier behaviors have failed to yield impactful steps that can be broadly deployed to drive behavioral change across our diverse population. For sure, changing human behavior through such mechanisms is controversial and difficult. Nonetheless, studying how to bend behavior should be a national imperative if we are to extend healthspan and address the declining lifespan of Americans at scale…. There is substantial risk when bringing together such deep personal data on a large population of people. While companies compile deep data all the time, doing so for research purposes is unusual and will, for sure, raise some eyebrows, as has been the case for large studies like the aforementioned All of Us and the Million Veteran Program.
Patients fear misuse of their data, inaccurate recommendations, and biased algorithms—especially among historically marginalized populations. Patients must trust that their data is being used for good, not for marketing purposes or for determining their insurance rates.

Need for Data Interoperability
Many healthcare and community systems operate in data silos, and data integration is a perennial challenge in healthcare. Patient-generated data from wearables, apps, or remote sensors often do not integrate with electronic health record data or demographic data gathered from elsewhere, limiting the precision and personalization of behavior-change interventions. This lack of interoperability undermines both provider engagement and user benefit…(More)”.
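As a simple illustration of the integration gap described above, the sketch below links hypothetical wearable readings to an equally hypothetical EHR extract through a patient crosswalk table. All identifiers, field names, and values are assumptions for illustration, not a reference to any real record system or standard.

```python
# Minimal sketch of the interoperability problem: patient-generated wearable
# data and EHR data live in separate silos with different identifiers and
# field conventions. All column names and values are hypothetical.
import pandas as pd

wearable = pd.DataFrame({
    "device_user_id": ["u-001", "u-002"],
    "date": ["2024-03-01", "2024-03-01"],
    "steps": [8421, 3150],
})
ehr = pd.DataFrame({
    "mrn": ["MRN001", "MRN002"],
    "dob": ["1980-05-01", "1972-11-23"],
    "systolic_bp": [128, 141],
})
# In practice there is rarely a shared key; a crosswalk table (often hand
# maintained) is needed just to link the two sources for one patient.
crosswalk = pd.DataFrame({"device_user_id": ["u-001", "u-002"],
                          "mrn": ["MRN001", "MRN002"]})

linked = wearable.merge(crosswalk, on="device_user_id").merge(ehr, on="mrn")
print(linked)  # only after this join can interventions be personalized
```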
Report by Open Data Watch: “In early 2025, an abrupt withdrawal of development assistance—driven by pauses in foreign aid and wider donor retrenchment—triggered a systemic shock to global health data systems. These systems, already reliant on a concentrated set of bilateral and multilateral funders for surveys, civil registration and vital statistics (CRVS), health management information systems (HMIS), and disease surveillance, now face immediate interruptions and heightened medium-term risks to data continuity, quality, openness, and use.
This report synthesizes early disclosures from major agencies, data from the Organisation for Economic Co-operation and Development / Development Assistance Committee (OECD/DAC), and a rapid assessment survey covering more than half of national statistical offices (NSOs). Evidence on philanthropic and domestic financing is incomplete, and survey nonresponse may introduce bias, but convergent signals show broad exposure. Three unknowns will shape the next 12–18 months: the duration of donor withdrawals, the degree of philanthropic bridging, and the extent of government backfilling to protect core functions…(More)”.
Report by Brookings: “Cities in the U.S. and globally face a severe, system-wide housing shortfall—exacerbated by siloed, proprietary, and fragile data practices that impede coordinated action. Recent advances in artificial intelligence (AI) promise to increase the speed and effectiveness of data integration and decisionmaking for optimizing housing supply. But unlocking the value of these tools requires a common infrastructure of (i) shared computational assets (data, protocols, models) required to develop AI systems and (ii) institutional capabilities to deploy these systems to unlock housing supply. This memo develops a policy and implementation proposal for a “Home Genome Project” (Home GP): a cohort of cities building open standards, shared datasets and models, and an institutional playbook for operationalizing these assets using AI. Beginning with an initial pilot cohort of four to six cities, a Home GP-type initiative could help 50 partner cities identify and develop additional housing supply relative to business-as-usual projections by 2030. The open data infrastructure and AI tools developed through this approach could help cities better understand the on-the-ground impacts of policy decisions, while also providing a constructive way to track progress and stay accountable to longer-term housing supply goals…(More)”.
Book by Winnifred R. Louis, Gi K. Chonu, Kiara Minto, Susilo Wibisono: “Why do some societies evolve and adapt while others remain stagnant? What creates divisiveness and exclusion, and what leads to community cohesion and social progress? This book discusses the psychology of social system change and resistance to change, offering readers a deep exploration of the psychological dynamics that shape societal transformations. Readers explore psychological perspectives on intergroup relations and group processes, alongside interdisciplinary perspectives from environmental science, history, political science, and sociology, to question and challenge conventional thinking. This readable, entertaining book contains clear definitions, lucid explanations, and key learnings in each chapter that highlight the take-home points and implications, so that readers can apply these insights to their real-world challenges…(More)”.
Paper by Suyash Fulay, Sercan Demir, Galen Hines-Pierce, Hélène Landemore, Michiel Bakker: “A large share of retail investors hold public equities through mutual funds, yet lack adequate control over these investments. Indeed, mutual funds concentrate voting power in the hands of a few asset managers. These managers vote on behalf of shareholders despite having limited insight into their individual preferences, leaving them exposed to growing political and regulatory pressures, particularly amid rising shareholder activism. Pass-through voting has been proposed as a way to empower retail investors and provide asset managers with clearer guidance, but it faces challenges such as low participation rates and the difficulty of capturing highly individualized shareholder preferences for each specific vote. Randomly selected assemblies of shareholders, or “investor assemblies,” have also been proposed as more representative proxies than asset managers. As a third alternative, we propose artificial intelligence (AI) enabled representatives trained on individual shareholder preferences to act as proxies and vote on their behalf. Over time, these models could not only predict how retail investors would vote at any given moment but also how they might vote if they had significantly more time, knowledge, and resources to evaluate each proposal, leading to better overall decision-making. We argue that shareholder democracy offers a compelling real-world test bed for AI-enabled representation, providing valuable insights into both the potential benefits and risks of this approach more generally…(More)”.
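A rough sketch of what an AI-enabled representative might look like is given below: a small model fitted to one investor's past votes and stated preferences, then used to predict their vote on a new proposal. The features, labels, and the choice of a logistic regression are illustrative assumptions; the paper does not specify this implementation.

```python
# Minimal sketch of an "AI-enabled representative" for proxy voting: a model
# trained on an individual shareholder's past votes, then asked to predict
# their vote on a new proposal. Features, labels, and the use of a logistic
# regression are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data for one retail investor: each row encodes a
# past proposal (topic one-hot + a simple cost attribute) and how they voted.
X_past = np.array([
    [1, 0, 0, 0.2],   # climate disclosure proposal, low estimated cost
    [1, 0, 0, 0.8],   # climate disclosure proposal, high estimated cost
    [0, 1, 0, 0.1],   # executive pay proposal
    [0, 0, 1, 0.5],   # board independence proposal
])
y_past = np.array([1, 0, 0, 1])  # 1 = voted "for", 0 = voted "against"

rep = LogisticRegression().fit(X_past, y_past)

new_proposal = np.array([[1, 0, 0, 0.4]])  # another climate-related item
print("predicted vote:", "for" if rep.predict(new_proposal)[0] else "against")
print("confidence:", rep.predict_proba(new_proposal)[0].max())
```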
Report by Neil Kleiman, Eric Gordon and Mai-Ling Garcia: “AI is rapidly reshaping the public sector, but most efforts remain focused on optimizing existing processes rather than reimagining how institutions serve communities. If governments continue to pursue efficiency alone, they risk entrenching the very systems that residents already distrust. Based on two years of research—including more than 40 interviews, pilots in Boston, New York City, and San José, and a scan of national policy trends—we propose an alternative framework for public AI adoption: Adapt, Listen, and Trust (ALT).
Rather than reinforce the status quo, the ALT framework guides civic partners to build more responsive public institutions by (1) adapting to the amplified demand AI unleashes, (2) building shared civic infrastructure that enables genuine listening at scale, and (3) cultivating two-way accountability that deepens public trust. The report concludes by outlining concrete recommendations for governments, philanthropy, universities, and community organizations to align around the ALT approach…(More)”.
Paper by Mojgan Askarizade & Ensieh Davoodijam: “This study investigates the dynamics of public sentiment surrounding the 2024 Iranian presidential election by analyzing Persian-language tweets. We introduce the IranElectionTweet dataset, a comprehensive collection of 111,386 election-related tweets enriched with textual content, user metadata, and engagement indicators. Due to the sensitive political context and privacy considerations, the full dataset is not publicly released; instead, we provide a manually annotated subset of 500 tweets (Tweet IDs and dates) for benchmarking, along with reconstruction instructions and analysis code. To conduct sentiment analysis, we fine-tuned GPT-4 on a publicly available Persian sentiment dataset, adapting it to the linguistic and cultural nuances of Persian political discourse. In parallel, we evaluated three cutting-edge large language models, Claude Sonnet 3.7, DeepSeek-V3, and Grok-4, using a few-shot learning framework due to the unavailability of fine-tuning access at the time of experimentation. All models were benchmarked on a manually annotated subset of 500 tweets. DeepSeek-V3 attained the highest weighted F1-score and overall accuracy, indicating stronger performance on the majority classes and was selected as the primary model for sentiment classification. The final sentiment analysis was applied to the full dataset, capturing hourly and daily variations in sentiment and candidate mentions throughout the election period. The results reveal distinct patterns in public opinion corresponding to key political events, offering valuable insights into the real-time evolution of electoral sentiment on social media. This research highlights the effectiveness of advanced multilingual language models in low-resource settings and contributes to the broader understanding of political behavior in digital environments…(More)”.
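A minimal sketch of the evaluation step described above follows: few-shot sentiment labeling of tweets, scored against manual annotations with a weighted F1-score. The prompt wording, example labels, and use of scikit-learn metrics are assumptions for illustration; the paper's exact prompts and model interfaces are not reproduced here.

```python
# Minimal sketch of few-shot sentiment labeling and weighted-F1 benchmarking.
# Prompt text, labels, and example data are illustrative placeholders only.
from sklearn.metrics import accuracy_score, f1_score

FEW_SHOT_PROMPT = """Classify the sentiment of the tweet as positive, negative, or neutral.

Tweet: "<example of a supportive election tweet>"
Sentiment: positive

Tweet: "<example of a critical election tweet>"
Sentiment: negative

Tweet: "{tweet}"
Sentiment:"""

def classify(tweet: str) -> str:
    """Placeholder for a call to a hosted LLM using
    FEW_SHOT_PROMPT.format(tweet=tweet); not implemented in this sketch."""
    raise NotImplementedError

# Scoring against a manually annotated subset (labels shown here are fake).
gold = ["positive", "negative", "neutral", "negative", "positive"]
pred = ["positive", "negative", "negative", "negative", "positive"]

print("weighted F1:", f1_score(gold, pred, average="weighted"))
print("accuracy:", accuracy_score(gold, pred))
```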
Article by Sophia Knight: “The development of non-profit, public interest alternatives for accessing and debating information online can contribute to a healthier information ecosystem – the question is what role public service media should play in providing them. We are living in an increasingly volatile and unpredictable democratic landscape in the UK. We face challenges with political polarisation and social cohesion, exacerbated by the decades-long fragmentation of our civic infrastructure.
The shift to an online information ecosystem has disrupted traditional media. Digital public spaces have enabled almost anyone, anywhere, to speak their minds, opening new avenues for connection. Yet the open internet has become overrun by sprawling platform monopolies, shaped by algorithms and profit-seeking incentives that reward attention and outrage.
Policymakers, and to a large extent the media industry, are stuck on one part of the solution: regulating harmful online content. To move forward, we need to identify and build on opportunities to improve the digital information ecosystem, rather than only targeting potential threats…(More)”.
Paper by Marc E. B. Picavet, Peter Maroni, Amardeep Sandhu, and Kevin C. Desouza: “Generating strategic foresight for public organizations is a resource-intensive and non-trivial effort. Strategic foresight is especially important for governments, which are increasingly confronted by complex and unpredictable challenges and wicked problems. With advances in machine learning, information systems can be integrated more creatively into the strategic foresight process. We report on an innovative pilot project conducted by an Australian state government that leveraged generative artificial intelligence (AI), specifically large language models, for strategic foresight using a design science approach. The project demonstrated AI’s potential to enhance scenario generation for strategic foresight, improve data processing efficiency, and support human decision-making. However, the study also found that it is essential to balance AI automation with human expertise for validation and oversight. These findings highlight the importance of iterative design to develop robust AI tools for strategic foresight, which, alongside stakeholder engagement and process transparency, build trust and ensure practical relevance…(More)”.
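A minimal sketch of LLM-assisted scenario generation is given below, assuming an OpenAI-compatible client and a placeholder model name; it does not describe the pilot project's actual tooling, and any generated scenarios would still require the human validation and oversight the study emphasizes.

```python
# Minimal sketch of LLM-assisted scenario generation for strategic foresight.
# The client, model name, and prompt structure are assumptions for
# illustration; they do not describe the pilot project's actual setup.
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint and API key

drivers = ["ageing population", "energy transition", "AI-driven automation"]
uncertainties = ["pace of regional migration", "public trust in institutions"]

prompt = (
    "You are supporting a government strategic foresight exercise.\n"
    f"Key drivers: {', '.join(drivers)}.\n"
    f"Critical uncertainties: {', '.join(uncertainties)}.\n"
    "Draft three contrasting 2040 scenarios (one paragraph each), "
    "and note the early warning signals a policymaker should watch for."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
# Drafts remain inputs to a human-led process, not finished foresight.
print(response.choices[0].message.content)
```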