Stefaan Verhulst
Article by Yiran Wang et al: “Public health decisions increasingly rely on large-scale data and emerging technologies such as artificial intelligence and mobile health. However, many populations—including those in rural areas, with disabilities, experiencing homelessness, or living in low- and middle-income regions of the world—remain underrepresented in health datasets, leading to biased findings and suboptimal health outcomes for certain subgroups. Addressing data inequities is critical to ensuring that technological and digital advances improve health outcomes for all.
This article proposes 10 core concepts to improve data equity throughout the operational arc of data science research and practice in public health. The framework integrates computer science principles, such as fairness, transparency, and privacy protection, with best practices in public health data science that focus on mitigating information and selection biases, learning causality, and ensuring generalizability. These concepts are applied together throughout the data life cycle, from study design through data collection, analysis, and interpretation to policy translation, offering a structured approach for evaluating whether data practices adequately represent and serve all populations.
Data equity is a foundational requirement for producing trustworthy inference and actionable evidence. When data equity is built into public health research from the start, technological and digital advances are more likely to improve health outcomes for everyone rather than widening existing health gaps. These 10 core concepts can be used to operationalize data equity in public health. Although data equity is an essential first step, it does not automatically guarantee information, learning, or decision equity. Advancing data equity must be accompanied by parallel efforts in information theory and structural changes that promote informed decision-making…(More)”.
Paper by Kalena Cortes, Brian Holzman, Melissa D. Gentry & Miranda I. Lambert: “This study examines how digital incentives influence survey participation and engagement in a large randomized controlled trial of parents across seven Texas school districts. We test how incentive amount and information about vendor options affect response behavior and explore differences by language background. Incentivized parents were more likely to start and complete surveys and claim gift cards, though Spanish-speaking parents exhibited distinct patterns—greater completion rates but lower redemption rates, often selecting essential-goods vendors. Increasing incentive value and providing advance information both improved engagement. Findings inform the design of equitable, effective digital incentive strategies for diverse populations…(More)”.
World Bank Report: “Text and voice messages have emerged as a low-cost and popular tool for nudging recipients to change behavior. This paper presents findings from a randomized controlled trial designed to evaluate the impact of an information campaign using text and voice messages implemented in Punjab, Pakistan, during the COVID-19-induced school closures. This campaign sought to increase study time and provide academic support while schools were closed and to encourage reenrollment once they reopened, in order to reduce the number of dropouts. The campaign targeted girls enrolled in grades 5 to 7. Messages were sent out by a government institution, and the campaign lasted from October 2020 until November 2021, when schools had permanently reopened. Households were randomized across three treatment groups and a control group that did not receive any messages. The first treatment group received gender-specific messages that explicitly referenced daughters in their households, and the second treatment group received gender-neutral messages. A third group was cross-randomized across the first two treatment arms and received academic support messages (practice math problems and solutions). The results show that the messages increased reenrollment by 6.0 percentage points approximately three months after the intervention finished. Gender-neutral messages (+8.9 percentage points) showed a larger effect on reenrollment than gender-specific messages (+4.3 percentage points), although the difference is not statistically significant. The message program also increased learning outcomes by 0.2 standard deviations for Urdu and 0.2 standard deviations for math. The paper finds a small positive effect on the intensive margin of remote learning and an equivalent small negative effect on the intensive margin of outside tutoring. In line with similar studies on pandemic remediation efforts, the paper finds no effect of the academic support intervention on learning. The findings suggest that increased school enrollment played a role in supporting the observed increase in learning outcomes…(More)”.
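As a rough illustration of how intent-to-treat effects like those reported above are typically estimated, the sketch below fits a linear probability model with the control group as the baseline. The data, the variable names (arm, reenrolled), and the simulated effect sizes are illustrative stand-ins; this is not the study's actual data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data: one row per household, randomized across arms.
rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({"arm": rng.choice(["control", "gender_specific", "gender_neutral"], size=n)})
base = 0.70  # assumed control-group reenrollment rate (illustrative)
lift = {"control": 0.0, "gender_specific": 0.043, "gender_neutral": 0.089}
df["reenrolled"] = rng.binomial(1, base + df["arm"].map(lift))

# Intent-to-treat effects: linear probability model with the control group as the
# reference category and heteroskedasticity-robust (HC1) standard errors.
model = smf.ols(
    "reenrolled ~ C(arm, Treatment(reference='control'))", data=df
).fit(cov_type="HC1")
print(model.params)
print(model.bse)
```

With individual-level randomization, a simple comparison of arm means like this recovers the intent-to-treat effects; robust standard errors guard against the heteroskedasticity inherent in a binary outcome.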
Article by Tony Curzon Price: “Is 2026 the year that data collectives – unions, trusts, mutuals and clubs – tilt the balance of power in cyberspace away from mega-platforms and towards the citizen?
Last year, tech boss Sam Altman enabled ChatGPT to better remember past conversations in some jurisdictions, meaning that the AI might soon know us better than anyone else. In response to this sort of shift in power, we saw the creation of the First International Data Union (FIDU) to ensure that the data, knowledge and intimacy that Altman wants for ChatGPT would remain under members’ control and be managed according to their values.
Generative AI is causing a major overhaul of humanity’s life in cyberspace. There aren’t many examples of this sort of change – the web itself, Web 2.0 platforms, social media and mobile. The arrival of generative AI is upturning a decades-old equilibrium. ChatGPT has been the fastest-growing consumer application in history. It is displacing Google search in many lives. Open source models, especially from China, suggest that there are no natural moats in the technology, which means businesses can easily be overtaken by competitors with similar ideas.
Since the 2010s, many citizens and countries have become uncomfortable with how mega platforms have shaped the web. Scholars have pointed to these changes as important contributors to the deterioration of the mental health of children, the economic growth crisis and even falling global average IQs.
With the pieces of the cyberspace puzzle thrown into the air, citizens and governments do not want what happens next to be a repeat of what came before. Yet governments have discovered that their traditional policy tools against market power, like antitrust, are largely ineffective. Moreover, with the United States pushing back against tighter regulation abroad, even direct regulation by non-US states is proving difficult.
With other avenues of control largely defanged, this might be the moment for data unions. Data mutualisation promises to harness the collective power of citizens, providing a direct challenge to platforms…(More)”.
Article by Simon Ilyushchenko: “The Italian aphorism traduttore, traditore – the translator is a traitor – encapsulates a deep-seated suspicion about the act of translation: that to carry meaning from one language to another is always, to some degree, a corruption.
The writer and semiotician Umberto Eco took this charge seriously. In Experiences in Translation, Eco treats translation as an interpretive act – negotiation, compromise, loss. Every translation is an imperfect reproduction of the original. Every translator, in choosing what to preserve, chooses what to betray.
This is the situation confronting anyone who works with geospatial data – human or AI.
In 2019, Colombian researchers studied the relationship between armed conflict and forest cover in their country. Using the Global Forest Change dataset – a widely respected product derived from satellite imagery – they found something striking: if the analysis was not done carefully, armed conflict appeared to be correlated with increases in forest cover.
One might infer, perversely, that violence was somehow good for forests. The authors’ interpretation of the ground data was the opposite.
Here is the mechanism they propose: armed conflict destabilized the rule of law, which enabled the rapid clearing of native forests for oil palm plantations. These plantations are monocultures – ecological deserts compared to the biodiverse forests they replaced. But to a satellite sensor, a mature oil palm plantation can read as ‘forest’. It has trees. The canopy closes. The pixels are green.
And even this example gets messy fast. The relationship between Colombian conflict and forest cover has generated substantial literature – but no consensus. Ganzenmüller et al. (2022) identified seven distinct categories of deforestation dynamics across Colombian municipalities; the same peace agreement drove opposite outcomes in different regions. Bodini et al. (2024), using loop analysis to model the socio-ecological system, found that causal pathways connecting violence, coca, cattle, and deforestation were so intertwined that their models for left-wing guerrilla dynamics showed “very low agreement with observed correlations.” The data didn’t fit a simple narrative – any simple narrative…(More)”.
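To make the classification mechanism in this excerpt concrete, the toy sketch below applies a deliberately naive per-pixel greenness rule. It is not the Global Forest Change methodology; the function names, the NDVI threshold, and the reflectance values are invented for illustration. The point is that any rule keyed to canopy greenness alone will label a closed-canopy oil palm plantation as 'forest'.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)

def looks_like_forest(nir, red, threshold=0.6):
    """Naive per-pixel rule: a dense green canopy above an NDVI threshold counts as 'forest'."""
    return ndvi(nir, red) >= threshold

# Invented reflectance values for three kinds of ground cover.
print(looks_like_forest(nir=0.45, red=0.05))  # biodiverse native forest    -> True
print(looks_like_forest(nir=0.42, red=0.06))  # mature oil palm plantation  -> True
print(looks_like_forest(nir=0.25, red=0.20))  # recently cleared land       -> False
```

Real forest-cover products are far more sophisticated than this, but the spectral similarity between native forest and mature plantations is enough to blur exactly the distinction the Colombian researchers cared about.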
Paper by Arianna Zuanazzi, Michael P. Milham & Gregory Kiar: “Modern brain science is inherently multidisciplinary, requiring the integration of neuroimaging, psychology, behavioral science, genetics, computational neuroscience and artificial intelligence (to name a few) to advance our understanding of the brain. Critical challenges in the field of brain health — including clinical psychology, cognitive and brain sciences, and digital mental health — include the great heterogeneity of human data, small sample sizes and the subjectivity or limited reproducibility of measured constructs. Large-scale, multi-site and multimodal open science initiatives can represent a solution to these challenges (for example, see refs.); however, they often struggle with balancing data quality while maximizing sample size [5] and ensuring that the resulting data are findable, accessible, interoperable and reusable (FAIR). Furthermore, large-scale high-dimensional multimodal datasets demand advanced analytic approaches beyond conventional statistical models, requiring the expertise and interdisciplinary collaboration of the broader scientific community…
Data science competitions (such as Kaggle, DrivenData, CodaBench and AIcrowd) offer a powerful mechanism to bridge disciplines, solve complex problems and crowdsource novel solutions, as they bring individuals from around the world together to solve real-world problems. For more than 20 years (for example, see refs.), such competitions have been hosted by companies, organizations and research institutions to answer scientific questions, advance methods and techniques, extract valuable insights from data, promote organizations’ missions and foster collaboration with stakeholders. Every stage of a data science competition offers opportunities to promote big data exploration, advance analytic innovation and strengthen community engagement (Fig. 1). To translate these opportunities into actionable steps, we have shared our Data Science Competition Organizer Checklist at https://doi.org/10.17605/osf.io/hnx9b; this offers practical guidance for designing and implementing data science competitions in the brain health domain…(More)”
Paper by the Knight-Georgetown Institute (KGI): “Online platforms and services shape what we know, how we connect, and who gets heard. From elections and public health to commerce and conflict, platforms are now indispensable infrastructure for civic life. Their influence is vast, and so is the need to understand them.
As critical conversations publicly unfold on digital platforms, the ability to study these posts and content at scale has steadily diminished. Tools like Facebook’s CrowdTangle – which once offered researchers, journalists, and civil society a window into public online discourse – have disappeared. Meta, Reddit, and X have restricted data access tools that were once widely available, and researchers have faced threats of litigation for accessing public platform data.
Platforms restrict researcher access while public data is increasingly monetized for advertisers, data brokers, and training artificial intelligence (AI) systems. This imbalance – where companies profit while independent researchers are left in the dark – undermines transparency, limits free expression, and weakens oversight.
That is the reason for developing Better Access, a baseline framework for independent access to public platform data: the content, data, and information posted to platforms that anyone can access. …(More)”.
Article by Thomas R. Karl, Stephen C. Diggs, Franklin Nutter, Kevin Reed, and Terence Thompson: “From farming and engineering to emergency management and insurance, many industries critical to daily life rely on Earth system and related socioeconomic datasets. NOAA has linked its data, information, and services to trillions of dollars in economic activity each year, and roughly three quarters of U.S. Fortune 100 companies use NASA Earth data, according to the space agency.
Such data are collected in droves every day by an array of satellites, aircraft, and surface and subsurface instruments. But for many applications, not just any data will do.
Trusted, long-standing datasets known as reference quality datasets (RQDs) form the foundation of hazard prediction and planning and are used in designing safety standards, planning agricultural operations, and performing insurance and financial risk assessments, among many other applications. They are also used to validate weather and climate models, calibrate data from other observations that are of less than reference quality, and ground-truth hazard projections. Without RQDs, risk assessments grow more uncertain, emergency planning and design standards can falter, and potential harm to people, property, and economies becomes harder to avoid.
Yet some well-established, federally supported RQDs in the United States are now slated to be, or already have been, decommissioned, or they are no longer being updated or maintained because of cuts to funding and expert staff. Leaving these datasets to languish, or losing them altogether, would represent a dramatic—and potentially very costly—shift in the country’s approach to managing environmental risk…(More)”.
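One routine use of RQDs described above, calibrating observations of less than reference quality against a reference series, can be sketched in a few lines. The data, variable names, and the linear correction below are hypothetical; operational calibration procedures are considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical co-located observations: a reference quality series (say, temperature in
# degrees Celsius) and a cheaper sensor with a multiplicative bias, an offset, and extra noise.
reference = 15.0 + 10.0 * rng.random(200)
low_quality = 1.02 * reference - 0.8 + rng.normal(0.0, 0.3, 200)

# Fit a simple linear correction of the low-quality series onto the reference.
slope, intercept = np.polyfit(low_quality, reference, deg=1)
calibrated = slope * low_quality + intercept

print(f"mean bias before calibration: {np.mean(low_quality - reference):+.2f}")
print(f"mean bias after calibration:  {np.mean(calibrated - reference):+.2f}")
```

When a reference series is discontinued, the ground truth needed for corrections like this disappears, and the downstream risk assessments become correspondingly less certain.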
Paper by Cheng-Chun Lee et al: “The use of novel data and artificial intelligence (AI) technologies in crisis resilience and management is increasingly prominent. AI technologies have broad applications, from detecting damage to prioritizing assistance, and have increasingly supported human decision-making. Understanding how AI amplifies or diminishes specific values and how responsible AI practices and governance can mitigate harmful outcomes and protect vulnerable populations is critical. This study presents a responsible AI roadmap embedded in the Crisis Information Management Circle. Through three focus groups with participants from diverse organizations and sectors and a literature review, we develop six propositions addressing important challenges and considerations in crisis resilience and management. Our roadmap covers a broad spectrum of interwoven challenges and considerations on collecting, analyzing, sharing, and using information. We discuss principles including equity, fairness, explainability, transparency, accountability, privacy, security, inter-organizational coordination, and public engagement. Through examining issues around AI systems for crisis management, we dissect the inherent complexities of information management, governance, and decision-making in crises and highlight the urgency of responsible AI research and practice. The ideas presented in this paper are among the first attempts to establish a roadmap for actors, including researchers, governments, and practitioners, to address important considerations for responsible AI in crisis resilience and management…(More)”.
Article by Dilek Fraisl et al: “The termination in February 2025 of the Demographic and Health Surveys, a critical source of data on population, health, HIV, and nutrition in over 90 countries, supported by the United States Agency for International Development, constitutes a crisis for official statistics. This is particularly true for low- and middle-income countries that lack their own survey infrastructure [1]. At a national level, in the United States, proposed cuts to the Environmental Protection Agency by the current administration further threaten the capacity to monitor and achieve environmental sustainability and implement the Sustainable Development Goals (SDGs) [2,3]. Citizen science—data collected through voluntary public contributions—now can and must step up to fill the gap and play a more central role in official statistics.
Demographic and Health Surveys contribute directly to the calculation of around 30 of the indicators that underpin the SDGs [4]. More generally, a third of SDG indicators rely on household survey data [5].
Recent political changes, particularly in the United States, have exposed the risks of relying too heavily on a single country or institution to run global surveys and placing minimal responsibility on individual countries for their own data collection.
Many high-income countries, particularly European ones, are experiencing similar challenges and financial pressures on their statistical systems as their national budgets increasingly prioritize defense spending [6]. Along with these budget cuts comes a risk that perceived efficiency gains from artificial intelligence are increasingly viewed as a pretext to put further budgetary pressure on official statistical agencies [7].
In this evolving environment, we argue that citizen science can become an essential part of national data gathering efforts. To date, policymakers, researchers, and agencies have viewed it as supplementary to official statistics. Although self-selected participation can introduce bias, citizen science provides fine-scale, timely, cost-efficient, and flexible data that can fill gaps and help validate official statistics. We contend that, rather than an optional complement, citizen science data should be systematically integrated into national and global data ecosystems…(More)”.