Report by CIPL: “Without these data categories, organizations may be unable to uncover disparities in how AI models perform across different demographic groups, making it impossible to ensure fairness and equal benefits of AI across all communities. For instance, to ensure that a bank’s AI system for assessing whether a customer is creditworthy enough for a mortgage does not disproportionately deny mortgages to people of a certain ethnicity, the developer of the AI system needs to be able to distinguish the ethnicity of the people about whom its AI system makes decisions. Regulators such as the UK’s ICO acknowledge that sensitive data may be necessary to assess discrimination risks, evaluate model performance, and retrain models accordingly. Yet the categorical restrictions many data protection laws place on sensitive data processing, such as requiring specific consent, coupled with an increasingly broad interpretation of the concept of sensitive data, can leave organizations unable to include sensitive data in AI training datasets where such consent is not obtainable, to the detriment of the model’s performance…(More)”
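The disparity check the report describes can be made concrete with a small sketch. What follows is a minimal, hypothetical illustration, not code from CIPL or the ICO; the column names, the toy data, and the four-fifths threshold are all assumptions. The point is structural: without the sensitive attribute (here, ethnicity), the comparison cannot be computed at all.

```python
# Minimal sketch of a group-wise disparity check. A hypothetical
# illustration of the report's point, not code from CIPL or the ICO.
# Column names, the toy data, and the four-fifths threshold are assumptions.
import pandas as pd

def approval_rates_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Approval rate per demographic group; requires the sensitive attribute."""
    return df.groupby(group_col)[outcome_col].mean()

def disparate_impact_ratio(rates: pd.Series) -> float:
    """Lowest group rate divided by highest; 1.0 means parity."""
    return float(rates.min() / rates.max())

# Toy evaluation data (fabricated for illustration only).
df = pd.DataFrame({
    "ethnicity": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "approved":  [1,   1,   0,   1,   1,   1,   1,   0],
})
rates = approval_rates_by_group(df, "ethnicity", "approved")
print(rates)
print(f"disparate impact ratio: {disparate_impact_ratio(rates):.2f}")
# Flag for review if the ratio falls below ~0.8 (the "four-fifths rule").
```

If the `ethnicity` column cannot lawfully be collected, neither function can run, which is precisely the bind the report describes.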
Article by Mark Schmitt: “We’re learning a lot about how government can shape our lives by watching the second Trump Administration dismantle it. One lesson is that government’s capacity to do good runs on information no less than on funding and regulations. From weather and economic forecasts to the census to predictions of other countries’ military capabilities to vaccine monitoring, data and ideas generated inside and outside of the federal government have guided decisions in a world of profound complexity. But as the young men of Elon Musk’s DOGE figured out quickly, information is also a point of vulnerability for the entire workings of government, and it can be exploited by those like Musk and Trump who seek to disable government, concentrate its power, or redirect that power to private profit.
Dozens of small federal agencies devoted to information and ideas have been gutted; expert advisory commissions disbanded; and grants for libraries, museums, and scientific and health research cut off without review. Indicators such as the National Assessment of Educational Progress, which always had strong conservative support, have been canceled, pared back, or delayed, often because contracts were arbitrarily canceled, advisory panels dissolved, and key staff fired.
Much of the loosely connected galaxy of information and data that guides policy falls outside the formal boundaries of government, in a pluralistic set of institutions that are independent of the administration or political parties. Along with universities, independent policy research organizations—think tanks—are key to the system of knowledge production and policy ideas, particularly in the United States. Every think tank, aside from the few that maintain an allegiance to the current Administration, now faces a test: How do they not only survive, but remain relevant when the assumptions and processes under which they were born have been wiped away? How can their capacities be put to good use at a moment when the idea of informed decision-making is itself under attack, when little matters other than the raw and often arbitrary exercise of power?…(More)”.
Paper by Chiara Gallese et al.: “Research has shown how data sets convey social bias in Artificial Intelligence systems, especially those based on machine learning. A biased data set is not representative of reality and might perpetuate societal biases within the model. To tackle this problem, it is important to understand how to avoid biases, errors, and unethical practices while creating data sets. To provide guidance for the use of data sets in contexts of critical decision-making, such as health decisions, we identified six fundamental data set features (balance, numerosity, unevenness, compliance, quality, incompleteness) that could affect model fairness. These features were the foundation for the FanFAIR framework.
We extended the FanFAIR framework for the semi-automated evaluation of fairness in data sets by combining statistical information on data with qualitative features. In particular, we present an improved version of FanFAIR that introduces novel outlier detection capabilities operating in a multivariate fashion, using two state-of-the-art methods: Empirical Cumulative-distribution-based Outlier Detection (ECOD) and Isolation Forest. We also introduce a novel metric for data set balance, based on an entropy measure.
We addressed the question of how much (un)fairness can be present in a data set used for machine learning research, focusing on classification tasks. We developed a rule-based approach, grounded in fuzzy logic, that combines these characteristics into a single score and enables a semi-automatic evaluation of a data set in algorithmic fairness research. Our tool produces a detailed visual report about the fairness of the data set. We show the effectiveness of FanFAIR by applying the method to two open data sets…(More)”.
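Two of the ingredients named above, the entropy-based balance metric and multivariate outlier detection, are easy to illustrate. The sketch below is a rough approximation under stated assumptions, not the authors’ FanFAIR implementation: the normalization choice, the contamination rate, and the toy data are ours, and it uses scikit-learn’s Isolation Forest rather than the paper’s exact configuration.

```python
# Rough sketch of two ingredients described above: an entropy-based
# balance metric and multivariate outlier detection. NOT the authors'
# FanFAIR implementation; the normalization, the 10% contamination rate,
# and the toy data are assumptions. ECOD (available in the `pyod`
# package) could be swapped in for Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

def entropy_balance(labels) -> float:
    """Normalized Shannon entropy of the class distribution.

    Returns 1.0 for a perfectly balanced data set, approaching 0.0
    as a single class dominates.
    """
    _, counts = np.unique(labels, return_counts=True)
    if len(counts) < 2:
        return 0.0  # a single class is maximally unbalanced
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() / np.log2(len(p)))

def outlier_fraction(X, contamination=0.1, random_state=0) -> float:
    """Share of rows flagged as multivariate outliers by Isolation Forest."""
    iso = IsolationForest(contamination=contamination, random_state=random_state)
    return float((iso.fit_predict(X) == -1).mean())

# Toy data: 500 rows, 4 features, a 90/10 class split.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = rng.choice([0, 1], size=500, p=[0.9, 0.1])
print(f"balance score:    {entropy_balance(y):.2f}")   # well below 1.0 for a 90/10 split
print(f"outlier fraction: {outlier_fraction(X):.2f}")
```

In the paper’s design, scores like these are combined through fuzzy rules into a single data set fairness score; the snippet shows only the raw measurements that would feed such rules.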
Paper by Ida Kubiszewski: “To achieve sustainable wellbeing for both humanity and the rest of nature, we must shift from a narrow focus on Gross Domestic Product (GDP) to a broader understanding and measurement of sustainable wellbeing and prosperity within the planetary boundaries. Several hundred alternative indicators have been proposed to replace GDP, but their variety and lack of consensus have allowed GDP to retain its privileged status. What is needed now is broad agreement on shifting beyond GDP. We conducted a systematic literature review of existing alternative indicators and identified over 200 across multiple spatial scales. Using these indicators, we built a database to compare their similarities and differences. While the terminology for describing the components of wellbeing varied greatly, there was a surprising degree of agreement on the core concepts and elements. We applied semantic modelling to estimate the degree of similarity among the indicators’ components and identified those that represented a broad synthesis. Results show that indicators with around 20 components capture a large share of the overall similarity across the indicators in the dataset. Beyond 20 components, adding additional components yielded diminishing returns in similarity. Based on this, we created a 20-component indicator to serve as a model for building consensus and mapped its relationship to several well-known alternative indicators. We aim for this database and synthesis to support broad stakeholder engagement toward the consensus we need to move beyond GDP…(More)”.
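As one way to picture the semantic-modelling step, the sketch below embeds component labels with an off-the-shelf sentence encoder and computes pairwise cosine similarity. This is a hedged illustration of the general technique, not the authors’ pipeline; the model choice (“all-MiniLM-L6-v2”), the example component labels, and the 0.5 threshold are assumptions for the demo.

```python
# Illustrative sketch of estimating semantic similarity between indicator
# components. Not the paper's actual pipeline; the embedding model and
# the example component labels are assumptions for the demo.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

components = [
    "physical health",
    "mental wellbeing",
    "air quality",
    "income and wealth",
    "social connections",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder
embeddings = model.encode(components)            # one vector per component label

# Pairwise cosine similarity; semantically close components score higher,
# which is the basis for merging near-duplicates across indicators.
sim = cosine_similarity(embeddings)
for i in range(len(components)):
    for j in range(i + 1, len(components)):
        if sim[i, j] > 0.5:  # arbitrary threshold for the demo
            print(f"{components[i]!r} ~ {components[j]!r}: {sim[i, j]:.2f}")
```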
Article by Adam Zable, Stefaan Verhulst: “Non-Traditional Data (NTD) — data digitally captured, mediated, or observed through instruments such as satellites, social media, mobility apps, and wastewater testing — holds immense potential when re-used responsibly for purposes beyond those for which it was originally collected. If combined with traditional sources and guided by strong governance, NTD can generate entirely new forms of public value — what we call the Third Wave of Open Data.
Yet, there is often little awareness of how these datasets are currently being applied, and even less visibility on the lessons learned. That is why we curate and monitor, on a quarterly basis, emerging developments that provide better insight into the value and risks of NTD.
In previous updates, we focused on how NTD has been applied across domains like financial inclusion, public health, socioeconomic analysis, urban mobility, governance, labor dynamics, and digital behavior, helping to surface hidden inequities, improve decision-making, and build more responsive systems.
In this update, we have curated recent advances where researchers and practitioners are using NTD to close monitoring gaps in climate resilience, track migration flows more effectively, support health surveillance, and strengthen urban planning. Their work demonstrates how satellite imagery can provide missing data, how crowdsourced information can enhance equity and resilience, and how AI can extract insights from underused streams.
Below we highlight recent cases, organized by public purpose and type of data. We conclude with reflections on the broader patterns and governance lessons emerging across these cases. Taken together, they illustrate both the expanding potential of NTD applications and the collaborative frameworks required to translate data innovation into real-world impact.
Categories
- Public Health & Disease Surveillance
- Environment, Climate & Biodiversity
- Urban Systems, Mobility & Planning
- Migration
- Food Security & Markets
- Information Flows for Risk and Policy…(More)”.
Article by Gideon Lichfield: “Point your browser at publicai.co and you will experience a new kind of artificial intelligence, called Apertus. Superficially, it looks and behaves much like any other generative AI chatbot: a simple webpage with a prompt bar, a blank canvas for your curiosity. But it is also a vision of a possible future.
With generative AI largely in the hands of a few powerful companies, some national governments are attempting to create sovereign versions of the technology that they can control. This is taking various forms. Some build data centres or provide AI infrastructure to academic researchers, like the US’s National AI Research Resource or a proposed “Cern for AI” in Europe. Others offer locally tailored AI models: Saudi-backed Humain has launched a chatbot trained to function in Arabic and respect Middle Eastern cultural norms.
Apertus was built by the Swiss government and two public universities. Like Humain’s chatbot, it is tailored to local languages and cultural references; it should be able to distinguish between regional dialects of Swiss-German, for example. But unlike Humain, Apertus (“open” in Latin) is a rare example of fully fledged “public AI”: not only built and controlled by the public sector but open-source and free to use. It was trained on publicly available data, not copyrighted material. Data sources and underlying code are all public, too.
Although it is notionally limited to Swiss users, there is, at least temporarily, an international portal — the publicai.co site — that was built with support from various government and corporate donors. This also lets you try out a public AI model created by the Singaporean government. Set it to Singaporean English and ask for “the best curry noodles in the city”, and it will reply: “Wah lau eh, best curry noodles issit? Depends lah, you prefer the rich, lemak kind or the more dry, spicy version?”
Apertus is not intended to compete with ChatGPT and its ilk, says Joshua Tan, an American computer scientist who led the creation of publicai.co. It is comparatively tiny in terms of raw power: its largest model has 70bn parameters (a measure of an AI model’s complexity) versus GPT-4’s 1.8tn. And it does not yet have reasoning capabilities. But Tan hopes it will serve as a proof of concept that governments can build high-quality public AI with fairly limited resources. Ultimately, he argues, it shows that AI “can be a form of public infrastructure like highways, water, or electricity”.
This is a big claim. Public infrastructure usually means expensive investments that market forces alone would not deliver. In the case of AI, market forces might appear to be doing just fine. And it is hard to imagine governments summoning up the money and talent needed to compete with the commercial AI industry. Why not regulate it like a utility instead of trying to build alternatives?…(More)”.
Report by Darya Minovi: “The Trump administration is systematically attacking a wide range of public health, environmental, and safety rules. By law, federal agencies must notify the public about potential rule changes and give them the opportunity to comment on those changes. But in many cases, the Trump administration is evading that legal requirement.
In the administration’s first six months in office, roughly 600 final rules were issued across six key science agencies. In 182 of these rules, the administration bypassed the public notice and comment period, cutting the public out of the process of shaping rules that affect their health, their safety, and the planet. This undermines the principles of accountability and transparency that should be part of our democracy…(More)”.
Editorial by Christian Fynbo Christiansen, Persephone Doupi, Nienke Schutte, and Damir Ivanković: “The European Health Data Space (EHDS) regulation creates a health-specific ecosystem for both primary and secondary use of health data. HealthData@EU, the novel cross-border technical infrastructure for secondary use of electronic health data, will be crucial for achieving the ambitious goals of the EHDS.
In 2022, the “HealthData@EU pilot project,” co-funded under the EU4Health framework (GA nr 101079839), brought together 17 partners, including potential Health Data Access Body (HDAB) candidates, health data sharing infrastructures, and European agencies, to build and test a pilot version of the HealthData@EU infrastructure and provide recommendations for metadata standards, data quality, data security, and data transfer to support development of the EHDS cross-border infrastructure.
This editorial and the other manuscripts presented in this Special EJPH Supplement will provide readers with insights from real-life scenarios that follow the research user journey and highlight the challenges of health research, as well as the solutions the EHDS can provide…(More)”.
Paper by Barbara J Evans and Azra Bihorac: “As nations design regulatory frameworks for medical AI, research and pilot projects are urgently needed to harness AI as a tool to enhance today’s regulatory and ethical oversight processes. Under pressure to regulate AI, policy makers may think it expedient to repurpose existing regulatory institutions to tackle the novel challenges AI presents. However, the profusion of new AI applications in biomedicine — combined with the scope, scale, complexity, and pace of innovation — threatens to overwhelm human regulators, diminishing public trust and inviting backlash. This article explores the challenge of protecting privacy while ensuring access to large, inclusive data resources to fuel safe, effective, and equitable medical AI. Informed consent for data use, as conceived in the 1970s, seems dead, and it cannot ensure strong privacy protection in today’s large-scale data environments. Informed consent has an ongoing role but must evolve to nurture privacy, equity, and trust. It is crucial to develop and test alternative solutions, including those using AI itself, to help human regulators oversee safe, ethical use of biomedical AI and give people a voice in co-creating privacy standards that might make them comfortable contributing their data. Biomedical AI demands AI-powered oversight processes that let ethicists and regulators hear directly and at scale from the public they are trying to protect. Nations are not yet investing in AI tools to enhance human oversight of AI. Without such investments, there is a rush toward a future in which AI assists everyone except regulators and bioethicists, leaving them behind…(More)”.
Paper by Yaniv Benhamou & Mélanie Dulong de Rosnay: “The present contribution proposes a novel commons-based copyright licensing model that provides individuals better control over all their data (including copyrightable, personal and technical data) and that covers recent developments in AI technology. The licensing model also proposes restrictions and boundaries (e.g. to authorised users and groups) to protect the commons, allowing communities to define and maintain the political values they choose. Building on the practice of collective management of copyright, it also empowers data trusts to govern and monitor the use and re-use of the concerned data. The model is ultimately meant to address the power imbalance and information asymmetry that characterise today’s economy of data as well as the “data winter” effect that restricts the accessibility of data for public interest, while accommodating and empowering individuals and communities that may have different political values and visions…(More)”.