
Stefaan Verhulst

Report by The Data Tank and Impact Licensing Initiative: “…report develops an overview of mechanisms and components of business and data governance models for health data collaboratives. In this report, we use the terms ‘data collaborative’ or ‘ecosystem for data reuse’ interchangeably to refer to collaborations between different stakeholders across multiple sectors to exchange data in a way that overcomes silos to create public value (Susha et al., 2017).

This report synthesises a rapid literature review of academic, policy, and industry documents, including case studies, to examine governance and business models for health data reuse. We examine the different dimensions identified in the literature as involved in sustaining an ecosystem for data reuse. The report sets these out in the current regulatory context. It also considers the role that mechanisms such as a social licence and impact licensing play in the sustainable governance of the different business models, as essential complements to the regulatory context. It analyses case studies that can be mapped onto these models and offers pathways for a process to decide on a business model…(More)”.

Governance and business models for sustainable health data collaboratives

Article by Rob Goodman and Jimmy Soni: “Just what is information? For such an intuitive idea, its precise nature proved remarkably hard to pin down. For centuries, it seemed to hover somewhere in a half-world between the visible and the unseen, the physical and the evanescent, the enduring medium and its fleeting message. It haunted the ancients as much as it did Claude Shannon and his Bell Labs colleagues in New York and New Jersey, who were trying to engirdle the world with wires and telecoms cables in the mid-20th century.

Shannon – mathematician, American, jazz fanatic, juggling enthusiast – is the founder of information theory, and the architect of our digital world. It was Shannon’s paper ‘A Mathematical Theory of Communication’ (1948) that introduced the bit, an objective measure of how much information a message contains…

Shannon’s ‘mathematical theory’ sets out two big ideas. The first is that information is probabilistic. We should begin by grasping that information is a measure of the uncertainty we overcome, Shannon said – which we might also call surprise. What determines this uncertainty is not just the size of the symbol vocabulary, as Nyquist and Hartley thought. It’s also about the odds that any given symbol will be chosen. Take the example of a coin-toss, the simplest thing Shannon could come up with as a ‘source’ of information. A fair coin carries two choices with equal odds; we could say that such a coin, or any ‘device with two stable positions’, stores one binary digit of information. Or, using an abbreviation suggested by one of Shannon’s co-workers, we could say that it stores one bit.

But the crucial step came next. Shannon pointed out that most of our messages are not like fair coins. They are like weighted coins. A biased coin carries less than one bit of information, because the result of any flip is less surprising. Shannon illustrated the point with this graph. You see that the amount of information conveyed by our coin flip (on the y-axis) reaches its apex when the odds are 50-50, represented as 0.5 on the x-axis; but as the outcome grows more predictable in either direction depending on the size of the bias, the information carried by the coin steadily declines.

[Graph: entropy (H) in bits versus probability (p) for a binary source. The curve peaks at p = 0.5 and falls to zero at p = 0 and p = 1.]
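To make the curve concrete, here is a minimal Python sketch (an illustration, not code from the article) of the binary entropy function H(p) = −p·log₂(p) − (1−p)·log₂(1−p) that the graph plots:

```python
import math

def binary_entropy(p: float) -> float:
    """Bits of information conveyed, on average, by a coin with heads-probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no surprise, hence no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A fair coin stores one full bit; the more biased the coin, the less it tells us.
for p in (0.5, 0.7, 0.9, 0.99):
    print(f"p = {p:.2f} -> {binary_entropy(p):.3f} bits per flip")
```

A fair coin (p = 0.5) yields exactly 1 bit per flip, while a coin weighted 99–1 yields only about 0.08 bits.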

The messages humans send are more like weighted coins than unweighted coins, because the symbols we use aren’t chosen at random, but depend in probabilistic ways on what preceded them. In images that resemble something other than TV static, dark pixels are more likely to appear next to dark pixels, and light next to light. In written messages that are something other than random strings of text, each letter has a kind of ‘pull’ on the letters that follow it…(More)”

The bit bomb

Paper by Scott E Page: “Recent breakthroughs in AI combined with steady advances in information technology change the physics of organizational and institutional design. Discussions and deliberations can now include people in disparate locations speaking simultaneously. This opens up new possible designs for interactions that will enhance our ability to produce collective intelligence.

How do we make collective intelligence happen? Who should be in the room, and how do we design and structure their interactions? Design-minded social scientists approach these questions by embedding them within a variety of formal frameworks. Economists design markets and matching processes to produce efficient, fair allocations with aligned incentives. Political scientists create voting rules to select winners with broad support. Organizational scientists design communication and authority structures capable of generating innovative solutions and thoughtful strategic decisions.

These design-minded social scientists operate within sets of constraints. Some are cognitive. People can only store, attend to, and process so much information. Some are physical. Rooms can only be so large. Some are temporal. Everyone must be available Tuesday at 4pm. These constraints limit possible designs, and that reduces the efficiency, fairness, representativeness, and innovativeness of the outcomes that we might achieve.

Those constraints have now changed. Recent breakthroughs in Artificial Intelligence (AI) combined with steady advancements in information technologies have altered the physics of organizational and institutional design (Farrell et al., 2025). In doing so, they have expanded the set of possible designs. The logic is straightforward: Removing constraints allows us to achieve more…(More)”.

Everyone, everywhere, all at once: LLMs and the new physics of collective intelligence

Paper by Tara Cookson and Ruth Carlitz: “In 2013, the United Nations called for a “Data Revolution” to advance sustainable development. “Data for Good” initiatives that have followed bring together development and humanitarian actors with technology companies. Few studies have examined the composition of Data for Good partnerships or assessed the uptake and use of the data they generate. We help fill this gap with a case study of Meta’s (then Facebook) Survey on Gender Equality at Home, which reached over half a million Facebook users in more than 200 countries. The survey was developed in partnership with international development and humanitarian organizations. Our study is uniquely informed by our involvement in this partnership: we contributed subject matter expertise to the development of the survey and advised on dissemination strategies for the resulting data, which we also analyzed in our own academic work. We complement this autoethnographic perspective with insights from scholars of partnerships for development, and a practitioner framework to understand the factors connecting data to action. We find that including multiple partners can widen the scope of a project such that it gains breadth but loses depth. In addition, while it is (somewhat) possible to quantify the impact of a Data for Good partnership in terms of data use, “goodness” can also be assessed in terms of the process of producing data. Specifically, collaborations between organizations with different interests and resources may be of significant social value, particularly when they learn from one another—even if such goodness is harder to quantify…(More)”.

Gender data for good? Partnerships between tech companies and humanitarian and development organizations

Press Release: “Today, the Department of Commerce announced that it will begin posting real gross domestic product (GDP) data on the blockchain, starting with the July 2025 data…This is the first time a federal agency has published economic statistical data like this on the blockchain, and the latest way the Department is utilizing innovative technology to protect federal data and promote public use.

The Department published an official hash of its quarterly GDP data release for 2025—and, in some cases, the topline GDP number—to the following nine blockchains: Bitcoin, Ethereum, Solana, TRON, Stellar, Avalanche, Arbitrum One, Polygon PoS, and Optimism.  The data was also further disseminated through coordination with the oracles, Pyth and Chainlink. The exchanges, Coinbase, Gemini, and Kraken, helped facilitate the Department’s publishing.  The Department will continue to innovate and broaden the scope of publishing future datasets like GDP to include the use of other blockchains, oracles, and exchanges.
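As a hedged illustration of how such published hashes could be used (the press release excerpt does not specify the hash algorithm, file format, or file names involved), anyone holding a copy of the release could recompute its digest and compare it with the value recorded on-chain:

```python
import hashlib

# Hypothetical file name for a local copy of the GDP release; the actual
# artifact and hash algorithm used by the Department are not given in the excerpt.
with open("gdp_2025_q2_release.csv", "rb") as f:
    local_digest = hashlib.sha256(f.read()).hexdigest()

print(local_digest)  # compare against the hash published to the blockchains above
```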

Through this landmark effort, the Department hopes to demonstrate the wide utility of blockchain technology. It also aims to demonstrate a proof of concept for all of government, and to build on the Trump Administration’s historic efforts to make the United States of America the blockchain capital of the world…(More)”.

Department of Commerce Posts 2nd Quarter Gross Domestic Product to the Blockchain

Paper by Chijioke I Okorie and Melissa Omino: “This article examines the relationship between Standard Public Open Licences (SPOLs) and inequity in the artificial intelligence (AI) innovation ecosystem, focusing on how these licences affect access to and use of African datasets. While SPOLs are widely promoted as tools for democratising data access, they often apply uniform conditions to all users, disregarding disparities in infrastructure, capacity and socioeconomic context. As a result, SPOLs may unintentionally reinforce exclusion and enable extractive data practices that disadvantage communities contributing valuable datasets that they have preserved and curated through historically challenging conditions. The study employs a desktop literature review of primary and secondary sources, complemented by analysis of specific case studies from the Masakhane Research Collective in Natural Language Processing and qualitative vignettes based on real-world experiences to identify inherent and systemic limitations of current SPOLs. The research shows how existing SPOLs, particularly those founded on copyright law, fail to accommodate the positionality of African and similarly situated users in the global data economy. In response, the article introduces the Nwulite Obodo Open Data Licence (NOODL Licence), a novel, tiered SPOL designed to foster equitable openness. NOODL differentiates conditions of use based on users’ geography and development context, incorporating benefit-sharing obligations and context-sensitive terms. It maintains the simplicity and legal clarity of existing SPOLs while addressing their inequities. By critically analysing the overlooked relationship between SPOLs and inequity, this article contributes a practical, context-aware licensing alternative that centres communities. While grounded in the African experience, the NOODL framework offers a replicable model for promoting fairness and inclusivity in global data governance and AI innovation…(More)”.

Addressing Inequitable Openness in Licences for Sharing African Data and Datasets Through the Nwulite Obodo Open Data Licence

Paper by Iryna Susha et al: “To address complex societal challenges, governments increasingly need to make evidence-based decisions and require the best available data as input. As much of the relevant data is now in the hands of the private sector, governments increasingly resort to purchasing data from private sources. There is, however, scant empirical evidence and a lack of understanding of how governments go about data purchasing. Therefore, we develop a new conceptual-analytical framework to analyze three models of data purchasing by governments: purchasing raw or aggregated data, data analyses, and data-based services. Next, based on Dutch data purchases, we explore the utility of our framework and create an evidence base detailing what data, data analyses, and data-based services Dutch governments purchase from whom, how, and for what purposes in the context of societal challenges. Our results map buyers and sellers of data in the Dutch context, as well as the types of data sold and the policy domains in which they are sold. We expose a serious lack of transparency in government reporting on data purchasing. We further discuss our results in view of possible archetypes of data purchases and what purchasing strategy implications they have. Lastly, we propose several recommendations to practitioners and a research agenda for academics…(More)”.

Data for Sale: Uncovering public procurement of private sector data in The Netherlands

Paper by Caterina Santoro et al: “Open data fall short of their goal to empower all social groups equally. Although the literature examines this issue through the concept of inclusion, substantial gaps remain in defining and understanding the implications of open data for equity in public administration, with research on this topic scattered across disciplines. This fragmentation hinders the possibility of evaluating public policies. To address this gap, we ask: What is the state of the art (naming) on equity in relation to open data, particularly regarding the causes and effects of inequities (blaming) and the strategies to address them (claiming)? Our interdisciplinary review of 69 studies finds that open data serve as a valuable tool for detecting inequities. However, they also raise concerns related to data justice, as inequities in open data arise from epistemic injustice, commodification, capability gaps, financial constraints, and governance structures reinforcing power asymmetries. To address these issues, we suggest balancing data pluralism with standardization and shifting research data practices toward reflexivity. Other strategies focus on governance and encompass stewardship and the adoption of collective benefit models. Our findings provide researchers and public officials with a lens to critically understand open data as new technologies emerge and build upon them…(More)”.

Open Data and Equity: Naming the Issues, Blaming the Causes, and Claiming Solutions. An Exploratory and Interdisciplinary Systematic Literature Review

Interview by Emily Laber-Warren: “Police rely on tips from ordinary people — witnesses, victims and whistleblowers — to investigate 95 percent of crimes. Sometimes, the decision to speak up is easily made, but in other cases, people elect to stay silent, leaving countless infractions unpunished. About half of violent crimes go unreported, according to estimates by the US Department of Justice.

And yet at certain historical moments, such as in the United States in the early 1950s, when fear of communism led to many false reports against individuals working in entertainment and public service, societies can become places where people readily denounce one another — often falsely, or for petty reasons.

Tattling, whistleblowing, snitching, call it what you will: Patrick Bergemann has spent the past 15 years studying the many ways that people tell on one another, examining everything from Afghan villagers’ reports of illegal Taliban activity to informers’ charges of treason in 17th century Russia. In a recent article in the Annual Review of Sociology, he explores the social pressures that influence people’s decisions to expose, or conceal, wrongdoing. The choice to report reflects not just the infraction but a person’s loyalties and whether they expect to receive rewards or retaliation from authorities and peers, says Bergemann, a sociologist at the Paul Merage School of Business at the University of California, Irvine, and author of Judge Thy Neighbor: Denunciations in the Spanish Inquisition, Romanov Russia and Nazi Germany.

Bergemann talked with Knowable Magazine about why and when people report crimes and bad behavior, and how, for repressive governments, encouraging people to rat on neighbors and coworkers can be a potent form of social control…(More)”.

See something, say something? The science of speaking out

Paper by Sai Sanjna Chintakunta, Nathalia Nascimento, and Everton Guimaraes: “In recent years, Large Language Models (LLMs) have emerged as transformative tools across numerous domains, impacting how professionals approach complex analytical tasks. This systematic mapping study comprehensively examines the application of LLMs throughout the Data Science lifecycle. By analyzing relevant papers from Scopus and IEEE databases, we identify and categorize the types of LLMs being applied, the specific stages and tasks of the data science process they address, and the methodological approaches used for their evaluation. Our analysis includes a detailed examination of evaluation metrics employed across studies and systematically documents both positive contributions and limitations of LLMs when applied to data science workflows. This mapping provides researchers and practitioners with a structured understanding of the current landscape, highlighting trends, gaps, and opportunities for future research in this rapidly evolving intersection of LLMs and data science…(More)”.

Large Language Models in the Data Science Lifecycle: A Systematic Mapping Study
