open data

Facilitating the secondary use of health data for public interest purposes across borders

Curated on June 11, 2025 by Stefaan Verhulst

OECD Paper: “Recent technological developments create significant opportunities to process health data in the public interest. However, the growing fragmentation of frameworks applied to data has become a structural impediment to fully leverage these opportunities. Public and private stakeholders suggest that three key areas should be analysed to support this outcome, namely: the convergence of governance frameworks applicable to health data use in the public interest across jurisdictions; the harmonisation of national procedures applicable to secondary health data use; and the public perceptions around the use of health data. This paper explores each of these three key areas and concludes with an overview of collective findings relating specifically to the convergence of legal bases for secondary data use…(More)”.

Unequal Journeys to Food Markets: Continental-Scale Evidence from Open Data in Africa

Curated on June 8, 2025 by Stefaan Verhulst

Paper by Robert Benassai-Dalmau, et al: “Food market accessibility is a critical yet underexplored dimension of food systems, particularly in low- and middle-income countries. Here, we present a continent-wide assessment of spatial food market accessibility in Africa, integrating open geospatial data from OpenStreetMap and the World Food Programme. We compare three complementary metrics: travel time to the nearest market, market availability within a 30-minute threshold, and an entropy-based measure of spatial distribution, to quantify accessibility across diverse settings. Our analysis reveals pronounced disparities: rural and economically disadvantaged populations face substantially higher travel times, limited market reach, and less spatial redundancy. These accessibility patterns align with socioeconomic stratification, as measured by the Relative Wealth Index, and moderately correlate with food insecurity levels, assessed using the Integrated Food Security Phase Classification. Overall, results suggest that access to food markets plays a relevant role in shaping food security outcomes and reflects broader geographic and economic inequalities. This framework provides a scalable, data-driven approach for identifying underserved regions and supporting equitable infrastructure planning and policy design across diverse African contexts…(More)”.
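The three accessibility metrics named above can be sketched for a single location as follows. This is a minimal illustration, not the authors' method. It assumes travel times from the location to every market have already been computed (for example by routing over an OpenStreetMap road network). The entropy definition is also an assumption, because the excerpt does not give the formula: here it is the normalized Shannon entropy of inverse-travel-time weights over markets reachable within the 30-minute threshold. A value of 0 means access hinges on one market, and values near 1 mean access is spread evenly across several comparably close markets, i.e. more spatial redundancy.

```python
import math


def accessibility_metrics(travel_minutes, threshold=30.0):
    """Return three accessibility metrics for one location.

    travel_minutes: ASSUMED precomputed travel times (minutes) from the
    location to every market in the dataset.
    """
    if not travel_minutes:
        raise ValueError("need at least one market")

    # 1. Travel time to the nearest market.
    nearest = min(travel_minutes)

    # 2. Market availability: markets reachable within the time threshold.
    reachable = [t for t in travel_minutes if t <= threshold]
    available = len(reachable)

    # 3. Entropy of the spatial distribution (ASSUMED definition):
    #    normalized Shannon entropy of inverse-travel-time weights.
    if available < 2:
        entropy = 0.0  # zero or one reachable market: no redundancy
    else:
        weights = [1.0 / max(t, 1.0) for t in reachable]  # floor avoids divide-by-zero
        total = sum(weights)
        probs = [w / total for w in weights]
        h = -sum(p * math.log(p) for p in probs)
        entropy = h / math.log(available)  # rescale to the range [0, 1]

    return {"nearest_min": nearest, "markets_within": available, "entropy": entropy}


# Hypothetical locations: one with several nearby markets, and a remote
# one whose closest market lies beyond the 30-minute threshold.
print(accessibility_metrics([12.0, 15.0, 25.0, 70.0]))
print(accessibility_metrics([55.0, 90.0, 140.0]))
```

Computing the metrics per location and then weighting by population would be one way to aggregate them to regions; that aggregation step is likewise an assumption and is not shown.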

The Global Data Barometer 2nd edition: A Shared Compass for Navigating the Data Landscape

Curated on May 29, 2025 by Stefaan Verhulst

Report by the Global Data Barometer: “Across the globe, we’re at a turning point. From artificial intelligence and digital governance to public transparency and service delivery, data is now a fundamental force shaping how our societies function and who they serve. It holds tremendous promise to drive inclusive growth, foster accountability, and support urgent action on global challenges. And yet, access to high-quality, usable data is becoming increasingly constrained.

Some, like Verhulst (2024), have begun calling this moment a “data winter,” a period marked by shrinking openness, rising inequality in access, and growing fragmentation in how data is governed and used. This trend poses a risk not just to innovation but to the democratic values that underpin trust, participation, and accountability.

In this complex landscape, evidence matters more than ever. That is why we are proud to launch the Second Edition of the Global Data Barometer (GDB), a collaborative and comparative study that tracks the state of data for the public good across 43 countries, with a focused lens on Latin America and the Caribbean (LAC) and Africa…

The Barometer tracks countries across four dimensions: governance, capabilities, and availability, while also exploring key cross-cutting areas like AI readiness, inclusion, and data use. Here are some of the key takeaways:

The Implementation Gap

Many countries have adopted laws and frameworks for data governance, but there is a stark gap between policy and practice. Without strong institutions and dedicated capacity, even well-designed frameworks fall short.

The Role of Skills and Infrastructure

Data does not flow or translate into value without people and systems in place. Across both Latin America and the Caribbean and Africa, we see underinvestment in public sector skills, training, and the infrastructure needed to manage and reuse data effectively.

AI Is Moving Faster Than Governance

AI is increasingly present in national strategies, but very few countries have clear policies to guide its ethical use. Governance frameworks rarely address issues like algorithmic bias, data quality, or the accountability of AI-driven decision-making.

Open Data Needs Reinvestment

Many countries once seen as open data champions are struggling to sustain their efforts. Legal mandates are not always matched by technical implementation or resources. As a result, open data initiatives risk losing momentum.

Transparency Tools Are Missing

Key datasets that support transparency and anti-corruption, such as lobbying registers, beneficial ownership data, and political finance records, are often missing or fragmented. This makes it hard to follow the money or hold institutions to account.

Inclusion Is Still Largely Symbolic

Despite commitments to equity, inclusive data governance remains the exception. Data is rarely published in Indigenous or widely spoken non-official languages. Accessibility for persons with disabilities is often treated as a recommendation rather than a requirement.

Interoperability Remains a Barrier

Efforts to connect datasets across government, such as on procurement, company data, or political integrity, are rare. Without common standards or identifiers, it is difficult to track influence or evaluate policy impact holistically…(More)”.
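To illustrate why common identifiers matter, the sketch below links hypothetical procurement records to hypothetical beneficial ownership declarations through a shared company identifier. The Barometer does not prescribe this code. All field names, identifiers, and amounts are invented for illustration. The last contract shows the failure mode described above: it carries only a free-text supplier name, so it cannot be reliably joined and drops out of the analysis.

```python
from collections import defaultdict

# Hypothetical records; identifiers, names and amounts are illustrative only.
contracts = [
    {"contract_id": "C-1", "supplier_id": "CO-001", "amount": 250_000},
    {"contract_id": "C-2", "supplier_id": "CO-002", "amount": 90_000},
    {"contract_id": "C-3", "supplier_id": "CO-001", "amount": 40_000},
    # Published with a free-text name only, so it cannot be linked reliably.
    {"contract_id": "C-4", "supplier_name": "Acme Ltd", "amount": 75_000},
]
beneficial_owners = {  # company identifier -> declared beneficial owners
    "CO-001": ["Person A"],
    "CO-002": ["Person B", "Person A"],
}


def contract_value_by_owner(contracts, beneficial_owners):
    """Join contracts to ownership data on a shared company identifier.

    Each owner is credited with the full value of every contract won by a
    company they own (values are not apportioned between co-owners).
    Contracts without a matching identifier are returned separately.
    """
    totals = defaultdict(int)
    unmatched = []
    for contract in contracts:
        owners = beneficial_owners.get(contract.get("supplier_id"))
        if owners is None:
            unmatched.append(contract["contract_id"])
            continue
        for owner in owners:
            totals[owner] += contract["amount"]
    return dict(totals), unmatched


totals, unmatched = contract_value_by_owner(contracts, beneficial_owners)
print(totals)     # {'Person A': 380000, 'Person B': 90000}
print(unmatched)  # ['C-4']
```

With a shared identifier the join is a simple lookup; without one, linking falls back to fuzzy name matching, which is error-prone and hard to audit.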

Data Commons: The Missing Infrastructure for Public Interest Artificial Intelligence

Curated on May 1, 2025 by Stefaan Verhulst

Article by Stefaan Verhulst, Burton Davis and Andrew Schroeder: “Artificial intelligence is celebrated as the defining technology of our time. From ChatGPT to Copilot and beyond, generative AI systems are reshaping how we work, learn, and govern. But behind the headline-grabbing breakthroughs lies a fundamental problem: The data these systems depend on to produce useful results that serve the public interest is increasingly out of reach.

Without access to diverse, high-quality datasets, AI models risk reinforcing bias, deepening inequality, and returning less accurate, more imprecise results. Yet, access to data remains fragmented, siloed, and increasingly enclosed. What was once open—government records, scientific research, public media—is now locked away by proprietary terms, outdated policies, or simple neglect. We are entering a data winter just as AI’s influence over public life is heating up.

This isn’t just a technical glitch. It’s a structural failure. What we urgently need is new infrastructure: data commons.

A data commons is a shared pool of data resources—responsibly governed, managed using participatory approaches, and made available for reuse in the public interest. Done correctly, commons can ensure that communities and other networks have a say in how their data is used, that public interest organizations can access the data they need, and that the benefits of AI can be applied to meet societal challenges.

Commons offer a practical response to the paradox of data scarcity amid abundance. By pooling datasets across organizations—governments, universities, libraries, and more—they match data supply with real-world demand, making it easier to build AI that responds to public needs.

We’re already seeing early signs of what this future might look like. Projects like Common Corpus, MLCommons, and Harvard’s Institutional Data Initiative show how diverse institutions can collaborate to make data both accessible and accountable. These initiatives emphasize open standards, participatory governance, and responsible reuse. They challenge the idea that data must be either locked up or left unprotected, offering a third way rooted in shared value and public purpose.

But the pace of progress isn’t matching the urgency of the moment. While policymakers debate AI regulation, they often ignore the infrastructure that makes public interest applications possible in the first place. Without better access to high-quality, responsibly governed data, AI for the common good will remain more aspiration than reality.

That’s why we’re launching The New Commons Challenge—a call to action for universities, libraries, civil society, and technologists to build data ecosystems that fuel public-interest AI…(More)”.

Open with care: transparency and data sharing in civically engaged research

Curated on April 21, 2025 by Stefaan Verhulst

Paper by Ankushi Mitra: “Research transparency and data access are considered increasingly important for advancing research credibility, cumulative learning, and discovery. However, debates persist about how to define and achieve these goals across diverse forms of inquiry. This article intervenes in these debates, arguing that the participants and communities with whom scholars work are active stakeholders in science, and thus have a range of rights, interests, and researcher obligations to them in the practice of transparency and openness. Drawing on civically engaged research and related approaches that advocate for subjects of inquiry to more actively shape its process and share in its benefits, I outline a broader vision of research openness not only as a matter of peer scrutiny among scholars or a top-down exercise in compliance, but rather as a space for engaging and maximizing opportunities for all stakeholders in research. Accordingly, this article provides an ethical and practical framework for broadening transparency, accessibility, and data-sharing and benefit-sharing in research. It promotes movement beyond open science to a more inclusive and socially responsive science anchored in a larger ethical commitment: that the pursuit of knowledge be accountable and its benefits made accessible to the citizens and communities who make it possible…(More)”.

Fostering Open Data

Curated on April 10, 2025 by Stefaan Verhulst

Paper by Uri Y. Hacohen: “Data is often heralded as “the world’s most valuable resource,” yet its potential to benefit society remains unrealized due to systemic barriers in both public and private sectors. While open data (defined as data that is available, accessible, and usable) holds immense promise to advance open science, innovation, economic growth, and democratic values, its utilization is hindered by legal, technical, and organizational challenges. Public sector initiatives, such as U.S. and European Union open data regulations, face uneven enforcement and regulatory complexity, disproportionately affecting under-resourced stakeholders such as researchers. In the private sector, companies prioritize commercial interests and user privacy, often obstructing data openness through restrictive policies and technological barriers. This article proposes an innovative, four-layered policy framework to overcome these obstacles and foster data openness. The framework includes (1) improving open data infrastructures, (2) ensuring legal frameworks for open data, (3) incentivizing voluntary data sharing, and (4) imposing mandatory data sharing obligations. Each policy cluster is tailored to address sector-specific challenges and balance competing values such as privacy, property, and national security. Drawing from academic research and international case studies, the framework provides actionable solutions to transition from a siloed, proprietary data ecosystem to one that maximizes societal value. This comprehensive approach aims to reimagine data governance and unlock the transformative potential of open data…(More)”.

Enabling an Open-Source AI Ecosystem as a Building Block for Public AI

Curated on April 4, 2025 by Stefaan Verhulst

Policy brief by Katarzyna Odrozek, Vidisha Mishra, Anshul Pachouri, Arnav Nigam: “…informed by insights from 30 open dataset builders convened by Mozilla and EleutherAI and a policy analysis on open-source artificial intelligence (AI) development, outlines four key areas for G7 action: expand access to open data, support sustainable governance, encourage policy alignment in open-source AI, and local capacity building and identification of use cases. These steps will enhance AI competitiveness, accountability, and innovation, positioning the G7 as a leader in Responsible AI development…(More)”.

Researching data discomfort: The case of Statistics Norway’s quest for billing data

Curated on March 30, 2025 by Stefaan Verhulst

Paper by Lisa Reutter: “National statistics offices are increasingly exploring the possibilities of utilizing new data sources to position themselves in emerging data markets. In 2022, Statistics Norway announced that the national agency will require the biggest grocers in Norway to hand over all collected billing data to produce consumer behavior statistics which had previously been produced by other sampling methods. An online article discussing this proposal sparked a surprisingly (at least to Statistics Norway) high level of interest among readers, many of whom expressed concerns about this intended change in data practice. This paper focuses on the multifaceted online discussions of the proposal, as these enable us to study citizens’ reactions and feelings towards increased data collection and emerging public-private data flows in a Nordic context. Through an explorative empirical analysis of comment sections, this paper investigates what is discussed by commenters and reflects upon why this case sparked so much interest among citizens in the first place. It therefore contributes to the growing literature of citizens’ voices in data-driven administration and to a wider discussion on how to research public feeling towards datafication. I argue that this presents an interesting case of discomfort voiced by citizens, which demonstrates the contested nature of data practices among citizens–and their ability to regard data as deeply intertwined with power and politics. This case also reminds researchers to pay attention to seemingly benign and small changes in administration beyond artificial intelligence…(More)”.

Legal frictions for data openness

Curated on March 29, 2025 by Stefaan Verhulst

Paper by Ramya Chandrasekhar: “investigates legal entanglements of re-use, when data and content from the open web is used to train foundation AI models. Based on conversations with AI researchers and practitioners, an online workshop, and legal analysis of a repository of 41 legal disputes relating to copyright and data protection, this report highlights tensions between legal imaginations of data flows and computational processes involved in training foundation models.

To realise the promise of the open web as open for all, this report argues that efforts oriented solely towards techno-legal openness of training datasets are not enough. Techno-legal openness of datasets facilitates easy re-use of data. But certain well-resourced actors like Big Tech are able to take advantage of data flows on the open web to train proprietary foundation models, while giving little to no value back to either the maintenance of shared informational resources or communities of commoners. At the same time, open licenses no longer accommodate changing community preferences for sharing and re-use of data and content.

In addition to techno-legal openness of training datasets, there is a need for certain limits on the extractive power of well-resourced actors like Big Tech, combined with increased recognition of community data sovereignty. Alternative licensing frameworks, such as the Nwulite Obodo License, Kaitiakitanga Licenses, the Montreal License, the OpenRAIL Licenses, the Open Data Commons License, and the AI2Impact Licenses hold valuable insights in this regard. While these licensing frameworks impose more obligations on re-users and necessitate more collective thinking on interoperability, they are nonetheless necessary for the creation of healthy digital and data commons, to realise the original promise of the open web as open for all…(More)”.

What is a fair exchange for access to public data?

Curated on March 18, 2025 by Stefaan Verhulst

Blog and policy brief by Jeni Tennison: “The most obvious approach to get companies to share value back to the public sector in return for access to data is to charge them. However, there are a number of challenges with a “pay to access” approach: it’s hard to set the right price; it creates access barriers, particularly for cash-poor start-ups; and it creates a public perception that the government is willing to sell their data, and might be tempted to loosen privacy-protecting governance controls in exchange for cash.

Are there other options? The policy brief explores a range of other approaches and assesses these against five goals that a value-sharing framework should ideally meet, to:

Encourage use of public data, including by being easy for organisations to understand and administer.
Provide a return on investment for the public sector, offsetting at least some of the costs of supporting the National Data Library (NDL) infrastructure and minimising administrative costs.
Promote equitable innovation and economic growth in the UK, which might mean particularly encouraging smaller, home-grown businesses.
Create social value, particularly towards this Government’s other missions, such as achieving Net Zero or unlocking opportunity for all.
Build public trust by being easily explainable, avoiding misaligned incentives that encourage the breaking of governance guardrails, and feeling like a fair exchange.

In brief, alternatives to a pay-to-access model that still provide direct financial returns include:

Discounts: the public sector could secure discounts on products and services created using public data. However, this could be difficult to administer and enforce.
Royalties: taking a percentage of charges for products and services created using public data might be similarly hard to administer and enforce, but applies to more companies.
Equity: taking equity in startups can provide long-term returns and align with public investment goals.
Levies: targeted taxes on businesses that use public data can provide predictable revenue and encourage data use.
General taxation: general taxation can fund data infrastructure, but it may lack the targeted approach and public visibility of other methods.

It’s also useful to consider non-financial conditions that could be put on organisations accessing public data…(More)”.
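The brief weighs these mechanisms qualitatively. The sketch below only illustrates the arithmetic behind one challenge it names for pay-to-access, namely that a flat charge weighs far more heavily on cash-poor start-ups than on large firms, and how royalties and levies scale with a firm's size instead. Every rate, firm, and figure is hypothetical. The levy base (total turnover) is an assumption, because the excerpt does not say what a levy would be charged on. Administration and enforcement costs, which the brief flags for discounts and royalties, are not modelled.

```python
from dataclasses import dataclass

# All rates and figures are hypothetical, chosen only to illustrate the arithmetic.
ACCESS_FEE = 20_000.0  # flat annual pay-to-access charge
ROYALTY_RATE = 0.02    # share of charges for products built on public data
LEVY_RATE = 0.01       # targeted levy, ASSUMED to be charged on total turnover


@dataclass
class Firm:
    name: str
    data_product_revenue: float  # annual charges for products built on public data
    turnover: float              # total annual turnover


def public_return(firm: Firm) -> dict[str, float]:
    """Annual payment to the public sector under each mechanism."""
    return {
        "pay_to_access": ACCESS_FEE,
        "royalty": ROYALTY_RATE * firm.data_product_revenue,
        "levy": LEVY_RATE * firm.turnover,
    }


def burden(firm: Firm) -> dict[str, float]:
    """Each payment expressed as a share of the firm's turnover."""
    return {k: v / firm.turnover for k, v in public_return(firm).items()}


startup = Firm("start-up", data_product_revenue=100_000, turnover=150_000)
incumbent = Firm("incumbent", data_product_revenue=10_000_000, turnover=50_000_000)

for firm in (startup, incumbent):
    print(firm.name, {k: f"{v:.2%}" for k, v in burden(firm).items()})
```

Under these made-up numbers the flat fee costs the start-up roughly 13% of its turnover but the incumbent well under 0.1%, while the royalty and levy charge both firms in proportion to their activity. The trade-off the brief highlights is that proportional schemes are fairer in this sense but harder to administer.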
