data collaboratives

Commission facilitates data access for researchers under the Digital Services Act

Curated on July 2, 2025 (updated July 3, 2025) by Stefaan Verhulst

Press Release: “On 2 July 2025, the Commission published a delegated act outlining rules granting access to data for qualified researchers under the Digital Services Act (DSA). This delegated act enables access to the internal data of very large online platforms (VLOPs) and very large online search engines (VLOSEs) to research systemic risks and mitigation measures in the European Union.

The delegated act on data access clarifies the procedures for VLOPs and VLOSEs to share data with vetted researchers, including data formats and requirements for data documentation. Moreover, the delegated act sets out which information Digital Services Coordinators (DSCs), VLOPs and VLOSEs must make public to facilitate vetted researchers’ applications to access relevant datasets.

With the adoption of the delegated act, the Commission will launch the DSA data access portal where researchers interested in accessing data under the new mechanism can find information and exchange with VLOPs, VLOSEs and DSCs on their data access applications.

Before accessing internal data, researchers must be vetted by a DSC.

For this vetting process, researchers must submit a data access application demonstrating their affiliation to a research organisation, their independence from commercial interests, and their ability to manage the requested data in line with security, confidentiality and privacy rules. In addition, researchers need to disclose the funding of the research project for which the data is requested and commit to publishing the results of their research. Only data that is necessary to perform research on systemic risks in the EU can be requested.

To complement the rules in the delegated act, on 27 June 2025 the Board of Digital Services endorsed a proposal for further cooperation among DSCs in the vetting process of researchers…(More)”.

New data tools enhance the development effectiveness of tourism investment

Curated on May 12, 2025 (updated May 15, 2025) by Stefaan Verhulst

Article by Louise Twining-Ward, Alex Pio and Alba Suris Coll-Vinent: “The tourism sector is a major driver of economic growth and inclusive job creation. Tourism generates a high number of jobs, especially for women (UN Tourism). In 2024, tourism was responsible for one in ten jobs worldwide, delivering 337.7 million total jobs, and accounted for 10.5 percent of global GDP. For many developing countries, it is a primary generator of foreign exchange.

The growth of this vital sector depends heavily on public investment in infrastructure and services. But rapid change, driven by uncertain geopolitics, climate shocks, and shifting consumer behavior, can make it hard to know how best to spend scarce resources. Traditional data sources cannot keep up, leaving policymakers without the timely insights needed to manage mounting complexity. Only a few developing countries collect and maintain tourism satellite accounts (TSAs), which help capture tourism’s contribution to their economies. Even in these countries, data on tourist arrivals and spending, gathered through immigration records and visitor surveys, are often processed with a lag. There is an urgent need for more accessible, more granular, and more timely data tools.

Emerging Data Tools

For this reason, the World Bank partnered with Visa to access anonymized and aggregated credit card spend data in the Caribbean and attempt to fill data gaps. This and other emerging tools for policymaking—such as satellite and geospatial mapping, analysis of online reviews, artificial intelligence, and advanced analytics—now allow tourism destinations to take a closer look at local demand patterns, gauge visitor satisfaction in near-real time, and measure progress on everything from carbon footprints to women’s employment in tourism…(More)”.

The European Data Cooperative (EDC)

Curated on May 12, 2025 (updated May 15, 2025) by Stefaan Verhulst

Invest Europe: “The European Data Cooperative (EDC) is a joint initiative developed by Invest Europe and its national association partners to collect Europe-wide industry data on activity (fundraising, investments, & divestments), economic impact (Employment, Turnover, EBITDA, & CAPEX) and ESG.

The EDC platform is jointly owned and operated by the private equity and venture capital associations of Europe. It serves as a single data entry point for their members and other contributors across the continent. The EDC brings together:

4,000 firms
10,900 funds
86,700 portfolio companies
330,900 transactions

Using one platform with a standardised methodology allows us to have consistent, robust pan-European statistics that are comparable across the region…(More)”.

Balancing Data Sharing and Privacy to Enhance Integrity and Trust in Government Programs

Curated on May 6, 2025 by Stefaan Verhulst

Paper by National Academy of Public Administration: “Improper payments and fraud cost the federal government hundreds of billions of dollars each year, wasting taxpayer money and eroding public trust. At the same time, agencies are increasingly expected to do more with less. Finding better ways to share data, without compromising privacy, is critical for ensuring program integrity in a resource-constrained environment.

Key Takeaways

Data sharing strengthens program integrity and fraud prevention. Agencies and oversight bodies such as the Government Accountability Office (GAO) and Offices of Inspector General (OIGs) have uncovered large-scale fraud by using shared data.
Opportunities exist to streamline and expedite the compliance processes required by privacy laws and reduce systemic barriers to sharing data across federal agencies.
Targeted reforms can address these barriers while protecting privacy:
1. The Office of Management and Budget (OMB) could issue guidance to authorize fraud prevention as a routine use in System of Records Notices.
2. Congress could enact special authorities or exemptions for data sharing that supports program integrity and fraud prevention.
3. A centralized data platform could help to drive cultural change and support secure, responsible data sharing…(More)”.

Data Commons: The Missing Infrastructure for Public Interest Artificial Intelligence

Curated on May 1, 2025 by Stefaan Verhulst

Article by Stefaan Verhulst, Burton Davis and Andrew Schroeder: “Artificial intelligence is celebrated as the defining technology of our time. From ChatGPT to Copilot and beyond, generative AI systems are reshaping how we work, learn, and govern. But behind the headline-grabbing breakthroughs lies a fundamental problem: The data these systems depend on to produce useful results that serve the public interest is increasingly out of reach.

Without access to diverse, high-quality datasets, AI models risk reinforcing bias, deepening inequality, and returning less accurate results. Yet access to data remains fragmented, siloed, and increasingly enclosed. What was once open, such as government records, scientific research, and public media, is now locked away by proprietary terms, outdated policies, or simple neglect. We are entering a data winter just as AI’s influence over public life is heating up.

This isn’t just a technical glitch. It’s a structural failure. What we urgently need is new infrastructure: data commons.

A data commons is a shared pool of data resources—responsibly governed, managed using participatory approaches, and made available for reuse in the public interest. Done correctly, commons can ensure that communities and other networks have a say in how their data is used, that public interest organizations can access the data they need, and that the benefits of AI can be applied to meet societal challenges.

Commons offer a practical response to the paradox of data scarcity amid abundance. By pooling datasets across organizations—governments, universities, libraries, and more—they match data supply with real-world demand, making it easier to build AI that responds to public needs.

We’re already seeing early signs of what this future might look like. Projects like Common Corpus, MLCommons, and Harvard’s Institutional Data Initiative show how diverse institutions can collaborate to make data both accessible and accountable. These initiatives emphasize open standards, participatory governance, and responsible reuse. They challenge the idea that data must be either locked up or left unprotected, offering a third way rooted in shared value and public purpose.

But the pace of progress isn’t matching the urgency of the moment. While policymakers debate AI regulation, they often ignore the infrastructure that makes public interest applications possible in the first place. Without better access to high-quality, responsibly governed data, AI for the common good will remain more aspiration than reality.

That’s why we’re launching The New Commons Challenge—a call to action for universities, libraries, civil society, and technologists to build data ecosystems that fuel public-interest AI…(More)”.

Real-time prices, real results: comparing crowdsourcing, AI, and traditional data collection

Curated on May 1, 2025 by Stefaan Verhulst

Article by Julius Adewopo, Bo Andree, Zacharey Carmichael, Steve Penson, Kamwoo Lee: “Timely, high-quality food price data is essential for shock-responsive decision-making. However, in many low- and middle-income countries, such data is often delayed, limited in geographic coverage, or unavailable due to operational constraints. Traditional price monitoring, which relies on structured surveys conducted by trained enumerators, is often constrained by cost, frequency, and reach.

To help overcome these limitations, the World Bank launched the Real-Time Prices (RTP) data platform. This effort provides monthly price data using a machine learning framework. The models combine survey results with predictions derived from observations in nearby markets and related commodities. This approach helps fill gaps in local price data across a basket of goods, enabling real-time monitoring of inflation dynamics even when survey data is incomplete or irregular.
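The RTP models are described above only at a high level. As a rough illustration of the underlying idea, using observations from nearby markets to fill a gap in one market's series, the sketch below carries a market's last observed price forward by the median month-over-month change seen in neighboring markets for the same commodity. This is a deliberately simple heuristic, not the World Bank's machine learning framework; the function name `impute_price`, its parameters, and the sample prices are hypothetical.

```python
from statistics import median


def impute_price(last_price, neighbor_prev, neighbor_curr):
    """Estimate a missing current-month price for one market and commodity.

    Hypothetical helper for illustration only; not the RTP model.

    last_price: this market's own price in the previous month.
    neighbor_prev / neighbor_curr: prices for the same commodity in nearby
    markets that were surveyed in both months (paired by position).
    """
    if not neighbor_prev or len(neighbor_prev) != len(neighbor_curr):
        raise ValueError("need paired, non-empty neighbor observations")
    # Month-over-month growth ratios observed in neighboring markets
    ratios = [c / p for p, c in zip(neighbor_prev, neighbor_curr) if p > 0]
    if not ratios:
        raise ValueError("no usable neighbor ratios")
    # Apply the median neighbor change to this market's last known price
    return last_price * median(ratios)


# Made-up example: three neighboring markets rose by 10%, 5%, and 15%
estimate = impute_price(100.0, [50.0, 80.0, 40.0], [55.0, 84.0, 46.0])
print(round(estimate, 2))
```

Using the median rather than the mean keeps one neighbor with an unusual price jump from dominating the estimate; a fuller model would also draw on related commodities and survey history, as the text describes.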

In parallel, new approaches—such as citizen-submitted (crowdsourced) data—are being explored to complement conventional data collection methods. These crowdsourced data were recently published in a Nature Scientific Data paper. While the adoption of these innovations is accelerating, maintaining trust requires rigorous validation.

A newly published study in PLOS compares the two emerging methods with the traditional, enumerator-led gold standard, providing new evidence that both crowdsourced and AI-imputed prices can serve as credible, timely alternatives to traditional ground-truth data collection—especially in contexts where conventional methods face limitations…(More)”.
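The study's exact evaluation metrics are not given here. One common way to compare candidate price series against an enumerator-collected reference is the mean absolute percentage error (MAPE), sketched below under that assumption. The price series are made-up numbers for illustration, not results from the study.

```python
def mape(reference, estimates):
    """Mean absolute percentage error (in %) of estimates vs. reference prices."""
    if not reference or len(reference) != len(estimates):
        raise ValueError("series must be non-empty and of equal length")
    if any(r <= 0 for r in reference):
        raise ValueError("reference prices must be positive")
    # Relative error of each estimate against the reference observation
    errors = [abs(e - r) / r for r, e in zip(reference, estimates)]
    return 100 * sum(errors) / len(errors)


# Illustrative, made-up prices for one commodity in three markets
enumerator = [100.0, 200.0, 50.0]   # traditional survey ("gold standard")
crowdsourced = [110.0, 190.0, 50.0]
ai_imputed = [95.0, 210.0, 55.0]

for name, series in [("crowdsourced", crowdsourced), ("AI-imputed", ai_imputed)]:
    print(f"{name}: MAPE = {mape(enumerator, series):.1f}%")
```

Lower values mean closer agreement with the enumerator data; in practice such a comparison would be run per commodity and market over many months.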

Data Cooperatives: Democratic Models for Ethical Data Stewardship

Curated on April 16, 2025 by Stefaan Verhulst

Paper by Francisco Mendonca, Giovanna DiMarzo, and Nabil Abdennadher: “Data cooperatives offer a new model for fair data governance, enabling individuals to collectively control, manage, and benefit from their information while adhering to cooperative principles such as democratic member control, economic participation, and community concern. This paper reviews data cooperatives, distinguishing them from models like data trusts, data commons, and data unions, and defines them based on member ownership, democratic governance, and data sovereignty. It explores applications in sectors like healthcare, agriculture, and construction. Despite their potential, data cooperatives face challenges in coordination, scalability, and member engagement, requiring innovative governance strategies, robust technical systems, and mechanisms to align member interests with cooperative goals. The paper concludes by advocating for data cooperatives as a sustainable, democratic, and ethical model for the future data economy…(More)”.

The Future of Health Is Preventive — If We Get Data Governance Right

Curated on April 10, 2025 by Stefaan Verhulst

Article by Stefaan Verhulst: “After a long gestation period of three years, the European Health Data Space (EHDS) is now coming into effect across the European Union, potentially ushering in a new era of health data access, interoperability, and innovation. As this ambitious initiative enters the implementation phase, it brings with it the opportunity to fundamentally reshape how health systems across Europe operate. More generally, the EHDS contains important lessons (and some cautions) for the rest of the world, suggesting how a fragmented, reactive model of healthcare may transition to one that is more integrated, proactive, and prevention-oriented.

For too long, health systems–in the EU and around the world–have been built around treating diseases rather than preventing them. Now, we have an opportunity to change that paradigm. Data, and especially the advent of AI, give us the tools to predict and intervene before illness takes hold. Data offers the potential for a system that prioritizes prevention–one where individuals receive personalized guidance to stay healthy, policymakers access real-time evidence to address risks before they escalate, and epidemics are predicted weeks in advance, enabling proactive, rapid, and highly effective responses.

But to make AI-powered preventive health care a reality, and to make the EHDS a success, we need a new data governance approach, one that would include two key components:

The ability to reuse data collected for other purposes (e.g., mobility, retail sales, workplace trends) to improve health outcomes.
The ability to integrate different data sources, not only clinical records and electronic health records (EHRs) but also environmental, social, and economic data, to build a complete picture of health risks.

In what follows, we outline some critical aspects of this new governance framework, including responsible data access and reuse (so-called secondary use), moving beyond traditional consent models to a social license for reuse, data stewardship, and the need to prioritize high-impact applications. We conclude with some specific recommendations for the EHDS, built from the preceding general discussion about the role of AI and data in preventive health…(More)”.

Unlocking Public Value with Non-Traditional Data: Recent Use Cases and Emerging Trends

Curated on April 10, 2025 by Stefaan Verhulst

Article by Adam Zable and Stefaan Verhulst: “Non-Traditional Data (NTD)—digitally captured, mediated, or observed data such as mobile phone records, online transactions, or satellite imagery—is reshaping how we identify, understand, and respond to public interest challenges. As part of the Third Wave of Open Data, these often privately held datasets are being responsibly re-used through new governance models and cross-sector collaboration to generate public value at scale.

In our previous post, we shared emerging case studies across health, urban planning, the environment, and more. Several months later, the momentum has not only continued but diversified. New projects reaffirm NTD’s potential—especially when linked with traditional data, embedded in interdisciplinary research, and deployed in ways that are privacy-aware and impact-focused.

This update profiles recent initiatives that push the boundaries of what NTD can do. Together, they highlight the evolving domains where this type of data is helping to surface hidden inequities, improve decision-making, and build more responsive systems:

Financial Inclusion
Public Health and Well-Being
Socioeconomic Analysis
Transportation and Urban Mobility
Data Systems and Governance
Economic and Labor Dynamics
Digital Behavior and Communication…(More)”.

Exploring Human Mobility in Urban Nightlife: Insights from Foursquare Data

Curated on April 7, 2025 (updated April 10, 2025) by Stefaan Verhulst

Article by Ehsan Dorostkar: “In today’s digital age, social media platforms like Foursquare provide a wealth of data that can reveal fascinating insights into human behavior, especially in urban environments. Our recent study, published in Cities, delves into how virtual mobility on Foursquare translates into actual human mobility in Tehran’s nightlife scenes. By analyzing user-generated data, we uncovered patterns that can help urban planners create more vibrant and functional nightlife spaces…

Our study aimed to answer two key questions:

How does virtual mobility on Foursquare influence real-world human mobility in urban nightlife?
What spatial patterns emerge from these movements, and how can they inform urban planning?

To explore these questions, we focused on two bustling nightlife spots in Tehran—Region 1 (Darband Square) and Region 6 (Valiasr crossroads)—where Foursquare data indicated high user activity.

Methodology

We combined data from two sources:

Foursquare API: To track user check-ins and identify popular nightlife venues.
Tehran Municipality API: To contextualize the data within the city’s urban framework.

Using triangulation and interpolation techniques, we mapped the “human mobility triangles” in these areas, calculating the density and spread of user activity…(More)”.
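The article does not spell out its interpolation step. Inverse distance weighting (IDW) is one standard way to turn point counts, such as check-ins at individual venues, into a continuous activity surface, so the sketch below assumes it. The venue coordinates and check-in counts are hypothetical, coordinates are assumed to be planar (for example, projected meters), and the study's triangulation of “human mobility triangles” is not reproduced here.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate of a value at (x, y).

    points: list of (px, py, value) tuples, e.g. hypothetical venue
    locations in planar coordinates paired with their check-in counts.
    power: how quickly a sample's influence decays with distance.
    """
    if not points:
        raise ValueError("need at least one sample point")
    num = 0.0
    den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0:
            return float(value)  # query sits exactly on a sample point
        w = 1.0 / d ** power  # closer venues get larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical venues: (x, y, check-ins)
venues = [(0.0, 0.0, 10), (2.0, 0.0, 30), (1.0, 2.0, 20)]

# Estimate activity at a few grid points between the venues
for gx in (0.5, 1.0, 1.5):
    print(gx, round(idw(venues, gx, 0.5), 2))
```

A higher `power` makes the surface more local, concentrating estimated activity around the busiest venues; a lower one smooths it across the area.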