Researchers’ access to information from regulated online services


Report by Ofcom (UK): “…We outline three potential policy options and models for facilitating greater researcher access, which include:

  1. Clarify existing legal rules: Relevant authorities could provide additional guidance on what is already legally permitted for researcher access on important issues, such as data donations and research-related scraping.
  2. Create new duties, enforced by a backstop regulator: Services could be required to put in place systems and processes to operationalise data access. This could include new duties on regulated services to create standard procedures for researcher accreditation. Services would be responsible for providing researchers with data directly or providing the interface through which they can access it and offering appeal and redress mechanisms. A backstop regulator could enforce these duties – either an existing or new body. 
  3. Enable and manage access via independent intermediary: New legal powers could be granted to a trusted third party which would facilitate and manage researchers’ access to data. This intermediary – which could again be an existing or new body – would accredit researchers and provide secure access.

Our report describes three types of intermediary that could be considered – direct access intermediary, notice to service intermediary and repository intermediary models.

  • Direct access intermediary. Researchers could request data with an intermediary facilitating secure access. In this model, services could retain responsibility for hosting and providing data while the intermediary maintains the interface by which researchers request access.
  • Notice to service intermediary. Researchers could apply for accreditation and request access to specific datasets via the intermediary. This could include data that would not be accessible in direct access models. The intermediary would review and refuse or approve access. Services would then be required to provide access to the approved data.
  • Repository intermediary. The intermediary could itself provide direct access to data, by providing an interface for data access and/or hosting the data itself and taking responsibility for data governance. This could also include data that would not be accessible in direct access models…(More)”.

The Smart City as a Field of Innovation: Effects of Public-Private Data Collaboration on the Innovation Performance of Small and Medium-Sized Enterprises in China


Paper by Xiaohui Jiang and Masaru Yarime: “The Chinese government has been playing an important role in stimulating innovation among Chinese enterprises. Small and medium-sized enterprises (SMEs), with their limited internal resources, particularly face a severe challenge in implementing innovation activities that depend upon data, funding sources, and talent. However, the rapidly developing smart city projects in China, where significant amounts of data are available from various sophisticated devices and generous funding opportunities, are providing rich opportunities for SMEs to explore data-driven innovation. The Chinese government is trying to actively engage SMEs in the process of smart city construction. When SMEs cooperate with the government, the availability of and access to data involved in government contracts, together with the abilities required in the projects, help them train and improve their innovation capability. In this article, we address how obtaining different types of government contracts (equipment supply, platform building, data analysis) can influence firms’ innovation performance. Obtaining each type of government contract is regarded as receiving a distinct type of treatment. The hypothesis is that data analysis contracts have a larger positive influence on innovation ability than platform building contracts, while platform building contracts have a larger influence than equipment supply contracts. Focusing on the case of SMEs in China, this research aims to shed light on how the government and enterprises collaborate in smart city projects to facilitate innovation. Data on companies’ registered capital, industry, and software products from 1990–2020 is compiled from the Tianyancha website. A panel dataset is established with the key characteristics of the SMEs, their software products, and their record of government contracts.
Based on the companies’ basic characteristics, we divided the sample into six pairs of treatment and control groups using propensity score matching (PSM) and ran a validity test to confirm that the division was reliable. We then ran a difference-in-differences (DID) model on the matched pairs. The statistics show mixed results: Hypothesis 1, which indicates that companies obtaining data analysis contracts will experience greater innovation improvements than those with platform-building contracts, is partially confirmed when using software copyrights as the outcome variable; when using patent data as the indicator, however, the result is insignificant. Hypothesis 2, which posits that companies with platform-building contracts will show greater innovation improvements than those with equipment supply contracts, is not supported. Hypothesis 3, which suggests that companies receiving government contracts will have higher innovation outputs than those without, is confirmed. Subsequent case studies reveal the complex mechanisms behind these results…(More)”.
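The paper's design pairs treated firms (contract winners) with matched controls and compares changes in innovation output before and after the contract. A minimal sketch of the 2x2 difference-in-differences comparison at the core of that design is below; all numbers and variable names are illustrative, not the paper's data, and the real study estimates the model on a matched panel rather than group means.

```python
# Hypothetical sketch of a 2x2 difference-in-differences (DID) comparison:
# the change in a treated group's outcome minus the change in a matched
# control group's outcome. Here the outcome stands in for an innovation
# measure such as software copyrights per firm; the data is invented.

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 DID: change in treated minus change in controls."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Illustrative outcomes, pre/post contract award.
treated_pre  = [2, 3, 1, 2]   # firms that later won data-analysis contracts
treated_post = [6, 7, 5, 6]
control_pre  = [2, 2, 1, 3]   # PSM-matched firms without such contracts
control_post = [3, 4, 2, 4]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 2))  # → 2.75
```

The subtraction of the control group's change is what nets out time trends common to both groups, which is why the matching step (PSM) matters: it makes the "common trend" assumption more plausible.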

Commission facilitates data access for researchers under the Digital Services Act


Press Release: “On 2 July 2025, the Commission published a delegated act outlining rules granting access to data for qualified researchers under the Digital Services Act (DSA). This delegated act enables access to the internal data of very large online platforms (VLOPs) and very large online search engines (VLOSEs) for research on systemic risks and mitigation measures in the European Union.

The delegated act on data access clarifies the procedures for VLOPs and VLOSEs to share data with vetted researchers, including data formats and requirements for data documentation. Moreover, the delegated act sets out which information Digital Services Coordinators (DSCs), VLOPs and VLOSEs must make public to facilitate vetted researchers’ applications to access relevant datasets.

With the adoption of the delegated act, the Commission will launch the DSA data access portal where researchers interested in accessing data under the new mechanism can find information and exchange with VLOPs, VLOSEs and DSCs on their data access applications. 

Before accessing internal data, researchers must be vetted by a DSC

For this vetting process, researchers must submit a data access application demonstrating their affiliation to a research organisation, their independence from commercial interests, and their ability to manage the requested data in line with security, confidentiality and privacy rules. In addition, researchers need to disclose the funding of the research project for which the data is requested and commit to publishing the results of their research. Only data that is necessary to perform research on systemic risks in the EU can be requested.

To complement the rules in the delegated act, on 27 June 2025 the Board of Digital Services endorsed a proposal for further cooperation among DSCs in the vetting process of researchers…(More)”.

New data tools enhance the development effectiveness of tourism investment


Article by Louise Twining-Ward, Alex Pio and Alba Suris Coll-Vinent: “The tourism sector is a major driver of economic growth and inclusive job creation. Tourism generates a high number of jobs, especially for women (UN Tourism). In 2024, tourism was responsible for one in ten jobs worldwide, delivering 337.7 million jobs in total, and accounted for 10.5 percent of global GDP. For many developing countries, it is a primary generator of foreign exchange.

The growth of this vital sector depends heavily on public investment in infrastructure and services. But rapid change, driven by uncertain geopolitics, climate shocks, and shifting consumer behavior, can make it hard to know how best to spend scarce resources. Traditional data sources are unable to keep up, leaving policymakers without the timely insights needed to manage mounting complexities effectively. Only a few developing countries collect and maintain tourism satellite accounts (TSAs), which help capture tourism’s contribution to their economies. Even in these countries, however, tourist arrival and spending data, collected through immigration records and visitor surveys, are often processed with a lag. There is an urgent need for more accessible, more granular, and more timely data tools.

Emerging Data Tools

For this reason, the World Bank partnered with Visa to access anonymized and aggregated credit card spend data in the Caribbean and attempt to fill data gaps. This and other emerging tools for policymaking—such as satellite and geospatial mapping, analysis of online reviews, artificial intelligence, and advanced analytics—now allow tourism destinations to take a closer look at local demand patterns, gauge visitor satisfaction in near-real time, and measure progress on everything from carbon footprints to women’s employment in tourism…(More)”.

The European Data Cooperative (EDC) 


Invest Europe: “The European Data Cooperative (EDC) is a joint initiative developed by Invest Europe and its national association partners to collect Europe-wide industry data on activity (fundraising, investments, and divestments), economic impact (employment, turnover, EBITDA, and CAPEX), and ESG.

The EDC platform is jointly owned and operated by the private equity and venture capital associations of Europe. It serves as a single data entry point for their members and other contributors across the continent. The EDC brings together:

  • 4,000 firms
  • 10,900 funds
  • 86,700 portfolio companies
  • 330,900 transactions

Using one platform with a standardised methodology allows us to have consistent, robust pan-European statistics that are comparable across the region…(More)”

Balancing Data Sharing and Privacy to Enhance Integrity and Trust in Government Programs


Paper by National Academy of Public Administration: “Improper payments and fraud cost the federal government hundreds of billions of dollars each year, wasting taxpayer money and eroding public trust. At the same time, agencies are increasingly expected to do more with less. Finding better ways to share data, without compromising privacy, is critical for ensuring program integrity in a resource-constrained environment.

Key Takeaways

  • Data sharing strengthens program integrity and fraud prevention. Agencies and oversight bodies like GAO and OIGs have uncovered large-scale fraud by using shared data.
  • Opportunities exist to streamline and expedite the compliance processes required by privacy laws and reduce systemic barriers to sharing data across federal agencies.
  • Targeted reforms can address these barriers while protecting privacy:
    1. OMB could issue guidance to authorize fraud prevention as a routine use in System of Records Notices.
    2. Congress could enact special authorities or exemptions for data sharing that supports program integrity and fraud prevention.
    3. A centralized data platform could help to drive cultural change and support secure, responsible data sharing…(More)”

Data Commons: The Missing Infrastructure for Public Interest Artificial Intelligence


Article by Stefaan Verhulst, Burton Davis and Andrew Schroeder: “Artificial intelligence is celebrated as the defining technology of our time. From ChatGPT to Copilot and beyond, generative AI systems are reshaping how we work, learn, and govern. But behind the headline-grabbing breakthroughs lies a fundamental problem: The data these systems depend on to produce useful results that serve the public interest is increasingly out of reach.

Without access to diverse, high-quality datasets, AI models risk reinforcing bias, deepening inequality, and returning less accurate, less reliable results. Yet access to data remains fragmented, siloed, and increasingly enclosed. What was once open—government records, scientific research, public media—is now locked away by proprietary terms, outdated policies, or simple neglect. We are entering a data winter just as AI’s influence over public life is heating up.

This isn’t just a technical glitch. It’s a structural failure. What we urgently need is new infrastructure: data commons.

A data commons is a shared pool of data resources—responsibly governed, managed using participatory approaches, and made available for reuse in the public interest. Done correctly, commons can ensure that communities and other networks have a say in how their data is used, that public interest organizations can access the data they need, and that the benefits of AI can be applied to meet societal challenges.

Commons offer a practical response to the paradox of data scarcity amid abundance. By pooling datasets across organizations—governments, universities, libraries, and more—they match data supply with real-world demand, making it easier to build AI that responds to public needs.

We’re already seeing early signs of what this future might look like. Projects like Common Corpus, MLCommons, and Harvard’s Institutional Data Initiative show how diverse institutions can collaborate to make data both accessible and accountable. These initiatives emphasize open standards, participatory governance, and responsible reuse. They challenge the idea that data must be either locked up or left unprotected, offering a third way rooted in shared value and public purpose.

But the pace of progress isn’t matching the urgency of the moment. While policymakers debate AI regulation, they often ignore the infrastructure that makes public interest applications possible in the first place. Without better access to high-quality, responsibly governed data, AI for the common good will remain more aspiration than reality.

That’s why we’re launching The New Commons Challenge—a call to action for universities, libraries, civil society, and technologists to build data ecosystems that fuel public-interest AI…(More)”.

Real-time prices, real results: comparing crowdsourcing, AI, and traditional data collection


Article by Julius Adewopo, Bo Andree, Zacharey Carmichael, Steve Penson, and Kamwoo Lee: “Timely, high-quality food price data is essential for shock-responsive decision-making. However, in many low- and middle-income countries, such data is often delayed, limited in geographic coverage, or unavailable due to operational constraints. Traditional price monitoring, which relies on structured surveys conducted by trained enumerators, is often constrained by cost, frequency, and reach.

To help overcome these limitations, the World Bank launched the Real-Time Prices (RTP) data platform. This effort provides monthly price data using a machine learning framework. The models combine survey results with predictions derived from observations in nearby markets and related commodities. This approach helps fill gaps in local price data across a basket of goods, enabling real-time monitoring of inflation dynamics even when survey data is incomplete or irregular.
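The gap-filling idea described above (borrowing information from nearby markets and related commodities when a survey observation is missing) can be sketched with a toy rule. This is only an illustration of the concept: the actual RTP platform uses a machine learning framework, and the markets, prices, and distances below are invented.

```python
# Hypothetical sketch of price gap-filling: when a market has no survey
# observation this month, impute its price from the same commodity's
# prices in nearby markets, weighted by inverse distance. The real RTP
# models are ML-based; this toy rule only illustrates borrowing strength
# from neighbouring observations.

def impute_price(missing_market, observed, distances):
    """Inverse-distance-weighted average of neighbouring market prices."""
    num = den = 0.0
    for market, price in observed.items():
        w = 1.0 / distances[(missing_market, market)]
        num += w * price
        den += w
    return num / den

# Illustrative data: maize prices (USD/kg) observed in three markets,
# with market "X" missing its survey round this month.
observed = {"A": 0.50, "B": 0.60, "C": 0.80}
distances = {("X", "A"): 10.0, ("X", "B"): 20.0, ("X", "C"): 40.0}

print(round(impute_price("X", observed, distances), 3))  # → 0.571
```

Closer markets dominate the estimate, which mirrors the intuition that local price dynamics are spatially correlated; a production system would also weight by commodity similarity and validate imputations against later survey rounds.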

In parallel, new approaches—such as citizen-submitted (crowdsourced) data—are being explored to complement conventional data collection methods. These crowdsourced data were recently published in a Nature Scientific Data paper. While the adoption of these innovations is accelerating, maintaining trust requires rigorous validation.

A newly published study in PLOS compares the two emerging methods with the traditional, enumerator-led gold standard, providing new evidence that both crowdsourced and AI-imputed prices can serve as credible, timely alternatives to traditional ground-truth data collection—especially in contexts where conventional methods face limitations…(More)”.

Data Cooperatives: Democratic Models for Ethical Data Stewardship


Paper by Francisco Mendonca, Giovanna DiMarzo, and Nabil Abdennadher: “Data cooperatives offer a new model for fair data governance, enabling individuals to collectively control, manage, and benefit from their information while adhering to cooperative principles such as democratic member control, economic participation, and community concern. This paper reviews data cooperatives, distinguishing them from models like data trusts, data commons, and data unions, and defines them based on member ownership, democratic governance, and data sovereignty. It explores applications in sectors like healthcare, agriculture, and construction. Despite their potential, data cooperatives face challenges in coordination, scalability, and member engagement, requiring innovative governance strategies, robust technical systems, and mechanisms to align member interests with cooperative goals. The paper concludes by advocating for data cooperatives as a sustainable, democratic, and ethical model for the future data economy…(More)”.

The Future of Health Is Preventive — If We Get Data Governance Right


Article by Stefaan Verhulst: “After a long gestation period of three years, the European Health Data Space (EHDS) is now coming into effect across the European Union, potentially ushering in a new era of health data access, interoperability, and innovation. As this ambitious initiative enters the implementation phase, it brings with it the opportunity to fundamentally reshape how health systems across Europe operate. More generally, the EHDS contains important lessons (and some cautions) for the rest of the world, suggesting how a fragmented, reactive model of healthcare may transition to one that is more integrated, proactive, and prevention-oriented.

For too long, health systems–in the EU and around the world–have been built around treating diseases rather than preventing them. Now, we have an opportunity to change that paradigm. Data, and especially the advent of AI, give us the tools to predict and intervene before illness takes hold. Data offers the potential for a system that prioritizes prevention–one where individuals receive personalized guidance to stay healthy, policymakers access real-time evidence to address risks before they escalate, and epidemics are predicted weeks in advance, enabling proactive, rapid, and highly effective responses.

But to make AI-powered preventive health care a reality, and to make the EHDS a success, we need a new data governance approach, one that would include two key components:

  • The ability to reuse data collected for other purposes (e.g., mobility, retail sales, workplace trends) to improve health outcomes.
  • The ability to integrate different data sources–not only clinical records and electronic health records (EHRs), but also environmental, social, and economic data–to build a complete picture of health risks.

In what follows, we outline some critical aspects of this new governance framework, including responsible data access and reuse (so-called secondary use), moving beyond traditional consent models to a social license for reuse, data stewardship, and the need to prioritize high-impact applications. We conclude with some specific recommendations for the EHDS, built from the preceding general discussion about the role of AI and data in preventive health…(More)”.