DATA – The Living Library

Comparative Data Law

Curated on July 12, 2025 / July 10, 2025 by Stefaan Verhulst

Conference Proceedings edited by Josef Drexl, Moritz Hennemann, Patricia Boshe, and Klaus Wiedemann: “The increasing relevance of data is now recognized all over the world. The large number of regulatory acts and proposals in the field of data law serves as a testament to the significance of data processing for the economies of the world. The European Union’s Data Strategy, the African Union’s Data Policy Framework and the Australian Data Strategy only serve as examples within a plethora of regulatory actions. Yet, the purposeful and sensible use of data does not only play a role in economic terms, e.g. regarding the welfare or competitiveness of economies. The implications for society and the common good are at least equally relevant. For instance, data processing is an integral part of modern research methodology and can thus help to address the problems the world is facing today, such as climate change.

The conference was the third and final event of the Global Data Law Conference Series. Legal scholars from all over the world met, presented and exchanged their experiences on different data-related regulatory approaches. Various instruments and approaches to the regulation of data – personal or non-personal – were discussed, without losing sight of the global effects going hand-in-hand with different kinds of regulation.

In compiling the conference proceedings, this book does not only aim at providing a critical and analytical assessment of the status quo of data law in different countries today, it also aims at providing a forward-looking perspective on the pressing issues of our time, such as: How to promote sensible data sharing and purposeful data governance? Under which circumstances, if ever, do data localisation requirements make sense? How – and by whom – should international regulation be put in place? The proceedings engage in a discussion on future-oriented ideas and actions, thereby promoting a constructive and sensible approach to data law around the world…(More)”.

Data Governance Toolkit: Navigating Data in the Digital Age

Curated on July 9, 2025 by Stefaan Verhulst

Toolkit by the Broadband Commission Working Group on Data Governance: “.. the Toolkit serves as a practical, capacity-building resource for policymakers, regulators, and governments. It offers actionable guidance on key data governance priorities — including legal frameworks, institutional roles, cross-border data flows, digital self-determination, and data for AI.

As a key capacity building resource, the Toolkit aims to empower policymakers, regulators and data practitioners to navigate the complexities of data governance in the digital era. Plans are currently underway to translate the Toolkit into French, Spanish, Chinese, and Arabic to ensure broader global accessibility and impact. Pilot implementation at country level is also being explored for Q4 2025 to support national-level uptake.

The Data Governance Toolkit

The Data Governance Toolkit: Navigating Data in the Digital Age offers a practical, rights-based guide to help governments, institutions, and stakeholders make data work for all.

The Toolkit is organized around four foundational data governance components—referred to as the 4Ps of Data Governance:

Why (Purpose): How to define a vision and purpose for data governance in the context of AI, digital transformation, and sustainable development.
How (Principles): What principles should guide a governance framework to balance innovation, security, and ethical considerations.
Who (People and Processes): Identifying the stakeholders, institutions, and processes required to build and enforce responsible governance structures.
What (Practices and Mechanisms): Policies and best practices to manage data across its entire lifecycle while ensuring privacy, interoperability, and regulatory compliance.

The Toolkit also includes:

A self-assessment framework to help organizations evaluate their current capabilities;
A glossary of key terms to foster shared understanding;
A curated list of other toolkits and frameworks for deeper engagement.

Designed to be adaptable across regions and sectors, the Data Governance Toolkit is not a one-size-fits-all manual—but a modular resource to guide smarter, safer, and fairer data use in the digital age…(More)”

Researchers’ access to information from regulated online services

Curated on July 8, 2025 / July 9, 2025 by Stefaan Verhulst

Report by Ofcom (UK): “…We outline three potential policy options and models for facilitating greater researcher access, which include:

Clarify existing legal rules: Relevant authorities could provide additional guidance on what is already legally permitted for researcher access on important issues, such as data donations and research-related scraping.
Create new duties, enforced by a backstop regulator: Services could be required to put in place systems and processes to operationalise data access. This could include new duties on regulated services to create standard procedures for researcher accreditation. Services would be responsible for providing researchers with data directly or providing the interface through which they can access it and offering appeal and redress mechanisms. A backstop regulator could enforce these duties – either an existing or new body.
Enable and manage access via independent intermediary: New legal powers could be granted to a trusted third party which would facilitate and manage researchers’ access to data. This intermediary – which could again be an existing or new body – would accredit researchers and provide secure access.

Our report describes three types of intermediary that could be considered – direct access intermediary, notice to service intermediary and repository intermediary models.

Direct access intermediary. Researchers could request data with an intermediary facilitating secure access. In this model, services could retain responsibility for hosting and providing data while the intermediary maintains the interface by which researchers request access.
Notice to service intermediary. Researchers could apply for accreditation and request access to specific datasets via the intermediary. This could include data that would not be accessible in direct access models. The intermediary would review and refuse or approve access. Services would then be required to provide access to the approved data.
Repository intermediary. The intermediary could itself provide direct access to data, by providing an interface for data access and/or hosting the data itself and taking responsibility for data governance. This could also include data that would not be accessible in direct access models…(More)”.

Sudden loss of key US satellite data could send hurricane forecasting back ‘decades’

Curated on July 8, 2025 by Stefaan Verhulst

Article by Eric Holthaus: “A critical US atmospheric data collection program will be halted by Monday, giving weather forecasters just days to prepare, according to a public notice sent this week. Scientists that the Guardian spoke with say the change could set hurricane forecasting back “decades”, just as this year’s season ramps up.

In a National Oceanic and Atmospheric Administration (Noaa) message sent on Wednesday to its scientists, the agency said that “due to recent service changes” the Defense Meteorological Satellite Program (DMSP) will “discontinue ingest, processing and distribution of all DMSP data no later than June 30, 2025”.

Due to their unique characteristics and ability to map the entire world twice a day with extremely high resolution, the three DMSP satellites are a primary source of information for scientists to monitor Arctic sea ice and hurricane development. The DMSP partners with Noaa to make weather data collected from the satellites publicly available.

The reasons for the changes, and which agency was driving them, were not immediately clear. Noaa said they would not affect the quality of forecasting.

However, the Guardian spoke with several scientists inside and outside of the US government whose work depends on the DMSP, and all said there are no other US programs that can form an adequate replacement for its data.

“We’re a bit blind now,” said Allison Wing, a hurricane researcher at Florida State University. Wing said the DMSP satellites are the only ones that let scientists see inside the clouds of developing hurricanes, giving them a critical edge in forecasting that now may be jeopardized.

“Before these types of satellites were present, there would often be situations where you’d wake up in the morning and have a big surprise about what the hurricane looked like,” said Wing. “Given increases in hurricane intensity and increasing prevalence towards rapid intensification in recent years, it’s not a good time to have less information.”…(More)”.

The Smart City as a Field of Innovation: Effects of Public-Private Data Collaboration on the Innovation Performance of Small and Medium-Sized Enterprises in China

Curated on July 8, 2025 by Stefaan Verhulst

Paper by Xiaohui Jiang and Masaru Yarime: “The Chinese government has been playing an important role in stimulating innovation among Chinese enterprises. Small and medium-sized enterprises (SMEs), with their limited internal resources, particularly face a severe challenge in implementing innovation activities that depend upon data, funding sources, and talent. However, the rapidly developing smart city projects in China, where significant amounts of data are available from various sophisticated devices and generous funding opportunities, are providing rich opportunities for SMEs to explore data-driven innovation. Chinese governments are trying to actively engage SMEs in the process of smart city construction. When cooperating with the government, the availability of and access to data involved in the government contracts and the ability required in the project help SMEs to train and improve their innovation ability. In this article, we intend to address how obtaining different types of government contracts (equipment supply, platform building, data analysis) can influence firms’ performance on innovation. Obtaining different types of government contracts is regarded as receiving different types of treatments. The hypothesis is that the data analysis type of contracts has a larger positive influence on improving the innovation ability compared to the platform building type, while the platform building type of contracts can have a larger influence compared to equipment supply. Focusing on the case of SMEs in China, this research aims to shed light on how the government and enterprises collaborate in smart city projects to facilitate innovation. Data on companies’ registered capital, industry, and software products from 1990–2020 is compiled from the Tianyancha website. A panel dataset is established with the key characteristics of the SMEs, software products, and their record on government contracts.
Based on the companies’ basic characteristics, we formed six pairs of treatment and control groups using propensity score matching (PSM) and then ran a validity test to confirm that the result of the matching was reliable. Then, based on the established control and treatment pairs, we ran a difference-in-differences (DID) model, and the result supports our original hypothesis. The statistics show mixed results: Hypothesis 1, which posits that companies obtaining data analysis contracts will experience greater innovation improvements compared to those with platform-building contracts, is partially confirmed when using software copyright as an outcome variable. However, when using patent data as an indicator, the result is statistically insignificant. Hypothesis 2, which posits that companies with platform-building contracts will show greater innovation improvements than those with equipment supply contracts, is not supported. Hypothesis 3, which suggests that companies receiving government contracts will have higher innovation outputs than those without, is confirmed. The case studies later revealed the complex mechanisms behind the scenario…(More)”.
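
For readers unfamiliar with the difference-in-differences (DID) comparison the abstract mentions, the basic two-group, two-period calculation can be sketched as follows. This is a minimal illustration only, not the authors' model: the function name `did_estimate`, the row layout, and the toy numbers are all hypothetical, and the paper's actual specification (matched pairs, panel data, controls) is not reproduced here.

```python
from statistics import mean


def did_estimate(rows):
    """Return the simple 2x2 difference-in-differences estimate.

    rows: iterable of (treated, post, outcome) tuples, where `treated` and
    `post` are booleans and `outcome` is a number (hypothetical layout).
    """
    # Group outcomes into the four (treated, post) cells.
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    cell_mean = {key: mean(values) for key, values in cells.items()}

    # Change over time in the treated group minus change in the control group.
    treated_change = cell_mean[(True, True)] - cell_mean[(True, False)]
    control_change = cell_mean[(False, True)] - cell_mean[(False, False)]
    return treated_change - control_change


# Made-up toy data: an innovation count per firm, before and after a contract.
toy_rows = [
    (True, False, 2.0), (True, False, 4.0),    # treated firms, before
    (True, True, 7.0), (True, True, 9.0),      # treated firms, after
    (False, False, 1.0), (False, False, 3.0),  # control firms, before
    (False, True, 2.0), (False, True, 4.0),    # control firms, after
]
estimate = did_estimate(toy_rows)
```

In this toy data the treated group's mean rises by 5 and the control group's by 1, so the estimate attributes the remaining 4 to the treatment, under the usual assumption that both groups would otherwise have followed parallel trends.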

Unpacking OpenAI’s Amazonian Archaeology Initiative

Curated on July 8, 2025 by Stefaan Verhulst

Article by Lori Regattieri: “What if I told you that one of the most well-capitalized AI companies on the planet is asking volunteers to help them uncover “lost cities” in the Amazonia—by feeding machine learning models with open satellite data, lidar, “colonial” text and map records, and indigenous oral histories? This is the premise of the OpenAI to Z Challenge, a Kaggle-hosted hackathon framed as a platform to “push the limits” of AI through global knowledge cooperation. In practice, this is a product development experiment cloaked as public participation. The contributions of users, the mapping of biocultural data, and the modeling of ancestral landscapes all feed into the refinement of OpenAI’s proprietary systems. The task itself may appear novel. The logic is not. This is the familiar playbook of Big Tech firms—capture public knowledge, reframe it as open input, and channel it into infrastructure that serves commercial, rather than communal goals.

The “challenge” is marketed as a “digital archaeology” experiment; it invites participants from all around the world to search for “hidden” archaeological sites in the Amazonia biome (Brazil, Bolivia, Colombia, Ecuador, Guyana, Peru, Suriname, Venezuela, and French Guiana) using a curated stack of open-source data. The competition requires participants to use OpenAI’s latest GPT-4.1 and the o3/o4-mini models to parse multispectral satellite imagery, LiDAR-derived elevation maps (Light Detection and Ranging is a remote sensing technology that uses laser pulses to generate high-resolution 3D models of terrain, including areas covered by dense vegetation), historical maps, and digitized ethnographic archives. The coding teams or individuals need to geolocate “potential” archaeological sites, argue their significance using verifiable public sources, and present reproducible methodologies. Prize incentives total $400,000 USD, with a first-place award of $250,000 split between cash and OpenAI API credits.

While framed as a novel invitation to “anyone” to do archaeological research, the competition focuses mainly on the Brazilian territory, transforming the Amazonia and its peoples into an open laboratory for model testing. What is presented as scientific crowdsourcing is in fact a carefully designed mechanism for refining geospatial AI at scale. Participants supply not just labor and insight, but novel training and evaluation strategies that extend far beyond heritage science and into the commercial logics of spatial computing…(More)”.

Will AI speed up literature reviews or derail them entirely?

Curated on July 8, 2025 by Stefaan Verhulst

Article by Sam A. Reynolds: “Over the past few decades, evidence synthesis has greatly increased the effectiveness of medicine and other fields. The process of systematically combining findings from multiple studies into comprehensive reviews helps researchers and policymakers to draw insights from the global literature¹. AI promises to speed up parts of the process, including searching and filtering. It could also help researchers to detect problematic papers². But in our view, other potential uses of AI mean that many of the approaches being developed won’t be sufficient to ensure that evidence syntheses remain reliable and responsive. In fact, we are concerned that the deployment of AI to generate fake papers presents an existential crisis for the field.

What’s needed is a radically different approach — one that can respond to the updating and retracting of papers over time.

We propose a network of continually updated evidence databases, hosted by diverse institutions as ‘living’ collections. AI could be used to help build the databases. And each database would hold findings relevant to a broad theme or subject, providing a resource for an unlimited number of ultra-rapid and robust individual reviews…

Currently, the gold standard for evidence synthesis is the systematic review. These are comprehensive, rigorous, transparent and objective, and aim to include as much relevant high-quality evidence as possible. They also use the best methods available for reducing bias. In part, this is achieved by getting multiple reviewers to screen the studies; declaring whatever criteria, databases, search terms and so on are used; and detailing any conflicts of interest or potential cognitive biases…(More)”.

Red Teaming Artificial Intelligence for Social Good

Curated on July 6, 2025 by Stefaan Verhulst

UNESCO Report: “Generative Artificial Intelligence (Gen AI) has become an integral part of our digital landscape and daily life. Understanding its risks and participating in solutions is crucial to ensuring that it works for the overall social good. This PLAYBOOK introduces Red Teaming as an accessible tool for testing and evaluating AI systems for social good, exposing stereotypes, bias and potential harms. As a way of illustrating harms, practical examples of Red Teaming for social good are provided, building on the collaborative work carried out by UNESCO and Humane Intelligence. The results demonstrate forms of technology-facilitated gender-based violence (TFGBV) enabled by Gen AI and provide practical actions and recommendations on how to address these growing concerns.

Red Teaming, the practice of intentionally testing Gen AI models to expose vulnerabilities, has traditionally been used by major tech companies and AI labs. One tech company surveyed 1,000 machine learning engineers and found that 89% reported vulnerabilities (Aporia, 2024). This PLAYBOOK provides access to these critical testing methods, enabling organizations and communities to actively participate. Through the structured exercises and real-world scenarios provided, participants can systematically evaluate how Gen AI models may perpetuate, either intentionally or unintentionally, stereotypes or enable gender-based violence. By providing organizations with this easy-to-use tool to conduct their own Red Teaming exercises, participants can select their own thematic area of concern, enabling evidence-based advocacy for more equitable AI for social good…(More)”.

Harnessing Wearable Data and Social Sentiment: Designing Proactive Consumer and Patient Engagement Strategies through Integrated AI Systems

Curated on July 6, 2025 / July 3, 2025 by Stefaan Verhulst

Paper by Warren Liang et al: “In the age of ubiquitous computing, the convergence of wearable technologies and social sentiment analysis has opened new frontiers in both consumer engagement and patient care. These technologies generate continuous, high-frequency, multimodal data streams that are increasingly being leveraged by artificial intelligence (AI) systems for predictive analytics and adaptive interventions. This article explores a unified, integrated framework that combines physiological data from wearables and behavioral insights from social media sentiment to drive proactive engagement strategies. By embedding AI-driven systems into these intersecting data domains, healthcare organizations, consumer brands, and public institutions can offer hyper-personalized experiences, predictive health alerts, emotional wellness interventions, and behaviorally aligned communication.

This paper critically evaluates how machine learning models, natural language processing, and real-time stream analytics can synthesize structured and unstructured data for longitudinal engagement, while also exploring the ethical, privacy, and infrastructural implications of such integration. Through cross-sectoral analysis across healthcare, retail, and public health, we illustrate scalable architectures and case studies where real-world deployment of such systems has yielded measurable improvements in satisfaction, retention, and health outcomes. Ultimately, the synthesis of wearable telemetry and social context data through AI systems represents a new paradigm in engagement science — moving from passive data collection to anticipatory, context-aware engagement ecosystems…(More)”.

China is building an entire empire on data

Curated on July 5, 2025 by Stefaan Verhulst

The Economist: “CHINA’S 1.1BN internet users churn out more data than anyone else on Earth. So does the country’s vast network of facial-recognition cameras. As autonomous cars speed down roads and flying ones criss-cross the skies, the quality and value of the information flowing from emerging technologies will soar. Yet the volume of data is not the only thing setting China apart. The government is also embedding data management into the economy and national security. That has implications for China, and holds lessons for democracies.

China’s planners see data as a factor of production, alongside labour, capital and land. Xi Jinping, the president, has called data a foundational resource “with a revolutionary impact” on international competition. The scope of this vision is unparalleled, affecting everything from civil liberties to the profits of internet firms and China’s pursuit of the lead in artificial intelligence.

Mr Xi’s vision is being enacted fast. In 2021 China released rules modelled on Europe’s General Data Protection Regulation (GDPR). Now it is diverging quickly from Western norms. All levels of government are to marshal the data resources they have. A sweeping project to assess the data piles at state-owned firms is under way. The idea is to value them as assets, and add them to balance-sheets or trade them on state-run exchanges. On June 3rd the State Council released new rules to compel all levels of government to share data.

Another big step is a digital ID, due to be launched on July 15th. Under this, the central authorities could control a ledger of every person’s websites and apps. Connecting someone’s name with their online activity will become harder for the big tech firms which used to run the system. They will see only an anonymised stream of digits and letters. Chillingly, however, the ledger may one day act as a panopticon for the state.

China’s ultimate goal appears to be to create an integrated national data ocean, covering not just consumers but industrial and state activity, too. The advantages are obvious, and include economies of scale for training AI models and lower barriers to entry for small new firms…(More)”.