DATA – The Living Library

Comparative Data Law

Curated on July 5, 2025July 5, 2025 by Stefaan Verhulst

Conference Proceedings edited by Josef Drexl, Moritz Hennemann, Patricia Boshe, and Klaus Wiedemann: “The increasing relevance of data is now recognized all over the world. The large number of regulatory acts and proposals in the field of data law serves as a testament to the significance of data processing for the economies of the world. The European Union’s Data Strategy, the African Union’s Data Policy Framework and the Australian Data Strategy only serve as examples within a plethora of regulatory actions. Yet, the purposeful and sensible use of data does not only play a role in economic terms, e.g. regarding the welfare or competitiveness of economies. The implications for society and the common good are at least equally relevant. For instance, data processing is an integral part of modern research methodology and can thus help to address the problems the world is facing today, such as climate change.

The conference was the third and final event of the Global Data Law Conference Series. Legal scholars from all over the world met, presented and exchanged their experiences on different data-related regulatory approaches. Various instruments and approaches to the regulation of data – personal or non-personal – were discussed, without losing sight of the global effects going hand-in-hand with different kinds of regulation.

In compiling the conference proceedings, this book does not only aim at providing a critical and analytical assessment of the status quo of data law in different countries today, it also aims at providing a forward-looking perspective on the pressing issues of our time, such as: How to promote sensible data sharing and purposeful data governance? Under which circumstances, if ever, do data localisation requirements make sense? How – and by whom – should international regulation be put in place? The proceedings engage in a discussion on future-oriented ideas and actions, thereby promoting a constructive and sensible approach to data law around the world…(More)”.

China is building an entire empire on data

Curated on July 5, 2025July 5, 2025 by Stefaan Verhulst

The Economist: “CHINA’S 1.1BN internet users churn out more data than anyone else on Earth. So does the country’s vast network of facial-recognition cameras. As autonomous cars speed down roads and flying ones criss-cross the skies, the quality and value of the information flowing from emerging technologies will soar. Yet the volume of data is not the only thing setting China apart. The government is also embedding data management into the economy and national security. That has implications for China, and holds lessons for democracies.

China’s planners see data as a factor of production, alongside labour, capital and land. Xi Jinping, the president, has called data a foundational resource “with a revolutionary impact” on international competition. The scope of this vision is unparalleled, affecting everything from civil liberties to the profits of internet firms and China’s pursuit of the lead in artificial intelligence.

Mr Xi’s vision is being enacted fast. In 2021 China released rules modelled on Europe’s General Data Protection Regulation (GDPR). Now it is diverging quickly from Western norms. All levels of government are to marshal the data resources they have. A sweeping project to assess the data piles at state-owned firms is under way. The idea is to value them as assets, and add them to balance-sheets or trade them on state-run exchanges. On June 3rd the State Council released new rules to compel all levels of government to share data.

Another big step is a digital ID, due to be launched on July 15th. Under this, the central authorities could control a ledger of every person’s websites and apps. Connecting someone’s name with their online activity will become harder for the big tech firms which used to run the system. They will see only an anonymised stream of digits and letters. Chillingly, however, the ledger may one day act as a panopticon for the state.

China’s ultimate goal appears to be to create an integrated national data ocean, covering not just consumers but industrial and state activity, too. The advantages are obvious, and include economies of scale for training AI models and lower barriers to entry for small new firms…(More)”.

AI companies start winning the copyright fight

Curated on July 5, 2025July 5, 2025 by Stefaan Verhulst

Article by Blake Montgomery: “…tech companies notched several victories in the fight over their use of copyrighted text to create artificial intelligence products.

Anthropic: A US judge has ruled that Anthropic, maker of the Claude chatbot, use of books to train its artificial intelligence system – without permission of the authors – did not breach copyright law. Judge William Alsup compared the Anthropic model’s use of books to a “reader aspiring to be a writer.”

And the next day, Meta: The US district judge Vince Chhabria, in San Francisco, said in his decision on the Meta case that the authors had not presented enough evidence that the technology company’s AI would cause “market dilution” by flooding the market with work similar to theirs.

The same day that Meta received its favorable ruling, a group of writers sued Microsoft, alleging copyright infringement in the creation of that company’s Megatron text generator. Judging by the rulings in favor of Meta and Anthropic, the authors are facing an uphill battle.

These three cases are skirmishes in the wider legal war over copyrighted media, which rages on. Three weeks ago, Disney and NBCUniversal sued Midjourney, alleging that the company’s namesake AI image generator and forthcoming video generator made illegal use of the studios’ iconic characters like Darth Vader and the Simpson family. The world’s biggest record labels – Sony, Universal and Warner – have sued two companies that make AI-powered music generators, Suno and Udio. On the textual front, the New York Times’ suit against OpenAI and Microsoft is ongoing.

The lawsuits over AI-generated text were filed first, and, as their rulings emerge, the next question in the copyright fight is whether decisions about one type of media will apply to the next.

“The specific media involved in the lawsuit – written works versus images versus videos versus audio – will certainly change the fair-use analysis in each case,” said John Strand, a trademark and copyright attorney with the law firm Wolf Greenfield. “The impact on the market for the copyrighted works is becoming a key factor in the fair-use analysis, and the market for books is different than that for movies.”…(More)”.

Bad data leads to bad policy

Curated on July 5, 2025July 5, 2025 by Stefaan Verhulst

Article by Georges-Simon Ulrich: “When the UN created a Statistical Commission in 1946, the world was still recovering from the devastation of the second world war. Then, there was broad consensus that only reliable, internationally comparable data could prevent conflict, combat poverty and anchor global co-operation. Nearly 80 years later, this insight remains just as relevant, but the context has changed dramatically…

This erosion of institutional capacity could not come at a more critical moment. The UN is unable to respond adequately as it is facing a staffing shortfall itself. Due to ongoing austerity measures at the UN, many senior positions remain vacant, and the director of the UN Statistics Division has retired, with no successor appointed. This comes at a time when bold and innovative initiatives — such as a newly envisioned Trusted Data Observatory — are urgently needed to make official statistics more accessible and machine-readable.

Meanwhile, the threat of targeted disinformation is growing. On social media, distorted or manipulated content spreads at unprecedented speed. Emerging tools like AI chatbots exacerbate the problem. These systems rely on web content, not verified data, and are not built to separate truth from falsehood. Making matters worse, many governments cannot currently make their data usable for AI because it is not standardised, not machine-readable, or not openly accessible. The space for sober, evidence-based discourse is shrinking.

This trend undermines public trust in institutions, strips policymaking of its legitimacy, and jeopardises the UN Sustainable Development Goals (SDGs). Without reliable data, governments will be flying blind — or worse: they will be deliberately misled.

When countries lose control of their own data, or cannot integrate it into global decision-making processes, they become bystanders to their own development. Decisions about their economies, societies and environments are then outsourced to AI systems trained on skewed, unrepresentative data. The global south is particularly at risk, with many countries lacking access to quality data infrastructures. In countries such as Ethiopia, unverified information spreading rapidly on social media has fuelled misinformation-driven violence.

The Covid-19 pandemic demonstrated that strong data systems enable better crisis response. To counter these risks, the creation of a global Trusted Data Observatory (TDO) is essential. This UN co-ordinated, democratically governed platform would help catalogue and make accessible trusted data around the world — while fully respecting national sovereignty…(More)”

Money, Power and AI

Curated on July 5, 2025July 5, 2025 by Stefaan Verhulst

Open Access Book edited by Zofia Bednarz and Monika Zalnieriute: “… bring together leading experts to shed light on how artificial intelligence (AI) and automated decision-making (ADM) create new sources of profits and power for financial firms and governments. Chapter authors—which include public and private lawyers, social scientists, and public officials working on various aspects of AI and automation across jurisdictions—identify mechanisms, motivations, and actors behind technology used by Automated Banks and Automated States, and argue for new rules, frameworks, and approaches to prevent harms that result from the increasingly common deployment of AI and ADM tools. Responding to the opacity of financial firms and governments enabled by AI, Money, Power and AI advances the debate on scrutiny of power and accountability of actors who use this technology…(More)”.

A New Social Contract for AI? Comparing CC Signals and the Social License for Data Reuse

Curated on July 3, 2025July 3, 2025 by Stefaan Verhulst

Article by Stefaan Verhulst: “Last week, Creative Commons — the global nonprofit best known for its open copyright licenses — released “CC Signals: A New Social Contract for the Age of AI.” This framework seeks to offer creators a means to signal their preferences for how their works are used in machine learning, including training Artificial Intelligence systems. It marks an important step toward integrating re-use preferences and shared benefits directly into the AI development lifecycle….

From a responsible AI perspective, the CC Signals framework is an important development. It demonstrates how soft governance mechanisms — declarations, usage expressions, and social signaling — can supplement or even fill gaps left by inconsistent global copyright regimes in the context of AI. At the same time, this initiative provides an interesting point of comparison with our ongoing work to develop a Social License for Data Reuse. A social license for data reuse is a participatory governance framework that allows communities to collectively define, signal and enforce the conditions under which data about them can be reused — including training AI. Unlike traditional consent-based mechanisms, which focus on individual permissions at the point of collection, a social license introduces a community-centered, continuous process of engagement — ensuring that data practices align with shared values, ethical norms, and contextual realities. It provides a complementary layer to legal compliance, emphasizing trust, legitimacy, and accountability in data governance.

While both frameworks are designed to signal preferences and expectations for data or content reuse, they differ meaningfully in scope, method, and theory of change.

Below, we offer a comparative analysis of the two frameworks — highlighting how each approaches the challenge of embedding legitimacy and trust into AI and data ecosystems…(More)”.

Unpacking B2G data sharing mechanism under the EU data act

Curated on July 3, 2025July 3, 2025 by Stefaan Verhulst

Paper by Ludovica Paseri and Stefaan G. Verhulst: “The paper proposes an analysis of the business-to-government (B2G) data sharing mechanism envisaged by the Regulation EU 2023/2854, the so-called Data Act. The Regulation, in force since 11 January 2024, will be applicable from 12 September 2025, requiring the actors involved to put in place a compliance process. The focus of the paper is to present an assessment of the mechanism foreseen by the EU legislators, with the intention of highlighting two bottlenecks, represented by: (i) the flexibility of the definition of “exceptional need”, “public emergency” and “public interest”; (ii) the cumbersome procedure for data holders. The paper discusses the role that could be played by in-house data stewardship structures as a particularly beneficial contact point for complying with B2G data sharing requirements…(More)“.

Commission facilitates data access for researchers under the Digital Services Act

Curated on July 2, 2025July 3, 2025 by Stefaan Verhulst

Press Release: “On 2 July 2025, the Commission published a delegated act outlining rules granting access to data for qualified researchers under the Digital Services Act (DSA). This delegated act enables access to the internal data of very large online platforms (VLOPs) and search engines (VLOSEs) to research the systemic risks and on the mitigation measures in the European Union.

The delegated act on data access clarifies the procedures for VLOPs and VLOSEs to share data with vetted researchers, including data formats and requirements for data documentation. Moreover, the delegated act sets out which information Digital Services Coordinators (DSCs), VLOPs and VLOSEs must make public to facilitate vetted researchers’ applications to access relevant datasets.

With the adoption of the delegated act, the Commission will launch the DSA data access portal where researchers interested in accessing data under the new mechanism can find information and exchange with VLOPs, VLOSEs and DSCs on their data access applications.

Before accessing internal data, researchers must be vetted by a DSC.

For this vetting process, researchers must submit a data access application demonstrating their affiliation to a research organisation, their independence from commercial interests, and their ability to manage the requested data in line with security, confidentiality and privacy rules. In addition, researchers need to disclose the funding of the research project for which the data is requested and commit to publishing the results of their research. Only data that is necessary to perform research on systemic risks in the EU can be requested.

To complement the rules in the delegated act, on 27 June 2025 the Board of Digital Services endorsed a proposal for further cooperation among DSCs in the vetting process of researchers…(More)”.

Cloudflare Introduces Default Blocking of A.I. Data Scrapers

Curated on July 2, 2025July 2, 2025 by Stefaan Verhulst

Article by Natallie Rocha: “Data for A.I. systems has become an increasingly contentious issue. OpenAI, Anthropic, Google and other companies building A.I. systems have amassed reams of information from across the internet to train their A.I. models. High-quality data is particularly prized because it helps A.I. models become more proficient in generating accurate answers, videos and images.

But website publishers, authors, news organizations and other content creators have accused A.I. companies of using their material without permission and payment. Last month, Reddit sued Anthropic, saying the start-up had unlawfully used the data of its more than 100 million daily users to train its A.I. systems. In 2023, The New York Times sued OpenAI and its partner, Microsoft, accusing them of copyright infringement of news content related to A.I. systems. OpenAI and Microsoft have denied those claims.

Some publishers have struck licensing deals with A.I. companies to receive compensation for their content. In May, The Times agreed to license its editorial content to Amazon for use in the tech giant’s A.I. platforms. Axel Springer, Condé Nast and News Corp have also entered into agreements with A.I. companies to receive revenue for the use of their material.

Mark Howard, the chief operating officer of Time, said he welcomed Cloudflare’s move. Data scraping by A.I. companies threatens anyone who creates content, he said, adding that news publishers like Time deserved fair compensation for what they published…(More)”.

AI and Assembly: Coming Together and Apart in a Datafied World

Curated on July 2, 2025July 2, 2025 by Stefaan Verhulst

Book edited by Toussaint Nothias and Lucy Bernholz: “Artificial intelligence has moved from the lab into everyday life and is now seemingly everywhere. As AI creeps into every aspect of our lives, the data grab required to power AI also expands. People worldwide are tracked, analyzed, and influenced, whether on or off their screens, inside their homes or outside in public, still or in transit, alone or together. What does this mean for our ability to assemble with others for collective action, including protesting, holding community meetings and organizing rallies ? In this context, where and how does assembly take place, and who participates by choice and who by coercion? AI and Assembly explores these questions and offers global perspectives on the present and future of assembly in a world taken over by AI.

The contributors analyze how AI threatens free assembly by clustering people without consent, amplifying social biases, and empowering authoritarian surveillance. But they also explore new forms of associational life that emerge in response to these harms, from communities in the US conducting algorithmic audits to human rights activists in East Africa calling for biometric data protection and rideshare drivers in London advocating for fair pay. Ultimately, AI and Assembly is a rallying cry for those committed to a digital future beyond the narrow horizon of corporate extraction and state surveillance…(More)”.