Comparative Data Law


Conference Proceedings edited by Josef Drexl, Moritz Hennemann, Patricia Boshe, and Klaus Wiedemann: “The increasing relevance of data is now recognized all over the world. The large number of regulatory acts and proposals in the field of data law serves as a testament to the significance of data processing for the economies of the world. The European Union’s Data Strategy, the African Union’s Data Policy Framework and the Australian Data Strategy only serve as examples within a plethora of regulatory actions. Yet, the purposeful and sensible use of data does not only play a role in economic terms, e.g. regarding the welfare or competitiveness of economies. The implications for society and the common good are at least equally relevant. For instance, data processing is an integral part of modern research methodology and can thus help to address the problems the world is facing today, such as climate change.

The conference was the third and final event of the Global Data Law Conference Series. Legal scholars from all over the world met, presented their work, and exchanged experiences with different data-related regulatory approaches. Various instruments and approaches to the regulation of data – personal or non-personal – were discussed, without losing sight of the global effects that go hand-in-hand with different kinds of regulation.

In compiling the conference proceedings, this book aims not only to provide a critical and analytical assessment of the status quo of data law in different countries today, but also to offer a forward-looking perspective on the pressing issues of our time, such as: How to promote sensible data sharing and purposeful data governance? Under which circumstances, if ever, do data localisation requirements make sense? How – and by whom – should international regulation be put in place? The proceedings engage in a discussion on future-oriented ideas and actions, thereby promoting a constructive and sensible approach to data law around the world…(More)”.

AI companies start winning the copyright fight


Article by Blake Montgomery: “…tech companies notched several victories in the fight over their use of copyrighted text to create artificial intelligence products.

Anthropic: A US judge has ruled that the use of books by Anthropic, maker of the Claude chatbot, to train its artificial intelligence system – without permission of the authors – did not breach copyright law. Judge William Alsup compared the Anthropic model’s use of books to a “reader aspiring to be a writer.”

And the next day, Meta: The US district judge Vince Chhabria, in San Francisco, said in his decision on the Meta case that the authors had not presented enough evidence that the technology company’s AI would cause “market dilution” by flooding the market with work similar to theirs.

The same day that Meta received its favorable ruling, a group of writers sued Microsoft, alleging copyright infringement in the creation of that company’s Megatron text generator. Judging by the rulings in favor of Meta and Anthropic, the authors are facing an uphill battle.

These three cases are skirmishes in the wider legal war over copyrighted media, which rages on. Three weeks ago, Disney and NBCUniversal sued Midjourney, alleging that the company’s namesake AI image generator and forthcoming video generator made illegal use of the studios’ iconic characters like Darth Vader and the Simpson family. The world’s biggest record labels – Sony, Universal and Warner – have sued two companies that make AI-powered music generators, Suno and Udio. On the textual front, the New York Times’ suit against OpenAI and Microsoft is ongoing.

The lawsuits over AI-generated text were filed first, and, as their rulings emerge, the next question in the copyright fight is whether decisions about one type of media will apply to the next.

“The specific media involved in the lawsuit – written works versus images versus videos versus audio – will certainly change the fair-use analysis in each case,” said John Strand, a trademark and copyright attorney with the law firm Wolf Greenfield. “The impact on the market for the copyrighted works is becoming a key factor in the fair-use analysis, and the market for books is different than that for movies.”…(More)”.

Computer Science and the Law


Article by Steven M. Bellovin: “There were three U.S. technical/legal developments occurring in approximately 1993 that had a profound effect on the technology industry and on many technologists. More such developments are occurring with increasing frequency.

The three developments were, in fact, technically unrelated. One was a bill before the U.S. Congress for a standardized wiretap interface in phone switches, a concept that spread around the world under the generic name of “lawful intercept.” The second was an update to the copyright statute to adapt to the digital age. While there were some useful changes—caching proxies and ISPs transmitting copyrighted material were no longer to be held liable for making illegal copies of protected content—it also provided an easy way for careless or unscrupulous actors—including bots—to request takedown of perfectly legal material. The third was the infamous Clipper chip, an encryption device that provided a backdoor for the U.S.—and only the U.S.—government.

All three of these developments could be and were debated on purely legal or policy grounds. But there were also technical issues. Thus, one could argue on legal grounds that the Clipper chip granted the government unprecedented powers, powers arguably in violation of the Fourth Amendment to the U.S. Constitution. That, of course, is a U.S. issue—but technologists, including me, pointed out the technical risks of deploying a complex cryptographic protocol, anywhere in the world (and many other countries have since expressed similar desires). Sure enough, Matt Blaze showed how to abuse the Clipper chip to let it do backdoor-free encryption, and at least two other mechanisms for adding backdoors to encryption protocols were shown to have flaws that allowed malefactors to read data that others had encrypted.

These developments posed a problem: debating some issues intelligently required knowledge not just of law or of technology, but of both. That is, some problems cannot be discussed purely on technical grounds or purely on legal grounds; the crux of the matter lies in the intersection.

Consider, for example, the difference between content and metadata in a communication. Metadata alone is extremely powerful; indeed, Michael Hayden, former director of both the CIA and the NSA, once said, “We kill people based on metadata.” The combination of content and metadata is of course even more powerful. However, under U.S. law (and the legal reasoning is complex and controversial), the content of a phone call is much more strongly protected than the metadata: who called whom, when, and for how long they spoke. But how does this doctrine apply to the Internet, a network that provides far more powerful abilities to the endpoints in a conversation? (Metadata analysis is not an Internet-specific phenomenon. The militaries of the world have likely been using it for more than a century.) You cannot begin to answer that question without knowing not just how the Internet actually works, but also the legal reasoning behind the difference. It took more than 100 pages for some colleagues and me, three computer scientists and a former Federal prosecutor, to show how the line between content and metadata can be drawn in some cases (and that the Department of Justice’s manuals and some Federal judges got the line wrong), but that in other cases, there is no possible line.1

Newer technologies pose the same sorts of risks…(More)”.
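
A toy illustration of the line-drawing problem described above (an editorial sketch, not an example from the article): on the Internet, even a single URL mixes the two categories, since the hostname resembles a dialed phone number while the path can reveal what the user is reading.

```python
# Toy sketch (not from the article): a single URL mixes metadata-like
# and content-like information, which is why the phone-era doctrine
# maps poorly onto the Internet.
from urllib.parse import urlparse

url = "https://example-health-site.org/conditions/depression/treatment"
parts = urlparse(url)

# The hostname is analogous to a dialed number: classic metadata.
print("metadata-like:", parts.hostname)  # example-health-site.org

# The path is transmitted as addressing information, yet it reveals
# what the user is reading -- arguably content.
print("content-like:", parts.path)       # /conditions/depression/treatment
```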

Updating purpose limitation for AI: a normative approach from law and philosophy 


Paper by Rainer Mühlhoff and Hannah Ruschemeier: “The purpose limitation principle goes beyond the protection of individual data subjects: it aims to ensure transparency and fairness, subject to exceptions for privileged purposes. However, in the current reality of powerful AI models, purpose limitation is often impossible to enforce and is thus structurally undermined. This paper addresses a critical regulatory gap in EU digital legislation: the risk of secondary use of trained models and anonymised training datasets. Anonymised training data, as well as AI models trained from this data, pose the threat of being freely reused in potentially harmful contexts such as insurance risk scoring and automated job applicant screening. We propose shifting the focus of purpose limitation from data processing to AI model regulation. This approach mandates that those training AI models define the intended purpose and restrict the use of the model solely to this stated purpose…(More)”.

Make privacy policies longer and appoint LLM readers


Paper by Przemysław Pałka et al: “In a world of human-only readers, a trade-off persists between comprehensiveness and comprehensibility: only privacy policies too long to be humanly readable can precisely describe the intended data processing. We argue that this trade-off no longer exists where LLMs are able to extract tailored information from clearly drafted, fully comprehensive privacy policies. To substantiate this claim, we provide a methodology for drafting comprehensive, non-ambiguous privacy policies and for querying them using LLM prompts. Our methodology is tested with an experiment aimed at determining to what extent GPT-4 and Llama2 are able to answer questions regarding the content of privacy policies designed in the format we propose. We further support this claim by analyzing real privacy policies in the chosen market sectors through two experiments (one with legal experts, and another using LLMs). Based on the success of our experiments, we submit that data protection law should change: it must require controllers to provide clearly drafted, fully comprehensive privacy policies from which data subjects and other actors can extract the needed information, with the help of LLMs…(More)”.
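
The querying step can be pictured with a short sketch along the following lines (an editorial illustration, not the paper’s actual pipeline; the model name, prompt wording, and policy.txt file are assumptions):

```python
# Minimal sketch: asking an LLM a targeted question about a long,
# fully comprehensive privacy policy. Assumes the OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical file holding a policy drafted in the proposed format.
with open("policy.txt", encoding="utf-8") as f:
    policy_text = f.read()

question = "Is my precise location shared with third parties, and for what purposes?"

response = client.chat.completions.create(
    model="gpt-4o",  # any capable model; the paper itself tested GPT-4 and Llama2
    messages=[
        {
            "role": "system",
            "content": (
                "Answer strictly from the privacy policy below. "
                "Quote the relevant clause; say 'not specified' if it is absent.\n\n"
                + policy_text
            ),
        },
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```

On this approach, the policy can grow arbitrarily precise without burdening the reader: comprehensiveness lives in the document, comprehensibility in the query.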

Artificial Intelligence and Big Data


Book edited by Frans L. Leeuw and Michael Bamberger: “…explores how Artificial Intelligence (AI) and Big Data contribute to the evaluation of the rule of law (covering legal arrangements, empirical legal research, law and technology, and international law), and social and economic development programs in both industrialized and developing countries. Issues of ethics and bias in the use of AI are also addressed and indicators of the growth of knowledge in the field are discussed.

Interdisciplinary and international in scope, and bringing together leading academics and practitioners from across the globe, the book explores the applications of AI and big data in Rule of Law and development evaluation, identifies differences in the approaches used in the two fields and how each could learn from the other, and highlights differences in the AI-related issues addressed in industrialized nations compared with those addressed in Africa and Asia.

Artificial Intelligence and Big Data is an essential read for researchers, academics and students working in the fields of Rule of Law and Development; researchers in institutions working on new applications of AI will also benefit from the book’s practical insights…(More)”.

UAE set to use AI to write laws in world first


Article by Chloe Cornish: “The United Arab Emirates aims to use AI to help write new legislation and review and amend existing laws, in the Gulf state’s most radical attempt to harness a technology into which it has poured billions.

The plan for what state media called “AI-driven regulation” goes further than anything seen elsewhere, AI researchers said, while noting that details were scant. Other governments are trying to use AI to become more efficient, from summarising bills to improving public service delivery, but not to actively suggest changes to current laws by crunching government and legal data.

“This new legislative system, powered by artificial intelligence, will change how we create laws, making the process faster and more precise,” said Sheikh Mohammad bin Rashid Al Maktoum, the Dubai ruler and UAE vice-president, quoted by state media.

Ministers last week approved the creation of a new cabinet unit, the Regulatory Intelligence Office, to oversee the legislative AI push. 

Rony Medaglia, a professor at Copenhagen Business School, said the UAE appeared to have an “underlying ambition to basically turn AI into some sort of co-legislator”, and described the plan as “very bold”.

Abu Dhabi has bet heavily on AI and last year opened a dedicated investment vehicle, MGX, which has backed a $30bn BlackRock AI-infrastructure fund among other investments. MGX has also added an AI observer to its own board.

The UAE plans to use AI to track how laws affect the country’s population and economy by creating a massive database of federal and local laws, together with public sector data such as court judgments and government services.

The AI will “regularly suggest updates to our legislation,” Sheikh Mohammad said, according to state media. The government expects AI to speed up lawmaking by 70 per cent, according to the cabinet meeting readout…(More)”.

AI Liability Along the Value Chain


Report by Beatriz Botero Arcila: “…explores how liability law can help solve the “problem of many hands” in AI: determining who is responsible for harm when a variety of different companies and actors may have contributed to the development of any given AI system along the value chain. The problem is aggravated by the fact that AI systems are both opaque and technically complex, making their behavior hard to predict.

Why AI Liability Matters

To find meaningful solutions to this problem, different kinds of experts have to come together. This resource is designed for a wide audience, but we indicate how specific audiences can best make use of different sections, overviews, and case studies.

Specifically, the report:

  • Proposes a 3-step analysis of how liability should be allocated along the value chain: 1) the choice of liability regime; 2) how liability should be shared among actors along the value chain; and 3) whether and how information asymmetries will be addressed.
  • Argues that where ex-ante AI regulation is already in place, policymakers should consider how liability rules will interact with these rules.
  • Proposes a baseline liability regime where actors along the AI value chain share responsibility if fault can be demonstrated, paired with measures to alleviate or shift the burden of proof and to enable better access to evidence — which would incentivize companies to act with sufficient care and address information asymmetries between claimants and companies.
  • Argues that in some cases, courts and regulators should extend a stricter regime, such as product liability or strict liability.
  • Analyzes liability rules in the EU based on this framework…(More)”.

The Cambridge Handbook of the Law, Ethics and Policy of Artificial Intelligence


Handbook edited by Nathalie A. Smuha: “…provides a comprehensive overview of the legal, ethical, and policy implications of AI and algorithmic systems. As these technologies continue to impact various aspects of our lives, it is crucial to understand and assess the challenges and opportunities they present. Drawing on contributions from experts in various disciplines, the book covers theoretical insights and practical examples of how AI systems are used in society today. It also explores the legal and policy instruments governing AI, with a focus on Europe. The interdisciplinary approach of this book makes it an invaluable resource for anyone seeking to gain a deeper understanding of AI’s impact on society and how it should be regulated…(More)”.

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft


Article by Kate Knibbs: “Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative’s database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to “level the playing field” by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined and curated content repositories that normally only established tech giants have the resources to assemble. “It’s gone through rigorous review,” he says…(More)”.