OECD Report: “Recent technological advances in artificial intelligence (AI), especially the rise of generative AI, have raised questions regarding the intellectual property (IP) landscape. As the demand for AI training data surges, certain data collection methods give rise to concerns about the protection of IP and other rights. This report provides an overview of key issues at the intersection of AI and some IP rights. It aims to facilitate a greater understanding of data scraping — a primary method for obtaining the AI training data needed to develop many large language models. It analyses data scraping techniques, identifies key stakeholders, and surveys worldwide legal and regulatory responses. Finally, it offers preliminary considerations and potential policy approaches to help guide policymakers in navigating these issues, ensuring that AI’s innovative potential is unleashed while protecting IP and other rights…(More)”.
Building AI for the pluralistic society
Paper by Aida Davani and Vinodkumar Prabhakaran: “Modern artificial intelligence (AI) systems rely on input from people. Human feedback helps train models to perform useful tasks, guides them toward safe and responsible behavior, and is used to assess their performance. While hailing recent AI advancements, we should also ask: which humans are we actually talking about? For AI to be most beneficial, it should reflect and respect the diverse tapestry of values, beliefs, and perspectives present in the pluralistic world in which we live, not just a single “average” or majority viewpoint. Diversity in perspectives is especially relevant when AI systems perform subjective tasks, such as deciding whether a response will be perceived as helpful, offensive, or unsafe. For instance, what one value system deems offensive may be perfectly acceptable within another set of values.
Since divergence in perspectives often aligns with socio-cultural and demographic lines, preferentially capturing certain groups’ perspectives over others in data may result in disparities in how well AI systems serve different social groups. For instance, we previously demonstrated that simply taking a majority vote over human annotations may obscure valid divergence in perspectives across social groups, inadvertently marginalizing minority perspectives and producing systems that perform less reliably for the groups marginalized in the data. How AI systems should deal with such diversity in perspectives depends on the context in which they are used. However, current models lack a systematic way to recognize and handle such contexts.
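A minimal sketch (illustrative only, not the paper's code) of the aggregation problem described above: a majority vote across all annotators yields a single label, while per-group tallies reveal that one group unanimously disagrees. The groups and labels here are hypothetical.

```python
# Illustrative only: majority-vote label aggregation can hide systematic
# disagreement between annotator groups. Groups and labels are hypothetical.
from collections import Counter

# Annotations for a single item: (annotator_group, label)
annotations = [
    ("group_a", "offensive"), ("group_a", "offensive"), ("group_a", "offensive"),
    ("group_b", "acceptable"), ("group_b", "acceptable"),
]

# A majority vote collapses everything into one "ground truth" label...
majority_label = Counter(label for _, label in annotations).most_common(1)[0][0]
print("majority vote:", majority_label)  # -> offensive

# ...but per-group tallies show group_b unanimously disagrees, a signal the
# aggregated label throws away.
by_group: dict[str, Counter] = {}
for group, label in annotations:
    by_group.setdefault(group, Counter())[label] += 1
for group, counts in by_group.items():
    print(group, dict(counts))
# group_a {'offensive': 3}
# group_b {'acceptable': 2}
```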
With this in mind, here we describe our ongoing efforts in pursuit of capturing diverse perspectives and building AI for the pluralistic society in which we live… (More)”.

AI crawler wars threaten to make the web more closed for everyone
Article by Shayne Longpre: “We often take the internet for granted. It’s an ocean of information at our fingertips—and it simply works. But this system relies on swarms of “crawlers”—bots that roam the web, visit millions of websites every day, and report what they see. This is how Google powers its search engine, how Amazon sets competitive prices, and how Kayak aggregates travel listings. Beyond the world of commerce, crawlers are essential for monitoring web security, enabling accessibility tools, and preserving historical archives. Academics, journalists, and civil society organizations also rely on them to conduct crucial investigative research.
Crawlers are endemic. Now representing half of all internet traffic, they will soon outpace human traffic. This unseen subway of the web ferries information from site to site, day and night. And as of late, they serve one more purpose: Companies such as OpenAI use web-crawled data to train their artificial intelligence systems, like ChatGPT.
Understandably, websites are now fighting back for fear that this invasive species—AI crawlers—will help displace them. But there’s a problem: This pushback is also threatening the transparency and open borders of the web that allow non-AI applications to flourish. Unless we are thoughtful about how we fix this, the web will increasingly be fortified with logins, paywalls, and access tolls that inhibit not just AI but the biodiversity of real users and useful crawlers…(More)”.
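Today, the main lever sites use to push back is the decades-old robots.txt convention. A minimal sketch, using Python's standard library with illustrative rules and bot names, of how a site can single out AI crawlers while leaving others alone, and how a well-behaved crawler checks those rules before fetching:

```python
# Illustrative robots.txt rules blocking one AI crawler while allowing others;
# the site, paths, and bot names are hypothetical stand-ins.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for agent in ("Googlebot", "GPTBot"):
    verdict = "allowed" if rp.can_fetch(agent, "https://example.com/page") else "blocked"
    print(f"{agent}: {verdict}")
# Googlebot: allowed
# GPTBot: blocked
```

Note that robots.txt is purely voluntary; nothing forces a crawler to honor it, which is why sites wanting harder guarantees reach for the logins, paywalls, and access tolls the article warns about.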
Economic Implications of Data Regulation
OECD Report: “Cross-border data flows are the lifeblood of today’s social and economic interactions, but they also raise a range of new challenges, including for privacy and data protection, national security, cybersecurity, digital protectionism and regulatory reach. This has led to a surge in regulation conditioning (or prohibiting) the flow of data or mandating that data be stored or processed domestically (data localisation). However, the economic implications of these measures are not well understood. This report provides estimates of what is at stake, highlighting that full fragmentation could reduce global GDP by 4.5%. It also underscores the benefits associated with open regimes with safeguards, which could see global GDP increase by 1.7%. In a world where digital fragmentation is growing, global discussions on these issues can help harness the benefits of an open and safeguarded internet…(More)”.
Sandboxes for AI
Report by Datasphere Initiative: “The Sandboxes for AI report explores the role of regulatory sandboxes in the development and governance of artificial intelligence. Originally presented as a working paper at the Global Sandbox Forum Inaugural Meeting in July 2024, the report was further refined through expert consultations and an online roundtable in December 2024. It examines sandboxes that have been announced, are under development, or have been completed, identifying common patterns in their creation, timing, and implementation. By providing insights into why and how regulators and companies should consider AI sandboxes, the report serves as a strategic guide for fostering responsible innovation.
In a rapidly evolving AI landscape, traditional regulatory processes often struggle to keep pace with technological advancements. Sandboxes offer a flexible and iterative approach, allowing policymakers to test and refine AI governance models in a controlled environment. The report identifies 66 AI, data, or technology-related sandboxes across 44 countries, 31 of which are specifically designed for AI innovation. These initiatives focus on areas such as machine learning, data-driven solutions, and AI governance, helping policymakers address emerging challenges while ensuring ethical and transparent AI development…(More)”.
Google-backed public interest AI partnership launches with $400M+ for open ecosystem building
Article by Natasha Lomas: “Make room for yet another partnership on AI. Current AI, a “public interest” initiative focused on fostering and steering development of artificial intelligence in societally beneficial directions, was announced at the French AI Action summit on Monday. It’s kicking off with an initial $400 million in pledges from backers and a plan to pull in $2.5 billion more over the next five years.
Such figures are small beer when it comes to AI investment, with the French president fresh from trumpeting a private support package worth around $112 billion (which itself pales beside U.S. investments of $500 billion aiming to accelerate the tech). But the partnership is not focused on compute, so its backers believe such relatively modest sums will still be able to produce an impact in key areas where AI could make a critical difference to the public interest, such as healthcare and climate goals.
The initial details are high level. Under the top-line focus on “the enabling environment for public interest AI,” the initiative has a number of stated aims — including pushing to widen access to “high quality” public and private datasets for AI training; support for open source infrastructure and tooling to boost AI transparency and security; and support for developing systems to measure AI’s social and environmental impact.
Its founder, Martin Tisné, said the goal is to create a financial vehicle “to provide a North Star for public financing of critical efforts,” such as bringing AI to bear on combating cancers or coming up with treatments for long COVID.
“I think what’s happening is you’ve got a data bottleneck coming in artificial intelligence, because we’re running out of road with data on the web, effectively … and here, what we need is to really unlock innovations in how to make data accessible and available,” he told TechCrunch….(More)”
Trump’s shocking purge of public health data, explained
Article by Dylan Scott: “In the initial days of the Trump administration, officials scoured federal websites for any mention of what they deemed “DEI” keywords — terms as generic as “diverse” and “historically” and even “women.” They soon identified reams of some of the country’s most valuable public health data containing some of the targeted words, including language about LGBTQ+ people, and quickly took down much of it — from surveys on obesity and suicide rates to real-time reports on immediate infectious disease threats like bird flu.
The removal elicited a swift response from public health experts who warned that without this data, the country risked being in the dark about important health trends that shape life-and-death public health decisions made in communities across the country.
Some of this data was restored in a matter of days, but much of it was incomplete. In some cases, the raw data sheets were posted again, but the reference documents that would allow most people to decipher them were not. Meanwhile, health data continues to be taken down: The New York Times reported last week that data from the Centers for Disease Control and Prevention on bird flu transmission between humans and cats had been posted and then promptly removed…
It is difficult to capture the sheer breadth and importance of the public health data that has been affected. Here are a few illustrative examples of reports that have either been tampered with or removed completely, as compiled by KFF.
The Behavioral Risk Factor Surveillance System (BRFSS), which is “one of the most widely used national health surveys and has been ongoing for about 40 years,” per KFF, is an annual survey that contacts 400,000 Americans to ask them about everything from their own perception of their general health to exercise, diet, sexual activity, and alcohol and drug use.
That in turn allows experts to track important health trends, like the fluctuations in teen vaping use. One recent study that relied on BRFSS data warned that a recent ban on flavored e-cigarettes (also known as vapes) may be driving more young people to conventional smoking, five years after an earlier Yale study based on the same survey led to the ban being proposed in the first place. The Supreme Court and the Trump administration are currently revisiting the flavored vape ban, and the Yale study was cited in at least one amicus brief for the case.
This survey has also been of particular use in identifying health disparities among LGBTQ+ people, such as higher rates of uninsurance and reported poor health compared to the general population. Those findings have motivated policymakers at the federal, state and local levels to launch new initiatives aimed specifically at that at-risk population.
As of now, most of the BRFSS data has been restored, but the supplemental materials that make it legible to lay people still have not…(More)”.
Digital Data and Advanced AI for Richer Global Intelligence
Report by Danielle Goldfarb: “From collecting millions of online prices to measure inflation, to assessing the economic impact of the COVID-19 pandemic on low-income workers, digital data sets can be used to benefit the public interest. Using these and other examples, this special report explores how digital data sets and advances in artificial intelligence (AI) can provide timely, transparent and detailed insights into global challenges. These experiments illustrate how governments and civil society analysts can reuse digital data to spot emerging problems, analyze impacts on specific groups, complement traditional metrics or verify data that may be manipulated. AI and data governance should extend beyond addressing harms. International institutions and governments need to actively steward digital data and AI tools to support a step change in our understanding of society’s biggest challenges…(More)”
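To make the first example concrete: one common way analysts turn scraped online prices into an inflation signal is a Jevons-style index, the geometric mean of price relatives for matched products across two periods. A toy sketch with hypothetical prices, not the report's actual methodology:

```python
# Toy example: a Jevons index over scraped online prices for matched products.
# Prices are hypothetical; real pipelines cover millions of product pages.
from math import prod

prices_before = {"milk": 1.00, "bread": 2.00, "eggs": 3.00}
prices_after = {"milk": 1.10, "bread": 2.10, "eggs": 3.30}

# Price relative per product, then the geometric mean across products.
relatives = [prices_after[item] / prices_before[item] for item in prices_before]
jevons = prod(relatives) ** (1 / len(relatives))
print(f"estimated price change: {jevons - 1:.1%}")  # ~8.3%
```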
Recommendations for Better Sharing of Climate Data
Creative Commons: “…the culmination of a nine-month research initiative from our Open Climate Data project. These guidelines are a result of collaboration between Creative Commons, government agencies and intergovernmental organizations. They mark a significant milestone in our ongoing effort to enhance the accessibility, sharing, and reuse of open climate data to address the climate crisis. Our goal is to share strategies that align with existing data sharing principles and pave the way for a more interconnected and accessible future for climate data.
Our recommendations offer practical steps and best practices, crafted in collaboration with key stakeholders and organizations dedicated to advancing open practices in climate data. We provide recommendations for 1) legal and licensing terms, 2) using metadata values for attribution and provenance, and 3) management and governance for better sharing.
Opening climate data requires an examination of the public’s legal rights to access and use the climate data, often dictated by copyright and licensing. This legal detail is sometimes missing from climate data sharing and legal interoperability conversations. Our recommendations suggest two options: Option A: CC0 + Attribution Request, in order to maximize reuse by dedicating climate data to the public domain, plus a request for attribution; and Option B: CC BY 4.0, for retaining data ownership and legal enforcement of attribution. We address how to navigate license stacking and attribution stacking for climate data hosts and for users working with multiple climate data sources.
We also propose standardized human- and machine-readable metadata values that enhance transparency, reduce guesswork, and ensure broader accessibility to climate data. We built upon existing model metadata schemas and standards, including those that address license and attribution information. These recommendations address a gap by providing a metadata schema that standardizes the inclusion of upfront, clear values related to attribution, licensing, and provenance.
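For illustration, here is the kind of upfront, machine-readable record those recommendations point toward. This is a hypothetical sketch with invented field names, not Creative Commons' published schema:

```python
# Hypothetical metadata record for a climate dataset; field names and values
# are illustrative, not the schema proposed in the recommendations.
import json

dataset_metadata = {
    "title": "Example Gridded Temperature Dataset",
    "publisher": "Example Climate Agency",
    "license": "CC0-1.0",  # Option A: public-domain dedication...
    "attribution_request": "Please credit Example Climate Agency.",  # ...plus a request
    "provenance": {
        "derived_from": ["https://example.org/source-dataset"],
        "processing": "Regridded to 0.5-degree resolution",
    },
    "last_updated": "2024-06-01",
}

print(json.dumps(dataset_metadata, indent=2))
```

Stating license, attribution, and provenance upfront lets downstream users and tools resolve terms across stacked sources without guesswork, which is the gap the recommendations flag.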
Lastly, we highlight four key aspects of effective climate data management: designating a dedicated technical managing steward, designating a legal and/or policy steward, encouraging collaborative data sharing, and regularly revisiting and updating data sharing policies in accordance with parallel open data policies and standards…(More)”.
It’s just distributed computing: Rethinking AI governance
Paper by Milton L. Mueller: “What we now lump under the unitary label “artificial intelligence” is not a single technology, but a highly varied set of machine learning applications enabled and supported by a globally ubiquitous system of distributed computing. The paper introduces a four-part conceptual framework for analyzing the structure of that system, which it labels the digital ecosystem. What we now call “AI” is then shown to be a general functionality of distributed computing. “AI” has been present in primitive forms from the origins of digital computing in the 1950s. Three short case studies show that large-scale machine learning applications have been present in the digital ecosystem ever since the rise of the Internet, and provoked the same public policy concerns that we now associate with “AI.” The governance problems of “AI” are really caused by the development of this digital ecosystem, not by LLMs or other recent applications of machine learning. The paper then examines five recent proposals to “govern AI” and maps them to the constituent elements of the digital ecosystem model. This mapping shows that real-world attempts to assert governance authority over AI capabilities require systemic control of all four elements of the digital ecosystem: data, computing power, networks and software. “Governing AI,” in other words, means total control of distributed computing. A better alternative is to focus governance and regulation upon specific applications of machine learning. An application-specific approach to governance allows for a more decentralized, freer and more effective method of solving policy conflicts…(More)”