The Emergent Landscape of Data Commons: A Brief Survey and Comparison of Existing Initiatives


Article by Stefaan G. Verhulst and Hannah Chafetz: “With the increased attention on the need for data to advance AI, data commons initiatives around the world are redefining how data can be accessed and re-used for societal benefit. These initiatives focus on generating access to data from various sources for a public purpose and are governed by communities themselves. While diverse in focus, from health and mobility to language and environmental data, data commons are united by a common goal: democratizing access to data to fuel innovation and tackle global challenges.

This includes innovation in the context of artificial intelligence (AI). Data commons are providing the framework to make pools of diverse data available in machine-understandable formats for responsible AI development and deployment. By providing access to high-quality data sources with open licensing, data commons can help increase the quantity of training data in a less exploitative fashion, minimize AI providers’ reliance on data extracted across the internet without an open license, and increase the quality of AI output (while reducing misinformation).

Over the last few months, the Open Data Policy Lab (a collaboration between The GovLab and Microsoft) has conducted various research initiatives to explore these topics further and understand:

(1) how the concept of a data commons is changing in the context of artificial intelligence, and

(2) current efforts to advance the next generation of data commons.

In what follows we provide a summary of our findings thus far. We hope it inspires more data commons use cases for responsible AI innovation in the public’s interest…(More)”.

Two Open Science Foundations: Data Commons and Stewardship as Pillars for Advancing the FAIR Principles and Tackling Planetary Challenges


Article by Stefaan Verhulst and Jean Claude Burgelman: “Today the world is facing three major planetary challenges: war and peace, steering artificial intelligence, and making the planet a healthy Anthropocene. As they are closely interrelated, they represent an era of “polycrisis”, to use the term Adam Tooze has coined. There are no simple solutions or quick fixes to these (and other) challenges; their interdependencies demand a multi-stakeholder, interdisciplinary approach.

As world leaders and experts convene in Baku for the 29th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (COP29), the urgency of addressing these global crises has never been clearer. A crucial part of addressing these challenges lies in advancing science, particularly open science, underpinned by data made available in line with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). In this era of computation, the transformative potential of research depends on the seamless flow and reuse of high-quality data to unlock breakthrough insights and solutions. Ensuring data is available in reusable, interoperable formats not only accelerates the pace of scientific discovery but also expedites the search for solutions to global crises.

Image of the retreat of the Columbia glacier by Jesse Allen, using Landsat data from the U.S. Geological Survey. Free to re-use from NASA Visible Earth.

While FAIR principles provide a vital foundation for making data accessible, interoperable and reusable, translating these principles into practice requires robust institutional approaches. Toward that end, we argue below that two foundational pillars must be strengthened:

  • Establishing Data Commons: The need for shared data ecosystems where resources can be pooled, accessed, and re-used collectively, breaking down silos and fostering cross-disciplinary collaboration.
  • Enabling Data Stewardship: Systematic and responsible data reuse requires more than access; it demands stewardship, that is, equipping institutions and scientists with the capabilities to maximize the value of data while safeguarding its responsible use…(More)”.

A Second Academic Exodus From X?


Article by Josh Moody: “Two years ago, after Elon Musk bought Twitter for $44 billion, promptly renaming it X, numerous academics decamped from the platform. Now, in the wake of a presidential election fraught with online disinformation, a second exodus from the social media site appears underway.

Academics, including some with hundreds of thousands of followers, announced departures from the platform in the immediate aftermath of the election, decrying the toxicity of the website and voicing objections to Musk and to how he wielded the platform to back President-elect Donald Trump. The business mogul threw millions of dollars behind Trump and personally campaigned for him this fall. Musk also personally advanced various debunked conspiracy theories during the election cycle.

Amid another wave of exits, some users see this as the end of Academic Twitter, which was already arguably in its death throes…

LeBlanc, Kamola and Rosen all mentioned that they were moving to the platform Bluesky, which has grown to 14.5 million users, welcoming more than 700,000 new accounts in recent days. In September, Bluesky had nine million users…

A study published in PS: Political Science & Politics last month concluded that academics began to engage less after Musk bought the platform. But the peak of disengagement came not when the billionaire took over the site in October 2022 but the following month, when he reinstated the account of Donald Trump, which the platform’s previous owners had deactivated after the Jan. 6, 2021, insurrection that Trump encouraged.

The researchers reviewed 15,700 accounts from academics in economics, political science, sociology and psychology for their study.

James Bisbee, a political science professor at Vanderbilt University and article co-author, wrote via email that changes to the platform, particularly to the application programming interface, or API, undermined their ability to collect data for their research.

“Twitter used to be an amazing source of data for political scientists (and social scientists more broadly) thanks in part to its open data ethos,” Bisbee wrote. “Since Musk’s takeover, this is no longer the case, severely limiting the types of conclusions we could draw, and theories we could test, on this platform.”

To Bisbee, that loss is an understated issue: “Along with many other troubling developments on X since the change in ownership, the amputation of data access should not be ignored.”…(More)”

The Death of Search


Article by Matteo Wong: “For nearly two years, the world’s biggest tech companies have said that AI will transform the web, your life, and the world. But first, they are remaking the humble search engine.

Chatbots and search, in theory, are a perfect match. A standard Google search interprets a query and pulls up relevant results; tech companies have spent tens or hundreds of millions of dollars engineering chatbots that interpret human inputs, synthesize information, and provide fluent, useful responses. No more keyword refining or scouring Wikipedia—ChatGPT will do it all. Search is an appealing target, too: Shaping how people navigate the internet is tantamount to shaping the internet itself.

Months of prophesying about generative AI have now culminated, almost all at once, in what may be the clearest glimpse yet into the internet’s future. After a series of limited releases and product demos, mired in various setbacks and embarrassing errors, tech companies are debuting AI-powered search engines as fully realized, all-inclusive products. Last Monday, Google announced that it would launch its AI Overviews in more than 100 new countries; that feature will now reach more than 1 billion users a month. Days later, OpenAI announced a new search function in ChatGPT, available to paid users for now and soon opening to the public. The same afternoon, the AI-search start-up Perplexity shared instructions for making its “answer engine” the default search tool in your web browser.

For the past week, I have been using these products in a variety of ways: to research articles, follow the election, and run everyday search queries. In turn I have scried, as best I can, into the future of how billions of people will access, relate to, and synthesize information. What I’ve learned is that these products are at once unexpectedly convenient, frustrating, and weird. These tools’ current iterations surprised and, at times, impressed me, yet even when they work perfectly, I’m not convinced that AI search is a wise endeavor…(More)”.

Who Is Responsible for AI Copyright Infringement?


Article by Michael P. Goodyear: “Twenty-one-year-old college student Shane hopes to write a song for his boyfriend. In the past, Shane would have had to wait for inspiration to strike, but now he can use generative artificial intelligence to get a head start. Shane decides to use Anthropic’s AI chat system, Claude, to write the lyrics. Claude dutifully complies and creates the words to a love song. Shane, happy with the result, adds notes, rhythm, tempo, and dynamics. He sings the song and his boyfriend loves it. Shane even decides to post a recording to YouTube, where it garners 100,000 views.

But Shane did not realize that this song’s lyrics are similar to those of “Love Story,” Taylor Swift’s hit 2008 song. Shane must now contend with copyright law, which protects original creative expression such as music. Copyright grants the rights owner the exclusive rights to reproduce, perform, and create derivatives of the copyrighted work, among other things. If others take such actions without permission, they can be liable for damages up to $150,000. So Shane could be on the hook for tens of thousands of dollars for copying Swift’s song.

Copyright law has surged into the news in the past few years as one of the most important legal challenges for generative AI tools like Claude—not for the output of these tools but for how they are trained. Over two dozen pending court cases grapple with the question of whether training generative AI systems on copyrighted works without compensating or getting permission from the creators is lawful or not. Answers to this question will shape a burgeoning AI industry that is predicted to be worth $1.3 trillion by 2032.

Yet there is another important question that few have asked: Who should be liable when a generative AI system creates a copyright-infringing output? Should the user be on the hook?…(More)”

From Digital Sovereignty to Digital Agency


Article by Akash Kapur: “In recent years, governments have increasingly pursued variants of digital sovereignty to regulate and control the global digital ecosystem. The pursuit of AI sovereignty represents the latest iteration in this quest. 

Digital sovereignty may offer certain benefits, but it also poses undeniable risks, including the possibility of undermining the very goals of autonomy and self-reliance that nations are seeking. These risks are particularly pronounced for smaller nations with less capacity, which might do better in a revamped, more inclusive, multistakeholder system of digital governance. 

Organizing digital governance around agency rather than sovereignty offers the possibility of such a system. Rather than reinforce the primacy of nations, digital agency asserts the rights, priorities, and needs not only of sovereign governments but also of the constituent parts—the communities and individuals—they purport to represent.

Three cross-cutting principles underlie the concept of digital agency: recognizing stakeholder multiplicity, enhancing the latent possibilities of technology, and promoting collaboration. These principles lead to three action-areas that offer a guide for digital policymakers: reinventing institutions, enabling edge technologies, and building human capacity to ensure technical capacity…(More)”.

OECD Digital Economy Outlook 2024


OECD Report: “The most recent phase of digital transformation is marked by rapid technological changes, creating both opportunities and risks for the economy and society. Volume 2 of the OECD Digital Economy Outlook 2024 explores emerging priorities, policies and governance practices across countries. It also examines trends in the foundations that enable digital transformation, drive digital innovation and foster trust in the digital age. The volume concludes with a statistical annex…

In 2023, digital government, connectivity and skills topped the list of digital policy priorities. Increasingly developed at a high level of government, national digital strategies play a critical role in co-ordinating these efforts. Nearly half of the 38 countries surveyed develop these strategies through dedicated digital ministries, up from just under a quarter in 2016. Among 1,200 policy initiatives tracked across the OECD, one-third aim to boost digital technology adoption, social prosperity, and innovation. AI and 5G are the most frequently cited technologies…(More)”

Human-AI coevolution


Paper by Dino Pedreschi et al: “Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices through online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users’ choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often “unintended” systemic outcomes. This paper introduces human-AI coevolution as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., scientific, legal and socio-political…(More)”.
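The feedback loop the authors describe (user choices become training data, and the trained model in turn shapes later choices) can be made concrete with a deliberately simple toy simulation. This sketch is not taken from the paper: the recommendation rule (always suggest the most-clicked item), the fixed preference "nudge" a recommendation applies, and every function and parameter name here are hypothetical assumptions chosen only to make the loop visible.

```python
import random


def simulate_feedback_loop(n_items=5, n_users=100, n_rounds=50, nudge=0.1, seed=0):
    """Toy human-AI feedback loop (hypothetical model, for illustration only).

    Returns each item's share of all recorded clicks after n_rounds.
    """
    rng = random.Random(seed)
    items = range(n_items)
    # Every user starts with equal preference weight for every item.
    prefs = [[1.0] * n_items for _ in range(n_users)]
    # The "training data": click counts per item, smoothed with one pseudo-click each.
    clicks = [1] * n_items
    for _ in range(n_rounds):
        # "Retrain" on all data so far: recommend the most-clicked item
        # (ties go to the lowest index).
        recommended = max(items, key=lambda i: clicks[i])
        for user_prefs in prefs:
            # Exposure to the recommendation shifts the user's preferences toward it.
            user_prefs[recommended] += nudge
            # The user's choice is drawn from their current preference weights...
            choice = rng.choices(items, weights=user_prefs)[0]
            # ...and is recorded as new training data for the next round.
            clicks[choice] += 1
    total = sum(clicks)
    return [count / total for count in clicks]


if __name__ == "__main__":
    looped = simulate_feedback_loop(nudge=0.1)
    baseline = simulate_feedback_loop(nudge=0.0)
    print("click shares with feedback:   ", [round(s, 3) for s in looped])
    print("click shares without feedback:", [round(s, 3) for s in baseline])
```

Setting `nudge=0.0` cuts the link from recommendations back to preferences, which gives a rough baseline for comparing how concentrated clicks become once the loop is closed. Real recommender systems are of course far richer than this single "most popular item" rule.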

What is ‘sovereign AI’ and why is the concept so appealing (and fraught)?


Article by John Letzing: “Denmark unveiled its own artificial intelligence supercomputer last month, funded by the proceeds of wildly popular Danish weight-loss drugs like Ozempic. It’s now one of several sovereign AI initiatives underway, which one CEO believes can “codify” a country’s culture, history, and collective intelligence – and become “the bedrock of modern economies.”

That particular CEO, Jensen Huang, happens to run a company selling the sort of chips needed to pursue sovereign AI – that is, to construct a domestic vintage of the technology, informed by troves of homegrown data and powered by the computing infrastructure necessary to turn that data into a strategic reserve of intellect…

It’s not surprising that countries are forging expansive plans to put their own stamp on AI. But big-ticket supercomputers and other costly resources aren’t feasible everywhere.

Training a large language model has gotten a lot more expensive lately; the funds required for the necessary hardware, energy, and staff may soon top $1 billion. Meanwhile, geopolitical friction over access to the advanced chips necessary for powerful AI systems could further warp the global playing field.

Even for countries with abundant resources and access, there are “sovereignty traps” to consider. Governments pushing ahead on sovereign AI could risk undermining global cooperation meant to ensure the technology is put to use in transparent and equitable ways. That might make it a lot less safe for everyone.

An example: a place using AI systems trained on a local set of values for its security may readily flag behaviour out of sync with those values as a threat…(More)”.

Engaging publics in science: a practical typology


Paper by Heather Douglas et al: “Public engagement with science has become a prominent area of research and effort for democratizing science. In the fall of 2020, we held an online conference, Public Engagement with Science: Defining and Measuring Success, to address questions of how to do public engagement well. The conference was organized around conceptualizations of the publics engaged, with attendant epistemic, ethical, and political valences. We present here the typology of publics we used (volunteer, representative sample, stakeholder, and community publics), discuss the differences among those publics and what those differences mean for practice, and situate this typology within the existing work on public engagement with science. We then provide an overview of the essays arising from the conference and published in this journal, which provide a window into the rich work presented at the event…(More)”.