The Emergent Landscape of Data Commons: A Brief Survey and Comparison of Existing Initiatives


Article by Stefaan G. Verhulst and Hannah Chafetz: “With the increased attention on the need for data to advance AI, data commons initiatives around the world are redefining how data can be accessed and re-used for societal benefit. These initiatives focus on generating access to data from various sources for a public purpose and are governed by communities themselves. While diverse in focus, from health and mobility to language and environmental data, data commons are united by a common goal: democratizing access to data to fuel innovation and tackle global challenges.

This includes innovation in the context of artificial intelligence (AI). Data commons are providing the framework to make pools of diverse data available in machine-understandable formats for responsible AI development and deployment. By providing access to high-quality data sources with open licensing, data commons can help increase the quantity of training data in a less exploitative fashion, minimize AI providers’ reliance on data extracted across the internet without an open license, and increase the quality of the AI output (while reducing misinformation).

Over the last few months, the Open Data Policy Lab (a collaboration between The GovLab and Microsoft) has conducted various research initiatives to explore these topics further and understand:

(1) how the concept of a data commons is changing in the context of artificial intelligence, and

(2) current efforts to advance the next generation of data commons.

In what follows we provide a summary of our findings thus far. We hope it inspires more data commons use cases for responsible AI innovation in the public’s interest…(More)”.

Two Open Science Foundations: Data Commons and Stewardship as Pillars for Advancing the FAIR Principles and Tackling Planetary Challenges


Article by Stefaan Verhulst and Jean Claude Burgelman: “Today the world is facing three major planetary challenges: war and peace, steering Artificial Intelligence and making the planet a healthy Anthropocene. As they are closely interrelated, they represent an era of “polycrisis”, to use the term Adam Tooze has coined. There are no simple solutions or quick fixes to these (and other) challenges; their interdependencies demand a multi-stakeholder, interdisciplinary approach.

As world leaders and experts convene in Baku for the 29th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (COP29), the urgency of addressing these global crises has never been clearer. A crucial part of addressing these challenges lies in advancing science, particularly open science, underpinned by data made available leveraging the FAIR principles (Findable, Accessible, Interoperable, and Reusable). In this era of computation, the transformative potential of research depends on the seamless flow and reuse of high-quality data to unlock breakthrough insights and solutions. Ensuring data is available in reusable, interoperable formats not only accelerates the pace of scientific discovery but also expedites the search for solutions to global crises.

Image of the retreat of the Columbia glacier by Jesse Allen, using Landsat data from the U.S. Geological Survey. Free to re-use from NASA Visible Earth.

While FAIR principles provide a vital foundation for making data accessible, interoperable and reusable, translating these principles into practice requires robust institutional approaches. Toward that end, we argue below that two foundational pillars must be strengthened:

  • Establishing Data Commons: The need for shared data ecosystems where resources can be pooled, accessed, and re-used collectively, breaking down silos and fostering cross-disciplinary collaboration.
  • Enabling Data Stewardship: Systematic and responsible data reuse requires more than access; it demands stewardship. Equipping institutions and scientists with the capabilities to maximize the value of data while safeguarding its responsible use is essential…(More)”.

A Second Academic Exodus From X?


Article by Josh Moody: “Two years ago, after Elon Musk bought Twitter for $44 billion, promptly renaming it X, numerous academics decamped from the platform. Now, in the wake of a presidential election fraught with online disinformation, a second exodus from the social media site appears underway.

Academics, including some with hundreds of thousands of followers, announced departures from the platform in the immediate aftermath of the election, decrying the toxicity of the website and citing objections to Musk and how he wielded the platform to back President-elect Donald Trump. The business mogul threw millions of dollars behind Trump and personally campaigned for him this fall. Musk also personally advanced various debunked conspiracy theories during the election cycle.

Amid another wave of exits, some users see this as the end of Academic Twitter, which was already arguably in its death throes…

LeBlanc, Kamola and Rosen all mentioned that they were moving to the platform Bluesky, which has grown to 14.5 million users, welcoming more than 700,000 new accounts in recent days. In September, Bluesky had nine million users…

A study published in PS: Political Science & Politics last month concluded that academics began to engage less after Musk bought the platform. But the peak of disengagement wasn’t when the billionaire took over the site in October 2022 but rather the next month, when he reinstated Donald Trump’s account. The platform’s previous owners had deactivated that account following the Jan. 6, 2021, insurrection, which Trump encouraged.

The researchers reviewed 15,700 accounts from academics in economics, political science, sociology and psychology for their study.

James Bisbee, a political science professor at Vanderbilt University and article co-author, wrote via email that changes to the platform, particularly to the application programming interface, or API, undermined their ability to collect data for their research.

“Twitter used to be an amazing source of data for political scientists (and social scientists more broadly) thanks in part to its open data ethos,” Bisbee wrote. “Since Musk’s takeover, this is no longer the case, severely limiting the types of conclusions we could draw, and theories we could test, on this platform.”

To Bisbee, that loss is an understated issue: “Along with many other troubling developments on X since the change in ownership, the amputation of data access should not be ignored.”…(More)”.

The Death of Search


Article by Matteo Wong: “For nearly two years, the world’s biggest tech companies have said that AI will transform the web, your life, and the world. But first, they are remaking the humble search engine.

Chatbots and search, in theory, are a perfect match. A standard Google search interprets a query and pulls up relevant results; tech companies have spent tens or hundreds of millions of dollars engineering chatbots that interpret human inputs, synthesize information, and provide fluent, useful responses. No more keyword refining or scouring Wikipedia—ChatGPT will do it all. Search is an appealing target, too: Shaping how people navigate the internet is tantamount to shaping the internet itself.

Months of prophesying about generative AI have now culminated, almost all at once, in what may be the clearest glimpse yet into the internet’s future. After a series of limited releases and product demos, mired with various setbacks and embarrassing errors, tech companies are debuting AI-powered search engines as fully realized, all-inclusive products. Last Monday, Google announced that it would launch its AI Overviews in more than 100 new countries; that feature will now reach more than 1 billion users a month. Days later, OpenAI announced a new search function in ChatGPT, available to paid users for now and soon opening to the public. The same afternoon, the AI-search start-up Perplexity shared instructions for making its “answer engine” the default search tool in your web browser.

For the past week, I have been using these products in a variety of ways: to research articles, follow the election, and run everyday search queries. In turn I have scried, as best I can, into the future of how billions of people will access, relate to, and synthesize information. What I’ve learned is that these products are at once unexpectedly convenient, frustrating, and weird. These tools’ current iterations surprised and, at times, impressed me, yet even when they work perfectly, I’m not convinced that AI search is a wise endeavor…(More)”.

Congress should designate an entity to oversee data security, GAO says


Article by Matt Bracken: “Federal agencies may need to rethink how they handle individuals’ personal data to protect their civil rights and civil liberties, a congressional watchdog said in a new report Tuesday.

Without federal guidance governing the protection of the public’s civil rights and liberties, agencies have pursued a patchwork system of policies tied to the collection, sharing and use of data, the Government Accountability Office said.

To address that problem head-on, the GAO is recommending that Congress select “an appropriate federal entity” to produce guidance or regulations regarding data protection that would apply to all agencies, giving that entity “the explicit authority to make needed technical and policy choices or explicitly stating Congress’s own choices.”

That recommendation was formed after the GAO sent a questionnaire to all 24 Chief Financial Officers Act agencies asking for information about their use of emerging technologies and data capabilities and how they’re guaranteeing that personally identifiable information is safeguarded.

The GAO found that 16 of those CFO Act agencies have policies or procedures in place to protect civil rights and civil liberties with regard to data use, while the other eight have not taken steps to do the same.

The most commonly cited issues for agencies in their efforts to protect the civil rights and civil liberties of the public were “complexities in handling protections associated with new and emerging technologies” and “a lack of qualified staff possessing needed skills in civil rights, civil liberties, and emerging technologies.”

“Further, eight of the 24 agencies believed that additional government-wide law or guidance would strengthen consistency in addressing civil rights and civil liberties protections,” the GAO wrote. “One agency noted that such guidance could eliminate the hodge-podge approach to the governance of data and technology.”

All 24 CFO Act agencies have internal offices to “handle the protection of the public’s civil rights as identified in federal laws,” with much of that work centered on the handling of civil rights violations and related complaints. Four agencies — the departments of Defense, Homeland Security, Justice and Education — have offices to specifically manage civil liberty protections across their entire agencies. The other 20 agencies have mostly adopted a “decentralized approach to protecting civil liberties, including when collecting, sharing, and using data,” the GAO noted…(More)”.

Who Is Responsible for AI Copyright Infringement?


Article by Michael P. Goodyear: “Twenty-one-year-old college student Shane hopes to write a song for his boyfriend. In the past, Shane would have had to wait for inspiration to strike, but now he can use generative artificial intelligence to get a head start. Shane decides to use Anthropic’s AI chat system, Claude, to write the lyrics. Claude dutifully complies and creates the words to a love song. Shane, happy with the result, adds notes, rhythm, tempo, and dynamics. He sings the song and his boyfriend loves it. Shane even decides to post a recording to YouTube, where it garners 100,000 views.

But Shane did not realize that this song’s lyrics are similar to those of “Love Story,” Taylor Swift’s hit 2008 song. Shane must now contend with copyright law, which protects original creative expression such as music. Copyright grants the rights owner the exclusive rights to reproduce, perform, and create derivatives of the copyrighted work, among other things. If others take such actions without permission, they can be liable for damages up to $150,000. So Shane could be on the hook for tens of thousands of dollars for copying Swift’s song.

Copyright law has surged into the news in the past few years as one of the most important legal challenges for generative AI tools like Claude—not for the output of these tools but for how they are trained. Over two dozen pending court cases grapple with the question of whether training generative AI systems on copyrighted works without compensating or getting permission from the creators is lawful or not. Answers to this question will shape a burgeoning AI industry that is predicted to be worth $1.3 trillion by 2032.

Yet there is another important question that few have asked: Who should be liable when a generative AI system creates a copyright-infringing output? Should the user be on the hook?…(More)”

Launching the Data-Powered Positive Deviance Course


Blog by Robin Nowok: “Data-Powered Positive Deviance (DPPD) is a new method that combines the principles of Positive Deviance with the power of digital data and advanced analytics. Positive Deviance is based on the observation that in every community or organization, some individuals achieve significantly better outcomes than their peers, despite having similar challenges and resources. These individuals or groups are referred to as positive deviants.

The DPPD method follows the same logic as the Positive Deviance approach but leverages existing, non-traditional data sources, either instead of or in conjunction with traditional data sources. This allows for the identification of positive deviants on larger geographic and temporal scales. Once identified, we can then uncover the behaviors that lead to their success, enabling others to adopt these practices.

In a world where top-down solutions often fall short, DPPD offers a fresh perspective. It focuses on finding what’s already working within communities, rather than imposing external solutions. This can lead to more sustainable, culturally appropriate, and effective interventions.

Our online course is designed to get you started on your DPPD journey. Through five modules, you’ll gain both theoretical knowledge and practical skills to apply DPPD in your own work…(More)”.

Assessing potential future artificial intelligence risks, benefits and policy imperatives


OECD Report: “The swift evolution of AI technologies calls for policymakers to consider and proactively manage AI-driven change. The OECD’s Expert Group on AI Futures was established to help meet this need and anticipate AI developments and their potential impacts. Informed by insights from the Expert Group, this report distils research and expert insights on prospective AI benefits, risks and policy imperatives. It identifies ten priority benefits, such as accelerated scientific progress, productivity gains and better sense-making and forecasting. It discusses ten priority risks, such as facilitation of increasingly sophisticated cyberattacks; manipulation, disinformation, fraud and resulting harms to democracy; concentration of power; incidents in critical systems and exacerbated inequality and poverty. Finally, it points to ten policy priorities, including establishing clearer liability rules, drawing AI “red lines”, investing in AI safety and ensuring adequate risk management procedures. The report reviews existing public policy and governance efforts and remaining gaps…(More)”.

Human-AI coevolution


Paper by Dino Pedreschi et al: “Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices through online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users’ choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often “unintended” systemic outcomes. This paper introduces human-AI coevolution as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., scientific, legal and socio-political…(More)”.

What is ‘sovereign AI’ and why is the concept so appealing (and fraught)?


Article by John Letzing: “Denmark unveiled its own artificial intelligence supercomputer last month, funded by the proceeds of wildly popular Danish weight-loss drugs like Ozempic. It’s now one of several sovereign AI initiatives underway, which one CEO believes can “codify” a country’s culture, history, and collective intelligence – and become “the bedrock of modern economies.”

That particular CEO, Jensen Huang, happens to run a company selling the sort of chips needed to pursue sovereign AI – that is, to construct a domestic vintage of the technology, informed by troves of homegrown data and powered by the computing infrastructure necessary to turn that data into a strategic reserve of intellect…

It’s not surprising that countries are forging expansive plans to put their own stamp on AI. But big-ticket supercomputers and other costly resources aren’t feasible everywhere.

Training a large language model has gotten a lot more expensive lately; the funds required for the necessary hardware, energy, and staff may soon top $1 billion. Meanwhile, geopolitical friction over access to the advanced chips necessary for powerful AI systems could further warp the global playing field.

Even for countries with abundant resources and access, there are “sovereignty traps” to consider. Governments pushing ahead on sovereign AI could risk undermining global cooperation meant to ensure the technology is put to use in transparent and equitable ways. That might make it a lot less safe for everyone.

An example: a place using AI systems trained on a local set of values for its security may readily flag behaviour out of sync with those values as a threat…(More)”.