The Emergent Landscape of Data Commons: A Brief Survey and Comparison of Existing Initiatives


Article by Stefaan G. Verhulst and Hannah Chafetz: With the increased attention on the need for data to advance AI, data commons initiatives around the world are redefining how data can be accessed, and re-used for societal benefit. These initiatives focus on generating access to data from various sources for a public purpose and are governed by communities themselves. While diverse in focus–from health and mobility to language and environmental data–data commons are united by a common goal: democratizing access to data to fuel innovation and tackle global challenges.

This includes innovation in the context of artificial intelligence (AI). Data commons are providing the framework to make pools of diverse data available in machine understandable formats for responsible AI development and deployment. By providing access to high quality data sources with open licensing, data commons can help increase the quantity of training data in a less exploitative fashion, minimize AI providers’ reliance on data extracted across the internet without an open license, and increase the quality of the AI output (while reducing mis-information).

Over the last few months, the Open Data Policy Lab (a collaboration between The GovLab and Microsoft) has conducted various research initiatives to explore these topics further and understand:

(1) how the concept of a data commons is changing in the context of artificial intelligence, and

(2) current efforts to advance the next generation of data commons.

In what follows we provide a summary of our findings thus far. We hope it inspires more data commons use cases for responsible AI innovation in the public’s interest…(More)”.

Two Open Science Foundations: Data Commons and Stewardship as Pillars for Advancing the FAIR Principles and Tackling Planetary Challenges


Article by Stefaan Verhulst and Jean Claude Burgelman: “Today the world is facing three major planetary challenges: war and peace, steering Artificial Intelligence and making the planet a healthy Anthropoceen. As they are closely interrelated, they represent an era of “polycrisis”, to use the term Adam Tooze has coined. There are no simple solutions or quick fixes to these (and other) challenges; their interdependencies demand a multi-stakeholder, interdisciplinary approach.

As world leaders and experts convene in Baku for The 29th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (COP29), the urgency of addressing these global crises has never been clearer. A crucial part of addressing these challenges lies in advancing science — particularly open science, underpinned by data made available leveraging the FAIR principles (Findable, Accessible, Interoperable, and Reusable). In this era of computation, the transformative potential of research depends on the seamless flow and reuse of high-quality data to unlock breakthrough insights and solutions. Ensuring data is available in reusable, interoperable formats not only accelerates the pace of scientific discovery but also expedites the search for solutions to global crises.

Image of the retreat of the Columbia glacier by Jesse Allen, using Landsat data from the U.S. Geological Survey. Free to re-use from NASA Visible Earth.

While FAIR principles provide a vital foundation for making data accessible, interoperable and reusable, translating these principles into practice requires robust institutional approaches. Toward that end, in the below, we argue two foundational pillars must be strengthened:

  • Establishing Data Commons: The need for shared data ecosystems where resources can be pooled, accessed, and re-used collectively, breaking down silos and fostering cross-disciplinary collaboration.
  • Enabling Data Stewardship: Systematic and responsible data reuse requires more than access; it demands stewardship — equipping institutions and scientists with the capabilities to maximize the value of data while safeguarding its responsible use is essential…(More)”.

A Second Academic Exodus From X?


Article by Josh Moody: “Two years ago, after Elon Musk bought Twitter for $44 billion, promptly renaming it X, numerous academics decamped from the platform. Now, in the wake of a presidential election fraught with online disinformation, a second exodus from the social media site appears underway.

Academics, including some with hundreds of thousands of followers, announced departures from the platform in the immediate aftermath of the election, decrying the toxicity of the website and objections to Musk and how he wielded the platform to back President-elect Donald Trump. The business mogul threw millions of dollars behind Trump and personally campaigned for him this fall. Musk also personally advanced various debunked conspiracy theories during the election cycle.

Amid another wave of exits, some users see this as the end of Academic Twitter, which was already arguably in its death throes…

LeBlanc, Kamola and Rosen all mentioned that they were moving to the platform Bluesky, which has grown to 14.5 million users, welcoming more than 700,000 new accounts in recent days. In September, Bluesky had nine million users…

A study published in PS: Political Science & Politics last month concluded that academics began to engage less after Musk bought the platform. But the peak of disengagement wasn’t when the billionaire took over the site in October 2022 but rather the next month, when he reinstated Donald Trump’s account, which the platform’s previous owners deactivated following the Jan. 6, 2021, insurrection, which he encouraged.

The researchers reviewed 15,700 accounts from academics in economics, political science, sociology and psychology for their study.

James Bisbee, a political science professor at Vanderbilt University and article co-author, wrote via email that changes to the platform, particularly to the application programming interface, or API, undermined their ability to collect data for their research.

“Twitter used to be an amazing source of data for political scientists (and social scientists more broadly) thanks in part to its open data ethos,” Bisbee wrote. “Since Musk’s takeover, this is no longer the case, severely limiting the types of conclusions we could draw, and theories we could test, on this platform.”

To Bisbee, that loss is an understated issue: “Along with many other troubling developments on X since the change in ownership, the amputation of data access should not be ignored.”..(More)”

The Death of Search


Article by Matteo Wong: “For nearly two years, the world’s biggest tech companies have said that AI will transform the web, your life, and the world. But first, they are remaking the humble search engine.

Chatbots and search, in theory, are a perfect match. A standard Google search interprets a query and pulls up relevant results; tech companies have spent tens or hundreds of millions of dollars engineering chatbots that interpret human inputs, synthesize information, and provide fluent, useful responses. No more keyword refining or scouring Wikipedia—ChatGPT will do it all. Search is an appealing target, too: Shaping how people navigate the internet is tantamount to shaping the internet itself.

Months of prophesying about generative AI have now culminated, almost all at once, in what may be the clearest glimpse yet into the internet’s future. After a series of limited releases and product demos, mired with various setbacks and embarrassing errors, tech companies are debuting AI-powered search engines as fully realized, all-inclusive products. Last Monday, Google announced that it would launch its AI Overviews in more than 100 new countries; that feature will now reach more than 1 billion users a month. Days later, OpenAI announced a new search function in ChatGPT, available to paid users for now and soon opening to the public. The same afternoon, the AI-search start-up Perplexity shared instructions for making its “answer engine” the default search tool in your web browser.

For the past week, I have been using these products in a variety of ways: to research articles, follow the election, and run everyday search queries. In turn I have scried, as best I can, into the future of how billions of people will access, relate to, and synthesize information. What I’ve learned is that these products are at once unexpectedly convenient, frustrating, and weird. These tools’ current iterations surprised and, at times, impressed me, yet even when they work perfectly, I’m not convinced that AI search is a wise endeavor…(More)”.

Congress should designate an entity to oversee data security, GAO says


Article by Matt Bracken: “Federal agencies may need to rethink how they handle individuals’ personal data to protect their civil rights and civil liberties, a congressional watchdog said in a new report Tuesday.

Without federal guidance governing the protection of the public’s civil rights and liberties, agencies have pursued a patchwork system of policies tied to the collection, sharing and use of data, the Government Accountability Office said

To address that problem head-on, the GAO is recommending that Congress select “an appropriate federal entity” to produce guidance or regulations regarding data protection that would apply to all agencies, giving that entity “the explicit authority to make needed technical and policy choices or explicitly stating Congress’s own choices.”

That recommendation was formed after the GAO sent a questionnaire to all 24 Chief Financial Officers Act agencies asking for information about their use of emerging technologies and data capabilities and how they’re guaranteeing that personally identifiable information is safeguarded.

The GAO found that 16 of those CFO Act agencies have policies or procedures in place to protect civil rights and civil liberties with regard to data use, while the other eight have not taken steps to do the same.

The most commonly cited issues for agencies in their efforts to protect the civil rights and civil liberties of the public were “complexities in handling protections associated with new and emerging technologies” and “a lack of qualified staff possessing needed skills in civil rights, civil liberties, and emerging technologies.”

“Further, eight of the 24 agencies believed that additional government-wide law or guidance would strengthen consistency in addressing civil rights and civil liberties protections,” the GAO wrote. “One agency noted that such guidance could eliminate the hodge-podge approach to the governance of data and technology.”

All 24 CFO Act agencies have internal offices to “handle the protection of the public’s civil rights as identified in federal laws,” with much of that work centered on the handling of civil rights violations and related complaints. Four agencies — the departments of Defense, Homeland Security, Justice and Education — have offices to specifically manage civil liberty protections across their entire agencies. The other 20 agencies have mostly adopted a “decentralized approach to protecting civil liberties, including when collecting, sharing, and using data,” the GAO noted…(More)”.

Who Is Responsible for AI Copyright Infringement?


Article by Michael P. Goodyear: “Twenty-one-year-old college student Shane hopes to write a song for his boyfriend. In the past, Shane would have had to wait for inspiration to strike, but now he can use generative artificial intelligence to get a head start. Shane decides to use Anthropic’s AI chat system, Claude, to write the lyrics. Claude dutifully complies and creates the words to a love song. Shane, happy with the result, adds notes, rhythm, tempo, and dynamics. He sings the song and his boyfriend loves it. Shane even decides to post a recording to YouTube, where it garners 100,000 views.

But Shane did not realize that this song’s lyrics are similar to those of “Love Story,” Taylor Swift’s hit 2008 song. Shane must now contend with copyright law, which protects original creative expression such as music. Copyright grants the rights owner the exclusive rights to reproduce, perform, and create derivatives of the copyrighted work, among other things. If others take such actions without permission, they can be liable for damages up to $150,000. So Shane could be on the hook for tens of thousands of dollars for copying Swift’s song.

Copyright law has surged into the news in the past few years as one of the most important legal challenges for generative AI tools like Claude—not for the output of these tools but for how they are trained. Over two dozen pending court cases grapple with the question of whether training generative AI systems on copyrighted works without compensating or getting permission from the creators is lawful or not. Answers to this question will shape a burgeoning AI industry that is predicted to be worth $1.3 trillion by 2032.

Yet there is another important question that few have asked: Who should be liable when a generative AI system creates a copyright-infringing output? Should the user be on the hook?…(More)”

People-centred and participatory policymaking


Blog by the UK Policy Lab: “…Different policies can play out in radically different ways depending on circumstance and place. Accordingly it is important for policy professionals to have access to a diverse suite of people-centred methods, from gentle and compassionate techniques that increase understanding with small groups of people to higher-profile, larger-scale engagements. The image below shows a spectrum of people-centred and participatory methods that can be used in policy, ranging from light-touch involvement (e.g. consultation), to structured deliberation (e.g. citizens’ assemblies) and deeper collaboration and empowerment (e.g. participatory budgeting). This spectrum of participation is speculatively mapped against stages of the policy cycle…(More)”.

Social Innovation and the Journey to Transformation


Special series by Skoll for the Stanford Social Innovation Review: “…we explore system orchestration, collaborative funding, government partnerships, mission-aligned investing, reimagined storytelling, and evaluation and learning. These seven articles highlight successful approaches to collective action and share compelling examples of social transformation.

The time is now for philanthropy to align the speed and scale of our investments with the scope of the global challenges that social innovators seek to address. We hope this series will spark fresh thinking and new ideas for how we can create durable systemic change quickly and together…(More)”.

From Digital Sovereignty to Digital Agency


Article by Akash Kapur: “In recent years, governments have increasingly pursued variants of digital sovereignty to regulate and control the global digital ecosystem. The pursuit of AI sovereignty represents the latest iteration in this quest. 

Digital sovereignty may offer certain benefits, but it also poses undeniable risks, including the possibility of undermining the very goals of autonomy and self-reliance that nations are seeking. These risks are particularly pronounced for smaller nations with less capacity, which might do better in a revamped, more inclusive, multistakeholder system of digital governance. 

Organizing digital governance around agency rather than sovereignty offers the possibility of such a system. Rather than reinforce the primacy of nations, digital agency asserts the rights, priorities, and needs not only of sovereign governments but also of the constituent parts—the communities and individuals—they purport to represent.

Three cross-cutting principles underlie the concept of digital agency: recognizing stakeholder multiplicity, enhancing the latent possibilities of technology, and promoting collaboration. These principles lead to three action-areas that offer a guide for digital policymakers: reinventing institutions, enabling edge technologies, and building human capacity to ensure technical capacity…(More)”.

OECD Digital Economy Outlook 2024


OECD Report: “The most recent phase of digital transformation is marked by rapid technological changes, creating both opportunities and risks for the economy and society. The Volume 2 of the OECD Digital Economy Outlook 2024 explores emerging priorities, policies and governance practices across countries. It also examines trends in the foundations that enable digital transformation, drive digital innovation and foster trust in the digital age. The volume concludes with a statistical annex…

In 2023, digital government, connectivity and skills topped the list of digital policy priorities. Increasingly developed at a high level of government, national digital strategies play a critical role in co-ordinating these efforts. Nearly half of the 38 countries surveyed develop these strategies through dedicated digital ministries, up from just under a quarter in 2016. Among 1 200 policy initiatives tracked across the OECD, one-third aim to boost digital technology adoption, social prosperity, and innovation. AI and 5G are the most often-cited technologies…(More)”