The Emergent Landscape of Data Commons: A Brief Survey and Comparison of Existing Initiatives


Article by Stefaan G. Verhulst and Hannah Chafetz: With the increased attention on the need for data to advance AI, data commons initiatives around the world are redefining how data can be accessed and reused for societal benefit. These initiatives focus on generating access to data from various sources for a public purpose and are governed by the communities themselves. While diverse in focus, from health and mobility to language and environmental data, data commons are united by a common goal: democratizing access to data to fuel innovation and tackle global challenges.

This includes innovation in the context of artificial intelligence (AI). Data commons are providing the framework to make pools of diverse data available in machine-understandable formats for responsible AI development and deployment. By providing access to high-quality data sources with open licensing, data commons can help increase the quantity of training data in a less exploitative fashion, minimize AI providers’ reliance on data extracted across the internet without an open license, and increase the quality of AI output (while reducing misinformation).

Over the last few months, the Open Data Policy Lab (a collaboration between The GovLab and Microsoft) has conducted various research initiatives to explore these topics further and understand:

(1) how the concept of a data commons is changing in the context of artificial intelligence, and

(2) current efforts to advance the next generation of data commons.

In what follows we provide a summary of our findings thus far. We hope it inspires more data commons use cases for responsible AI innovation in the public’s interest…(More)”.

Two Open Science Foundations: Data Commons and Stewardship as Pillars for Advancing the FAIR Principles and Tackling Planetary Challenges


Article by Stefaan Verhulst and Jean Claude Burgelman: “Today the world is facing three major planetary challenges: war and peace, steering Artificial Intelligence, and making the planet a healthy Anthropocene. As they are closely interrelated, they represent an era of “polycrisis”, to use the term coined by Adam Tooze. There are no simple solutions or quick fixes to these (and other) challenges; their interdependencies demand a multi-stakeholder, interdisciplinary approach.

As world leaders and experts convene in Baku for the 29th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (COP29), the urgency of addressing these global crises has never been clearer. A crucial part of addressing these challenges lies in advancing science — particularly open science, underpinned by data made available leveraging the FAIR principles (Findable, Accessible, Interoperable, and Reusable). In this era of computation, the transformative potential of research depends on the seamless flow and reuse of high-quality data to unlock breakthrough insights and solutions. Ensuring data is available in reusable, interoperable formats not only accelerates the pace of scientific discovery but also expedites the search for solutions to global crises.

Image of the retreat of the Columbia glacier by Jesse Allen, using Landsat data from the U.S. Geological Survey. Free to re-use from NASA Visible Earth.

While FAIR principles provide a vital foundation for making data accessible, interoperable, and reusable, translating these principles into practice requires robust institutional approaches. Toward that end, we argue below that two foundational pillars must be strengthened:

  • Establishing Data Commons: The need for shared data ecosystems where resources can be pooled, accessed, and re-used collectively, breaking down silos and fostering cross-disciplinary collaboration.
  • Enabling Data Stewardship: Systematic and responsible data reuse requires more than access; it demands stewardship — equipping institutions and scientists with the capabilities to maximize the value of data while safeguarding its responsible use is essential…(More)”.

People-centred and participatory policymaking


Blog by the UK Policy Lab: “…Different policies can play out in radically different ways depending on circumstance and place. Accordingly, it is important for policy professionals to have access to a diverse suite of people-centred methods, from gentle and compassionate techniques that increase understanding with small groups of people to higher-profile, larger-scale engagements. The image below shows a spectrum of people-centred and participatory methods that can be used in policy, ranging from light-touch involvement (e.g. consultation), to structured deliberation (e.g. citizens’ assemblies) and deeper collaboration and empowerment (e.g. participatory budgeting). This spectrum of participation is speculatively mapped against stages of the policy cycle…(More)”.

How to evaluate statistical claims


Blog by Sean Trott: “…The goal of this post is to distill what I take to be the most important, immediately applicable, and generalizable insights from these classes. That means that readers should be able to apply those insights without a background in math or knowing how to, say, build a linear model in R. In that way, it’ll be similar to my previous post about “useful cognitive lenses to see through”, but with a greater focus on evaluating claims specifically.

Lesson #1: Consider the whole distribution, not just the central tendency.

If you spend much time reading news articles or social media posts, the odds are good you’ll encounter some descriptive statistics: numbers summarizing or describing a distribution (a set of numbers or values in a dataset). One of the most commonly used descriptive statistics is the arithmetic mean: the sum of every value in a distribution, divided by the number of values overall. The arithmetic mean is a measure of “central tendency”, which just means it’s a way to characterize the typical or expected value in that distribution.

The arithmetic mean is a really useful measure. But as many readers might already know, it’s not perfect. It’s strongly affected by outliers—values that are really different from the rest of the distribution—and things like the skew of a distribution (see the image below for examples of skewed distributions).

Three different distributions. Leftmost is a roughly “normal” distribution; middle is a “right-skewed” distribution; and rightmost is a “left-skewed” distribution.

In particular, the mean is pulled in the direction of outliers or distribution skew. That’s the logic behind the joke about the average salary of people at a bar jumping up as soon as a billionaire walks in. It’s also why other measures of central tendency, such as the median, are often presented alongside (or instead of) the mean—especially for distributions that happen to be very skewed, such as income or wealth.
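The billionaire-at-the-bar joke is easy to demonstrate. A minimal sketch, using Python’s standard library and invented salary figures, shows the mean lurching toward the outlier while the median barely moves:

```python
import statistics

# Annual salaries of ten patrons at a bar (hypothetical figures)
salaries = [32_000, 35_000, 38_000, 40_000, 41_000,
            45_000, 48_000, 52_000, 55_000, 60_000]

print(statistics.mean(salaries))    # 44600 -- a reasonable "typical" salary
print(statistics.median(salaries))  # 43000 -- close to the mean

# A billionaire walks in: a single extreme outlier
salaries.append(1_000_000_000)

print(statistics.mean(salaries))    # ~90.9 million -- pulled far upward
print(statistics.median(salaries))  # 45000 -- barely changed
```

The median shifts only one position in the sorted list, which is why it is the preferred summary for heavily skewed distributions such as income or wealth.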

It’s not that one of these measures is more “correct”. As Stephen Jay Gould wrote in his article The Median Is Not the Message, they’re just different perspectives on the same distribution:

A politician in power might say with pride, “The mean income of our citizens is $15,000 per year.” The leader of the opposition might retort, “But half our citizens make less than $10,000 per year.” Both are right, but neither cites a statistic with impassive objectivity. The first invokes a mean, the second a median. (Means are higher than medians in such cases because one millionaire may outweigh hundreds of poor people in setting a mean, but can balance only one mendicant in calculating a median.)…(More)”

Access, Signal, Action: Data Stewardship Lessons from Valencia’s Floods


Article by Marta Poblet, Stefaan Verhulst, and Anna Colom: “Valencia has a rich history in water management, a legacy shaped by both triumphs and tragedies. This connection to water is embedded in the city’s identity, yet modern floods test its resilience in new ways.

During the recent floods, Valencians experienced a troubling paradox. In today’s connected world, digital information flows through traditional and social media, weather apps, and government alert systems designed to warn us of danger and guide rapid responses. Despite this abundance of data, a tragedy unfolded last month in Valencia. This raises a crucial question: how can we ensure access to the right data, filter it for critical signals, and transform those signals into timely, effective action?

Data stewardship becomes essential in this process.

In particular, the devastating floods in Valencia underscore the importance of:

  • having access to data to strengthen the signal (first mile challenges)
  • separating signal from noise
  • translating signal into action (last mile challenges)…(More)”.

Mini-publics and the public: challenges and opportunities


Conversation between Sarah Castell and Stephen Elstub: “…there’s a real problem here: the public are only going to get to know about a mini-public if it gets media coverage, but the media will only cover it if it makes an impact. But it’s more likely to make an impact if the public are aware of it. That’s a tension that mini-publics need to overcome, because it’s important that they reach out to the public. Ultimately it doesn’t matter how inclusive the recruitment is and how well it’s done. It doesn’t matter how well designed the process is. It is still a small number of people involved, so we want mini-publics to be able to influence public opinion and stimulate public debate. And if they can do that, then it’s more likely to affect elite opinion and debate as well, and possibly policy.

One more thing is that people in power aren’t in the habit of sharing power. And that’s why it’s very difficult. I think the politicians are mainly motivated around this because they hope it’s going to look good to the electorate and get them some votes, but they are also worried about low levels of trust in society and what the ramifications of that might be. But in general, people in power don’t give it away very easily…

Part of the problem is that a lot of the research around public views on deliberative processes was done through experiments. It is useful, but it doesn’t quite tell us what will happen when mini-publics are communicated to the public in the messy real public sphere. Previously, there just weren’t that many well-known cases that we could actually do field research on. But that is starting to change.

There’s also more interdisciplinary work needed in this area. We need to improve how communication strategies around citizens’ assemblies are done – there must be work that’s relevant in political communication studies and other fields that have this kind of insight…(More)”.

Trust in artificial intelligence makes Trump/Vance a transhumanist ticket


Article by Filip Bialy: “AI plays a central role in the 2024 US presidential election, as a tool for disinformation and as a key policy issue. But its significance extends beyond these, connecting to an emerging ideology known as TESCREAL, which envisages AI as a catalyst for unprecedented progress, including space colonisation. After this election, TESCREALism may well have more than one representative in the White House, writes Filip Bialy.

In June 2024, the essay Situational Awareness by former OpenAI employee Leopold Aschenbrenner sparked intense debate in the AI community. The author predicted that by 2027, AI would surpass human intelligence. Such claims are common among AI researchers. They often assert that only a small elite – mainly those working at companies like OpenAI – possesses inside knowledge of the technology. Many in this group hold a quasi-religious belief in the imminent arrival of artificial general intelligence (AGI) or artificial superintelligence (ASI)…

These hopes and fears, however, are not only religious-like but also ideological. A decade ago, Silicon Valley leaders were still associated with the so-called Californian ideology, a blend of hippie counterculture and entrepreneurial yuppie values. Today, figures like Elon Musk, Mark Zuckerberg, and Sam Altman are under the influence of a new ideological cocktail: TESCREAL. Coined in 2023 by Timnit Gebru and Émile P. Torres, TESCREAL stands for Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, and Longtermism.

While these may sound like obscure terms, they represent ideas developed over decades, with roots in eugenics. Early 20th-century eugenicists such as Francis Galton promoted selective breeding to enhance future generations. Later, with advances in genetic engineering, the focus shifted from eugenics’ racist origins to its potential to eliminate genetic defects. TESCREAL represents a third wave of eugenics. It aims to digitise human consciousness and then propagate digital humans into the universe…(More)”

Open-Access AI: Lessons From Open-Source Software


Article by Parth Nobel, Alan Z. Rozenshtein, and Chinmayi Sharma: “Before analyzing how the lessons of open-source software might (or might not) apply to open-access AI, we need to define our terms and explain why we use the term “open-access AI” to describe models like Llama rather than the more commonly used “open-source AI.” We join many others in arguing that “open-source AI” is a misnomer for such models. It’s misleading to fully import the definitional elements and assumptions that apply to open-source software when talking about AI. Rhetoric matters, and the distinction isn’t just semantic; it’s about acknowledging the meaningful differences in access, control, and development.

The software industry definition of “open source” grew out of the free software movement, which makes the point that “users have the freedom to run, copy, distribute, study, change and improve” software. As the movement emphasizes, one should “think of ‘free’ as in ‘free speech,’ not as in ‘free beer.’” What’s “free” about open-source software is that users can do what they want with it, not that they initially get it for free (though much open-source software is indeed distributed free of charge). This concept is codified by the Open Source Initiative as the Open Source Definition (OSD), many aspects of which directly apply to Llama 3.2. Llama 3.2’s license makes it freely redistributable by license holders (Clause 1 of the OSD) and allows the distribution of the original models, their parts, and derived works (Clauses 3, 7, and 8)…(More)”.

Make it make sense: the challenge of data analysis in global deliberation


Blog by Iñaki Goñi: “From climate change to emerging technologies to economic justice to space, global and transnational deliberation is on the rise. Global deliberative processes aim to bring citizen-centred governance to issues that no single nation can resolve alone. Running deliberative processes at this scale poses a unique set of challenges. How to select participants, and how to make the forums accountable, impactful, fairly designed, and aware of power imbalances, are all crucial and open questions…

Massifying participation will be key to invigorating global deliberation. Assemblies will have a better chance of being seen as legitimate, fair, and publicly supported if they involve thousands or even millions of diverse participants. This raises an operational challenge: how to systematise political ideas from many people across the globe.

In a centralised global assembly, anything from 50 to 500 citizens from various countries engage in a single deliberation and produce recommendations or political actions by crossing languages and cultures. In a distributed assembly, multiple gatherings are convened locally that share a common but flexible methodology, allowing participants to discuss a common issue applied both to local and global contexts. Either way, a global deliberation process demands the organisation and synthesis of possibly thousands of ideas from diverse languages and cultures around the world.

How could we ever make sense of all that data to systematise citizens’ ideas and recommendations? Most people turn to computational methods to help reduce complexity and identify patterns. First up, one technique for analysing text amounts to little more than simple counting, through which we can produce something like a frequency table or a wordcloud…(More)”.
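The counting technique the blog mentions can be sketched in a few lines of standard-library Python. The response snippets below are invented for illustration; a real assembly would feed in thousands of transcribed contributions:

```python
from collections import Counter
import re

# Hypothetical snippets of participant contributions from two local assemblies
responses = [
    "Invest in public transport and cut transport emissions",
    "Cut emissions by taxing aviation",
    "Public transport should be free",
]

# Lowercase and tokenise each response, then tally word frequencies
words = []
for text in responses:
    words.extend(re.findall(r"[a-z']+", text.lower()))

freq = Counter(words)
print(freq.most_common(3))  # the most frequently raised terms
```

A frequency table or wordcloud is just a presentation of `freq`; in practice analysts would also strip stopwords ("in", "and", "by") and handle multiple languages before counting.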

A shared destiny for public sector data


Blog post by Shona Nicol: “As a data professional, it can sometimes feel hard to get others interested in data. Perhaps like many in this profession, I can often express the importance and value of data for good in an overly technical way. However, when our biggest challenges in Scotland include eradicating child poverty, growing the economy and tackling the climate emergency, I would argue that we should all take an interest in data because it’s going to be foundational in helping us solve these problems.

Data is already intrinsic to shaping our society and how services are delivered. And public sector data is a vital component in making sure that services for the people of Scotland are being delivered efficiently and effectively. Despite an ever-growing awareness of the transformative power of data to improve the design and delivery of services, feedback from public sector staff shows that they can face difficulties when trying to influence colleagues and senior leaders around the need to invest in data.

A vision gap

In the Scottish Government’s data maturity programme and more widely, we regularly hear about the challenges data professionals encounter when trying to enact change. This community tells us that a long-term vision for public sector data for Scotland could help them by providing the context for what they are trying to achieve locally.

Earlier this year we started to scope how we might do this. We recognised that organisations are already working to deliver local and national strategies and policies that relate to data, so any vision had to be able to sit alongside those, be meaningful in different settings, agnostic of technology and relevant to any public sector organisation. We wanted to offer opportunities for alignment, not enforce an instruction manual…(More)”.