Making the Global Digital Compact a reality: Four steps to establish a responsible, inclusive and equitable data future


Article by Stefaan Verhulst: “In September of this year, as world leaders assemble in New York for the 79th annual meeting of the United Nations (UN) General Assembly, they will confront a weighty agenda. War and peace will be at the forefront of conversations, along with efforts to tackle climate change and the ongoing migration crisis. Alongside these usual topics, however, the gathered dignitaries will also turn their attention to digital governance.

In 2021, the UN Secretary-General proposed that a Global Digital Compact (GDC) be agreed upon that would “outline shared principles for an open, free and secure digital future for all”. The development of this Compact, which builds on a range of adjacent work streams at the UN, including activities related to the Sustainable Development Goals (SDGs), has now reached a vital inflection point. After a wide-ranging process of consultation, the General Assembly is expected to ratify the latest draft of the Digital Compact, which contains five key objectives and a commitment to thirteen cross-cutting principles. We have reached a rare moment of near-consensus in the global digital ecosystem, one that offers undeniable potential for revamping (and improving) our frameworks for global governance.

The Global Digital Compact will be agreed upon by UN Member States at the Summit of the Future at the United Nations Headquarters in New York, establishing guidelines for the responsible use and governance of digital technologies. 

The growing prominence of these objectives and principles at the seat of global governance is a welcome development. Each is essential to developing a healthy, safe and responsible digital ecosystem. In particular, the emphasis on better data governance is a step forward, as is the related call for an enhanced approach to international AI governance. The two cannot be separated: data governance is the bedrock of AI governance.

Yet now that we are moving toward ratification of the Compact, we must focus on the next crucial – and in some ways most difficult – step: implementation. This is particularly important given that the digital realm faces in many ways a deepening crisis of credibility, marked by growing concerns over exclusion, extraction, concentrations of power, mis- and disinformation, and what we have elsewhere referred to as an impending “data winter”.

Manifesting the goals of the Compact to create genuine and lasting impact is thus critical. In what follows, we explore four key ways in which the Compact’s key objectives can be operationalized to create a more vibrant, responsive and free global digital commons…(More)”.

We finally have a definition for open-source AI


Article by Rhiannon Williams and James O’Donnell: “Open-source AI is everywhere right now. The problem is, no one agrees on what it actually is. Now we may finally have an answer. The Open Source Initiative (OSI), the self-appointed arbiters of what it means to be open source, has released a new definition, which it hopes will help lawmakers develop regulations to protect consumers from AI risks. 

Though OSI has published much about what constitutes open-source technology in other fields, this marks its first attempt to define the term for AI models. It asked a 70-person group of researchers, lawyers, policymakers, and activists, as well as representatives from big tech companies like Meta, Google, and Amazon, to come up with the working definition. 

According to the group, an open-source AI system can be used for any purpose without the need to secure permission, and researchers should be able to inspect its components and study how the system works.

It should also be possible to modify the system for any purpose—including to change its output—and to share it with others to use, with or without modifications, for any purpose. In addition, the standard attempts to define a level of transparency for a given model’s training data, source code, and weights. 

The previous lack of an open-source standard presented a problem…(More)”.

It’s time we put agency into Behavioural Public Policy


Article by Sanchayan Banerjee et al: “Promoting agency – people’s ability to form intentions and to act on them freely – must become a primary objective for Behavioural Public Policy (BPP). Contemporary BPPs do not directly pursue this objective, which is problematic for many reasons. From an ethical perspective, goals like personal autonomy and individual freedom cannot be realised without nurturing citizens’ agency. From an efficacy standpoint, BPPs that override agency – for example, by activating automatic psychological processes – leave citizens ‘in the dark’, incapable of internalising and owning the process of behaviour change. This may contribute to non-persistent treatment effects, compensatory negative spillovers, or psychological reactance and backfiring effects. In this paper, we argue that agency-enhancing BPPs can alleviate these ethical and efficacy limitations, enabling longer-lasting and meaningful behaviour change. We set out philosophical arguments to help us understand and conceptualise agency. Then, we review three alternative agency-enhancing behavioural frameworks: (1) boosts to enhance people’s competences to make better decisions; (2) debiasing to encourage people to reduce the tendency for automatic, impulsive responses; and (3) nudge+ to enable citizens to think alongside nudges and evaluate them transparently. Using a multi-dimensional framework, we highlight differences in their workings, which offer comparative insights and complementarities in their use. We discuss limitations of agency-enhancing BPPs and map out future research directions…(More)”.

The Complexities of Differential Privacy for Survey Data


Paper by Jörg Drechsler & James Bailie: “The concept of differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced the adoption of the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains challenging, especially when it comes to survey data. In this paper, we present some results from an ongoing project funded by the U.S. Census Bureau that is exploring the possibilities and limitations of DP for survey data. Specifically, we identify five aspects that need to be considered when adopting DP in the survey context: the multi-staged nature of data production; the limited privacy amplification from complex sampling designs; the implications of survey-weighted estimates; the weighting adjustments for nonresponse and other data deficiencies; and the imputation of missing values. We summarize the project’s key findings with respect to each of these aspects and also discuss some of the challenges that still need to be addressed before DP could become the new data protection standard at statistical agencies…(More)”.
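
To make the survey-weighting aspect concrete: a mechanism satisfies ε-differential privacy if its output distribution changes by at most a factor of e^ε when one record is added or removed. Below is a minimal sketch – not the authors’ method; the function name and data are illustrative assumptions – of releasing a survey-weighted total via the Laplace mechanism. Because removing one respondent can shift a weighted total by up to the largest weight times the largest possible value, the noise scale must grow with the weights, which is one reason weighting complicates DP.

```python
import numpy as np

def dp_weighted_total(values, weights, value_range, epsilon, rng=None):
    """Release a survey-weighted total under epsilon-DP (Laplace mechanism).

    Removing one respondent changes the weighted total by at most
    max(weights) * max(|value|), so that product is the sensitivity
    to which the Laplace noise must be calibrated.
    """
    rng = rng or np.random.default_rng()
    sensitivity = max(weights) * max(abs(v) for v in value_range)
    true_total = float(np.dot(values, weights))
    return true_total + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Illustrative, made-up data: five respondents with a binary outcome.
values = [1, 0, 1, 1, 0]
weights = [120.0, 80.0, 250.0, 60.0, 95.0]  # survey weights
print(dp_weighted_total(values, weights, value_range=(0, 1), epsilon=1.0))
```

Note how the largest weight (250) dominates the noise scale; clipping extreme weights would reduce noise at the cost of bias – the kind of trade-off the survey setting forces.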

Artificial Intelligence for the Internal Democracy of Political Parties


Paper by Claudio Novelli et al: “The article argues that AI can enhance the measurement and implementation of democratic processes within political parties, known as Intra-Party Democracy (IPD). It identifies the limitations of traditional methods for measuring IPD, which often rely on formal parameters, self-reported data, and tools like surveys. Such limitations lead to partial data collection, rare updates, and significant resource demands. To address these issues, the article suggests that specific data management and Machine Learning techniques, such as natural language processing and sentiment analysis, can improve the measurement and practice of IPD…(More)”.
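
As a rough illustration of the kind of technique the paper points to – not its actual pipeline – the sketch below scores the tone of invented member comments with NLTK’s off-the-shelf VADER sentiment model. Everything here (the comments, the idea of using compound scores as one crude IPD signal) is an assumption for illustration; a real measurement would require careful data collection and validation.

```python
# Requires: pip install nltk
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

# Invented member comments standing in for real intra-party text data.
comments = [
    "The leadership ignored the members' vote on the platform.",
    "Great to see an open primary for candidate selection this year!",
    "Branch meetings feel like a formality; decisions are made elsewhere.",
]

sia = SentimentIntensityAnalyzer()
for text in comments:
    # compound score ranges from -1 (most negative) to +1 (most positive)
    score = sia.polarity_scores(text)["compound"]
    print(f"{score:+.2f}  {text}")
```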

Align or fail: How economics shape successful data sharing


Blog by Federico Bartolomucci: “…The conceptual distinctions between different data sharing models are mostly based on one fundamental element: the economic nature of data and its value. 

Open data projects operate under the assumption that data is a non-rival asset (i.e., it can be used by multiple people at the same time) and a non-excludable one (i.e., no one can be prevented from using it, similar to a public good like roads or the air we breathe). This means that data can be shared with everyone, for any use, without losing its market and competitive value. The Humanitarian Data Exchange platform is a great example, allowing organizations to share over 19,000 open data sets on all aspects of humanitarian response.

Data collaboratives treat data as an excludable asset – one from which some actors can be excluded (i.e., a ‘club good’, like a movie theater) – and therefore share it only among a restricted pool of actors. At the same time, they overcome the rival nature of the data by linking its use to a specific purpose. These collaboratives work best by giving the actors a voice in choosing the purpose for which the data will be used, and through specific agreements and governance bodies that ensure that those contributing data will not have their competitive position harmed, therefore incentivizing them to engage. A good example of this is the California Data Collaborative, which uses data from different actors in the water sector to develop high-level analysis of water distribution to guide policy, planning, and operations for water districts in the state of California. 

Data ecosystems work by activating market mechanisms around data exchange to overcome reluctance to share data, rather than relying solely on its purpose of use. This means that actors can choose to share their data in exchange for compensation, be it monetary or in alternate forms such as other data. In this way, the compensation balances the potential loss of competitive advantage created by the sharing of a rival asset, as well as the costs and risks of sharing. The Enershare initiative aims to establish a marketplace utilizing blockchain and smart contracts to facilitate data exchange in the energy sector. The platform is based on a compensation system, which can be non-monetary, for exchanging assets and resources related to data (such as datasets, algorithms, and models) with energy assets and services (like heating system maintenance or the transfer of surplus locally self-produced energy).

These different models of data sharing have different operational implications…(More)”.

On Fables and Nuanced Charts


Column by Spencer Greenberg and Amber Dawn Ace: “In 1994, the U.S. Congress passed the largest crime bill in U.S. history, called the Violent Crime Control and Law Enforcement Act. The bill allocated billions of dollars to build more prisons and hire 100,000 new police officers, among other things. In the years following the bill’s passage, violent crime rates in the U.S. dropped drastically, from around 750 offenses per 100,000 people in 1990 to under 400 in 2018.

[Chart: U.S. violent crime rate over time, annotated with the 1994 Crime Bill. The data and annotation are real, but the implied story is not. Credit: Authors.]

But can we infer, as this chart seems to ask us to, that the bill caused the drop in crime?

As it turns out, this chart wasn’t put together by sociologists or political scientists who’ve studied violent crime. Rather, we—a mathematician and a writer—devised it to make a point: Although charts seem to reflect reality, they often convey narratives that are misleading or entirely false.

Upon seeing that violent crime dipped after 1990, we looked up major events that happened right around that time—selecting one, the 1994 Crime Bill, and slapping it on the graph. There are other events we could have stuck on the graph just as easily that would likely have invited you to construct a completely different causal story. In other words, the bill and the data in the graph are real, but the story is manufactured.
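
The mechanics are easy to reproduce. A minimal matplotlib sketch – with invented numbers standing in for the real FBI series – shows how one dashed line and a label turn a plain trend into an implied causal claim:

```python
import matplotlib.pyplot as plt

years = list(range(1985, 2019))
# Invented rates that merely mimic the rise-then-fall shape of the real series.
rates = [600 + 10 * (y - 1985) for y in years[:7]] + \
        [670 - 9 * (y - 1991) for y in years[7:]]

fig, ax = plt.subplots()
ax.plot(years, rates)
ax.axvline(1994, linestyle="--")                 # pick any nearby event...
ax.annotate("1994 Crime Bill", xy=(1994, 640))   # ...and causation is implied
ax.set_xlabel("Year")
ax.set_ylabel("Violent crimes per 100,000 (illustrative)")
plt.show()
```

Swap the annotation for the 1995 launch of Windows 95, and the chart tells an equally confident, equally unsupported story.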

Perhaps the 1994 Crime Bill really did cause the drop in violent crime, or perhaps the causality goes the other way: the spike in violent crime motivated politicians to pass the act in the first place. (Note that the act was passed slightly after the violent crime rate peaked!) 

Charts are a concise way not only to show data but also to tell a story. Such stories, however, reflect the interpretations of a chart’s creators and are often accepted by the viewer without skepticism. As Noah Smith and many others have argued, charts contain hidden assumptions that can drastically change the story they tell…(More)”.

Toward a citizen science framework for public policy evaluation


Paper by Giovanni Esposito et al: “This study pioneers the use of citizen science in evaluating Freedom of Information laws, with a focus on Belgium, where the effectiveness of Freedom of Information has remained largely unexamined since its enactment in 1994. Utilizing participatory methods, it engages citizens in assessing transparency policies, contributing significantly to the methodology of public policy evaluation. The research identifies regional differences in Freedom of Information implementation across Belgian municipalities, highlighting that larger municipalities handle requests more effectively, while administrations generally show reluctance to respond to requests from individuals perceived as knowledgeable. This phenomenon reflects a broader European caution toward well-informed requesters. By integrating citizen science, this study not only advances our understanding of the effectiveness of Freedom of Information law in Belgium but also advocates for a more inclusive, collaborative approach to policy evaluation. It addresses the gap in researchers’ experience with citizen science, showcasing its vast potential to enhance participatory governance and policy evaluation…(More)”.

Revisiting the ‘Research Parasite’ Debate in the Age of AI


Article by C. Brandon Ogbunu: “A 2016 editorial published in the New England Journal of Medicine lamented the existence of “research parasites,” those who pick over the data of others rather than generating new data themselves. The article touched on the ethics and appropriateness of this practice. The most charitable interpretation of the argument centered around the hard work and effort that goes into the generation of new data, which costs millions of research dollars and takes countless person-hours. Whatever the merits of that argument, the editorial and its associated arguments were widely criticized.

Given recent advances in AI, revisiting the research parasite debate offers a new perspective on the ethics of sharing and data democracy. It is ironic that the critics of research parasites might have made a sound argument — but for the wrong setting, aimed at the wrong target, at the wrong time. Specifically, the large language models, or LLMs, that underlie generative AI tools such as OpenAI’s ChatGPT, have an ethical challenge in how they parasitize freely available data. These discussions bring up new conversations about data security that may undermine, or at least complicate, efforts at openness and data democratization.

The backlash to that 2016 editorial was swift and violent. Many arguments centered around the anti-science spirit of the message. For example, meta-analysis – which re-analyzes data from a selection of studies – is a critical practice that should be encouraged. Many groundbreaking discoveries about the natural world and human health have come from this practice, including new pictures of the molecular causes of depression and schizophrenia. Further, the central criticisms of research parasitism undermine the ethical goals of data sharing and ambitions for open science, where scientists and citizen-scientists can benefit from access to data. This differs from the status quo in 2016, when data published in many of the top journals of the world were locked behind a paywall, illegible, poorly labeled, or difficult to use. This remains largely true in 2024…(More)”.

Private sector trust in data sharing: enablers in the European Union


Paper by Jaime Bernal: “Enabling private sector trust stands as a critical policy challenge for the success of the EU Data Governance Act and Data Act in promoting data sharing to address societal challenges. This paper attributes the widespread trust deficit to the unmanageable uncertainty that arises from businesses’ limited usage control to protect their interests in the face of unacceptable perceived risks. For example, a firm may hesitate to share its data with others in case it is leaked and falls into the hands of business competitors. To illustrate this impasse, competition, privacy, and reputational risks are introduced, respectively, in the context of three suboptimal approaches to data sharing: data marketplaces, data collaboratives, and data philanthropy. The paper proceeds by analyzing seven trust-enabling mechanisms comprising technological, legal, and organizational elements to balance trust, risk, and control, and assessing their capacity to operate in a fair, equitable, and transparent manner. Finally, the paper examines the regulatory context in the EU and the advantages and limitations of voluntary and mandatory data sharing, concluding that an approach that effectively balances the two should be pursued…(More)”.