Stefaan Verhulst
Paper by Tara Cookson and Ruth Carlitz: “In 2013, the United Nations called for a “Data Revolution” to advance sustainable development. “Data for Good” initiatives that have followed bring together development and humanitarian actors with technology companies. Few studies have examined the composition of Data for Good partnerships or assessed the uptake and use of the data they generate. We help fill this gap with a case study of Meta’s (then Facebook) Survey on Gender Equality at Home, which reached over half a million Facebook users in more than 200 countries. The survey was developed in partnership with international development and humanitarian organizations. Our study is uniquely informed by our involvement in this partnership: we contributed subject matter expertise to the development of the survey and advised on dissemination strategies for the resulting data, which we also analyzed in our own academic work. We complement this autoethnographic perspective with insights from scholars of partnerships for development, and a practitioner framework to understand the factors connecting data to action. We find that including multiple partners can widen the scope of a project such that it gains breadth but loses depth. In addition, while it is (somewhat) possible to quantify the impact of a Data for Good partnership in terms of data use, “goodness” can also be assessed in terms of the process of producing data. Specifically, collaborations between organizations with different interests and resources may be of significant social value, particularly when they learn from one another—even if such goodness is harder to quantify…(More)”.
Press Release: “Today, the Department of Commerce announced that it will begin posting real gross domestic product (GDP) data on the blockchain, starting with the July 2025 data…This is the first time a federal agency has published economic statistical data like this on the blockchain, and the latest way the Department is utilizing innovative technology to protect federal data and promote public use.
The Department published an official hash of its quarterly GDP data release for 2025—and, in some cases, the topline GDP number—to the following nine blockchains: Bitcoin, Ethereum, Solana, TRON, Stellar, Avalanche, Arbitrum One, Polygon PoS, and Optimism. The data was further disseminated through coordination with the oracles Pyth and Chainlink, and the exchanges Coinbase, Gemini, and Kraken helped facilitate the Department’s publishing. The Department will continue to innovate and broaden the scope of publishing future datasets like GDP to include the use of other blockchains, oracles, and exchanges.
Through this landmark effort, the Department hopes to demonstrate the wide utility of blockchain technology. It also aims to demonstrate a proof of concept for all of government, and to build on the Trump Administration’s historic efforts to make the United States of America the blockchain capital of the world…(More)”.
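The verification workflow implied by this announcement is straightforward in principle: the Department records a cryptographic hash of the data release on-chain, and anyone holding a copy of the release can recompute the hash and compare. The press release does not specify the algorithm or file format, so the sketch below is an illustration only; SHA-256, the file name, and the on-chain value are assumptions rather than details from the announcement.

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Compute the SHA-256 digest of a local copy of the data release."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_onchain_hash(path: str, onchain_hash: str) -> bool:
    """Check a local file against the hash value recorded on a blockchain."""
    return sha256_of_file(path) == onchain_hash.lower().removeprefix("0x")

# Hypothetical usage; the file name and hash are placeholders, not real values:
# matches_onchain_hash("gdp_q2_2025.csv", "0x9f86d0…")
```

If the recomputed digest matches the published one, the local copy has not been altered since release; the blockchains serve as tamper-evident bulletin boards for the digest, not as hosts for the data itself.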
Paper by Chijioke I Okorie and Melissa Omino: “This article examines the relationship between Standard Public Open Licences (SPOLs) and inequity in the artificial intelligence (AI) innovation ecosystem, focusing on how these licences affect access to and use of African datasets. While SPOLs are widely promoted as tools for democratising data access, they often apply uniform conditions to all users, disregarding disparities in infrastructure, capacity and socioeconomic context. As a result, SPOLs may unintentionally reinforce exclusion and enable extractive data practices that disadvantage communities contributing valuable datasets that they have preserved and curated through historically challenging conditions. The study employs a desktop literature review of primary and secondary sources, complemented by analysis of specific case studies from the Masakhane Research Collective in Natural Language Processing and qualitative vignettes based on real-world experiences to identify inherent and systemic limitations of current SPOLs. The research shows how existing SPOLs, particularly those founded on copyright law, fail to accommodate the positionality of African and similarly situated users in the global data economy. In response, the article introduces the Nwulite Obodo Open Data Licence (NOODL Licence), a novel, tiered SPOL designed to foster equitable openness. NOODL differentiates conditions of use based on users’ geography and development context, incorporating benefit-sharing obligations and context-sensitive terms. It maintains the simplicity and legal clarity of existing SPOLs while addressing their inequities. By critically analysing the overlooked relationship between SPOLs and inequity, this article contributes a practical, context-aware licensing alternative that centres communities. While grounded in the African experience, the NOODL framework offers a replicable model for promoting fairness and inclusivity in global data governance and AI innovation…(More)”.
Paper by Iryna Susha et al: “To address complex societal challenges, governments increasingly need to make evidence-based decisions and require the best available data as input. As much of the relevant data is now in the hands of the private sector, governments increasingly resort to purchasing data from private sources. There is, however, scant empirical evidence and a lack of understanding of how governments go about data purchasing. Therefore, we develop a new conceptual-analytical framework to analyze three models of data purchasing by governments: purchasing raw or aggregated data, data analyses, and data-based services. Next, based on Dutch data purchases, we explore the utility of our framework and create an evidence base detailing what data, data analyses, and data-based services Dutch governments purchase from whom, how, and for what purposes in the context of societal challenges. Our results map buyers and sellers of data in the Dutch context, as well as the types of data sold and the policy domains in which they are purchased. We expose a serious lack of transparency in government reporting on data purchasing. We further discuss our results in view of possible archetypes of data purchases and what purchasing strategy implications they have. Lastly, we propose several recommendations to practitioners and a research agenda for academics…(More)”.
Paper by Caterina Santoro et al: “Open data fall short of their goal to empower all social groups equally. Although the literature examines this issue through the concept of inclusion, substantial gaps remain in defining and understanding the implications of open data for equity in public administration, with research on this topic scattered across disciplines. This fragmentation hinders the possibility of evaluating public policies. To address this gap, we ask: What is the state of the art (naming) on equity in relation to open data, particularly regarding the causes and effects of inequities (blaming) and the strategies to address them (claiming)? Our interdisciplinary review of 69 studies finds that open data serve as a valuable tool for detecting inequities. However, they also raise concerns related to data justice, as inequities in open data arise from epistemic injustice, commodification, capability gaps, financial constraints, and governance structures reinforcing power asymmetries. To address these issues, we suggest balancing data pluralism with standardization and shifting research data practices toward reflexivity. Other strategies focus on governance and encompass stewardship and the adoption of collective benefit models. Our findings provide researchers and public officials with a lens to critically understand open data as new technologies emerge and build upon them…(More)”.
Interview by Emily Laber-Warren: “Police rely on tips from ordinary people — witnesses, victims and whistleblowers — to investigate 95 percent of crimes. Sometimes, the decision to speak up is easily made, but in other cases, people elect to stay silent, leaving countless infractions unpunished. About half of violent crimes go unreported, according to estimates by the US Department of Justice.
And yet at certain historical moments, such as in the United States in the early 1950s, when fear of communism led to many false reports against individuals working in entertainment and public service, societies can become places where people readily denounce one another — often falsely, or for petty reasons.
Tattling, whistleblowing, snitching, call it what you will: Patrick Bergemann has spent the past 15 years studying the many ways that people tell on one another, examining everything from Afghan villagers’ reports of illegal Taliban activity to informers’ charges of treason in 17th-century Russia. In a recent article in the Annual Review of Sociology, he explores the social pressures that influence people’s decisions to expose, or conceal, wrongdoing. The choice to report reflects not just the infraction but a person’s loyalties and whether they expect to receive rewards or retaliation from authorities and peers, says Bergemann, a sociologist at the Paul Merage School of Business at the University of California, Irvine, and author of Judge Thy Neighbor: Denunciations in the Spanish Inquisition, Romanov Russia and Nazi Germany.
Bergemann talked with Knowable Magazine about why and when people report crimes and bad behavior, and how, for repressive governments, encouraging people to rat on neighbors and coworkers can be a potent form of social control…(More)”.
Paper by Sai Sanjna Chintakunta, Nathalia Nascimento, and Everton Guimaraes: “In recent years, Large Language Models (LLMs) have emerged as transformative tools across numerous domains, impacting how professionals approach complex analytical tasks. This systematic mapping study comprehensively examines the application of LLMs throughout the Data Science lifecycle. By analyzing relevant papers from Scopus and IEEE databases, we identify and categorize the types of LLMs being applied, the specific stages and tasks of the data science process they address, and the methodological approaches used for their evaluation. Our analysis includes a detailed examination of evaluation metrics employed across studies and systematically documents both positive contributions and limitations of LLMs when applied to data science workflows. This mapping provides researchers and practitioners with a structured understanding of the current landscape, highlighting trends, gaps, and opportunities for future research in this rapidly evolving intersection of LLMs and data science…(More)”.
Essay by John G. Palfrey: “…The world would be different if large, open datasets could be accessed at low cost by civil society actors, provided that they incorporated constraints to limit the dangerous uses of the same technologies. Recall the climate-change example, which posited that an open-source dataset, drawing on various actors, methods, and geographies, could be used to identify and enact solutions to climate issues around the world in a fraction of the time it takes today….
Philanthropy can—and should—seek to help shape technologies for the good of humanity, rather than for profit. If we do not intervene in the public interest, we may find ourselves haunted by this missed opportunity for a brighter future. Our previous approaches to investing in and governing new technologies have left too much power in the hands of too few. The harms associated with a laissez-faire approach in an era of artificial intelligence, compared with those of previous digital technologies, may be far greater. Promises by the tech industry, from the mid-1990s to today, to self-regulate and include community members in their growth and design have not come to fruition, but they can serve as a sort of reverse roadmap for how to imagine and design the next phase of technological change. We know what will happen if a laissez-faire approach predominates.
We need to learn from this past quarter-century and design a better, more public-interested approach for the decades to come. This moment of inflection allows us to use futurism to guide today’s investments, to remind ourselves that we can embed greater equity into the technology world, and to recommit to philanthropic practices that help to build a safe, sustainable, and just world…(More)”.
Article by Cal Newport: “Much of the euphoria and dread swirling around today’s artificial-intelligence technologies can be traced back to January, 2020, when a team of researchers at OpenAI published a thirty-page report titled “Scaling Laws for Neural Language Models.” The team was led by the A.I. researcher Jared Kaplan, and included Dario Amodei, who is now the C.E.O. of Anthropic. They investigated a fairly nerdy question: What happens to the performance of language models when you increase their size and the intensity of their training?
Back then, many machine-learning experts thought that, after they had reached a certain size, language models would effectively start memorizing the answers to their training questions, which would make them less useful once deployed. But the OpenAI paper argued that these models would only get better as they grew, and indeed that such improvements might follow a power law—an aggressive curve that resembles a hockey stick. The implication: if you keep building larger language models, and you train them on larger data sets, they’ll start to get shockingly good. A few months after the paper, OpenAI seemed to validate the scaling law by releasing GPT-3, which was ten times larger—and leaps and bounds better—than its predecessor, GPT-2.
Suddenly, the theoretical idea of artificial general intelligence, which performs as well as or better than humans on a wide variety of tasks, seemed tantalizingly close. If the scaling law held, A.I. companies might achieve A.G.I. by pouring more money and computing power into language models. Within a year, Sam Altman, the chief executive at OpenAI, published a blog post titled “Moore’s Law for Everything,” which argued that A.I. will take over “more and more of the work that people now do” and create unimaginable wealth for the owners of capital. “This technological revolution is unstoppable,” he wrote. “The world will change so rapidly and drastically that an equally drastic change in policy will be needed to distribute this wealth and enable more people to pursue the life they want.”
It’s hard to overstate how completely the A.I. community came to believe that it would inevitably scale its way to A.G.I. In 2022, Gary Marcus, an A.I. entrepreneur and an emeritus professor of psychology and neural science at N.Y.U., pushed back on Kaplan’s paper, noting that “the so-called scaling laws aren’t universal laws like gravity but rather mere observations that might not hold forever.” The negative response was fierce and swift. “No other essay I have ever written has been ridiculed by as many people, or as many famous people, from Sam Altman and Greg Brockman to Yann LeCun and Elon Musk,” Marcus later reflected. He recently told me that his remarks essentially “excommunicated” him from the world of machine learning. Soon, ChatGPT would reach a hundred million users faster than any digital service in history; in March, 2023, OpenAI’s next release, GPT-4, vaulted so far up the scaling curve that it inspired a Microsoft research paper titled “Sparks of Artificial General Intelligence.” Over the following year, venture-capital spending on A.I. jumped by eighty per cent…(More)”.
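The “power law” at the center of the Kaplan paper has a simple closed form: test loss is predicted to fall roughly as a power of model size when data and compute are scaled up in proportion. The sketch below is illustrative only; the exponent and constant are approximate values reported in the paper, and the two parameter counts are stand-ins for GPT-2-scale and GPT-3-scale models.

```python
# Approximate parameter-scaling law from Kaplan et al. (2020):
# L(N) ~ (N_c / N) ** alpha_N, where N is the number of model parameters.
ALPHA_N = 0.076   # approximate exponent reported in the paper
N_C = 8.8e13      # approximate normalizing constant, in parameters

def predicted_loss(n_params: float) -> float:
    """Cross-entropy loss predicted by the parameter-count power law."""
    return (N_C / n_params) ** ALPHA_N

for n in (1.5e9, 1.75e11):  # roughly GPT-2-scale vs. GPT-3-scale
    print(f"{n:.2e} parameters -> predicted loss ~ {predicted_loss(n):.2f}")
```

Plotted on log-log axes this relationship is a straight line, which is why each tenfold increase in parameters was expected to buy a predictable further drop in loss rather than the diminishing returns many researchers had anticipated.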
Article by Courtenay Brown and Emily Peck: “There’s a new fear among investors and CEOs: flying blind on investments without sufficient data on the economy’s health.
Why it matters: The U.S. government produces some of the world’s premier economic data. The future of those indicators looks murkier than ever, with no private sector source readily available to replace them.
How it works: The Bureau of Labor Statistics, which President Trump wants to overhaul, produces crucial data on jobs and inflation.
- Other sources, including private sector employment data from payroll processor ADP, help shape the understanding of how the economy is performing. But nothing yet can replace traditional government data.
What they’re saying: “At the end of the day all roads lead back to the government data,” Mark Zandi, chief economist at Moody’s Analytics, tells Axios. “If we don’t have that data, we’re going to be lost.”
- Businesses, state and local governments, the Federal Reserve and beyond will substitute private data that might be less reliable and “make decisions that are just plain wrong when it matters the most,” Zandi said.
The intrigue: E.J. Antoni, President Trump’s pick to lead the Bureau of Labor Statistics, said those fears are the very reason why the agency should release its monthly jobs report less frequently.
- Along with the Trump administration, he’s criticized the BLS for the data revisions in its monthly jobs report.
- “How on Earth are businesses supposed to plan — or how is the Fed supposed to conduct monetary policy — when they don’t know how many jobs are being added or lost in our economy?” Antoni told Fox prior to his nomination.
- He suggested suspending the jobs report entirely, for an indefinite period of time, while the BLS revised its methods.
The other side: “Blaming the data because you get things wrong is bad. That’s a bad forecaster,” Joe Lavorgna, an economist who counsels Treasury Secretary Scott Bessent, tells Axios.
- “Questioning the data when you get multiple-standard-deviation misses and biases in the data in one direction — people have the right to question that.”…(More)”.