Book by Jude Browne: “Not a day goes by without a new story on the perils of technology: from increasingly clever machines that surpass human capability and comprehension to genetic technologies capable of altering the human genome in ways we cannot predict. How can we respond? What should we do politically? Focusing on the rise of robotics and artificial intelligence (AI), and the impact of new reproductive and genetic technologies (Repro-tech), Jude Browne questions who has political responsibility for the structural impacts of these technologies and how we might go about preparing for the far-reaching societal changes they may bring. This thought-provoking book tackles some of the most pressing issues of our time and offers a compelling vision for how we can respond to these challenges in a way that is both politically feasible and socially responsible…(More)”.
Trump Admin Plans to Cut Team Responsible for Critical Atomic Measurement Data
Article by Louise Matsakis and Will Knight: “The US National Institute of Standards and Technology (NIST) is discussing plans to eliminate an entire team responsible for publishing and maintaining critical atomic measurement data in the coming weeks, as the Trump administration continues its efforts to reduce the US federal workforce, according to a March 18 email sent to dozens of outside scientists. The data in question underpins advanced scientific research around the world in areas like semiconductor manufacturing and nuclear fusion…(More)”.
Web 3.0 Requires Data Integrity
Article by Bruce Schneier and Davi Ottenheimer: “If you’ve ever taken a computer security class, you’ve probably learned about the three legs of computer security—confidentiality, integrity, and availability—known as the CIA triad. When we talk about a system being secure, that’s what we’re referring to. All are important, but to different degrees in different contexts. In a world populated by artificial intelligence (AI) systems and artificially intelligent agents, integrity will be paramount.
What is data integrity? It’s ensuring that no one can modify data—that’s the security angle—but it’s much more than that. It encompasses accuracy, completeness, and quality of data—all over both time and space. It’s preventing accidental data loss; the “undo” button is a primitive integrity measure. It’s also making sure that data is accurate when it’s collected—that it comes from a trustworthy source, that nothing important is missing, and that it doesn’t change as it moves from format to format. The ability to restart your computer is another integrity measure.
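To make the integrity idea concrete, here is a minimal sketch (an editorial illustration, not from the article) of the most basic integrity control: fingerprinting a record with a cryptographic hash at collection time so that any later modification, accidental or malicious, is detectable. The record fields are invented for the example.

```python
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Return a SHA-256 digest of a record in canonical form.

    Serializing with sorted keys keeps the digest stable across key
    re-ordering, so only genuine content changes alter it.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Store the digest alongside (or apart from) the data at collection time.
record = {"sensor": "th-12", "reading_c": 21.4, "ts": "2025-03-18T09:00:00Z"}
stored_digest = fingerprint(record)

# Later, after the record has moved between systems or formats, recompute
# and compare: any mismatch means the data has lost integrity.
received = {"ts": "2025-03-18T09:00:00Z", "reading_c": 21.4, "sensor": "th-12"}
assert fingerprint(received) == stored_digest
```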
The CIA triad has evolved with the Internet. The first iteration of the Web—Web 1.0 of the 1990s and early 2000s—prioritized availability. This era saw organizations and individuals rush to digitize their content, creating what has become an unprecedented repository of human knowledge. Organizations worldwide established their digital presence, leading to massive digitization projects where quantity took precedence over quality. The emphasis on making information available overshadowed other concerns.
As Web technologies matured, the focus shifted to protecting the vast amounts of data flowing through online systems. This is Web 2.0: the Internet of today. Interactive features and user-generated content transformed the Web from a read-only medium to a participatory platform. The increase in personal data, and the emergence of interactive platforms for e-commerce, social media, and online everything demanded both data protection and user privacy. Confidentiality became paramount.
We stand at the threshold of a new Web paradigm: Web 3.0. This is a distributed, decentralized, intelligent Web. Peer-to-peer social-networking systems promise to break the tech monopolies’ control on how we interact with each other. Tim Berners-Lee’s open W3C protocol, Solid, represents a fundamental shift in how we think about data ownership and control. A future filled with AI agents requires verifiable, trustworthy personal data and computation. In this world, data integrity takes center stage…(More)”.
Cloze Encounters: The Impact of Pirated Data Access on LLM Performance
Paper by Stella Jia & Abhishek Nagaraj: “Large Language Models (LLMs) have demonstrated remarkable capabilities in text generation, but their performance may be influenced by the datasets on which they are trained, including potentially unauthorized or pirated content. We investigate the extent to which data access through pirated books influences LLM responses. We test the performance of leading foundation models (GPT, Claude, Llama, and Gemini) on a set of books that were and were not included in the Books3 dataset, which contains full-text pirated books and could be used for LLM training. We assess book-level performance using the “name cloze” word-prediction task. To examine the causal effect of Books3 inclusion, we employ an instrumental variables strategy that exploits the pattern of book publication years in the Books3 dataset. In our sample of 12,916 books, we find significant improvements in LLM name cloze accuracy on books available within the Books3 dataset compared to those not present in these data. These effects are more pronounced for less popular books than for more popular ones, and they vary across leading models. These findings have crucial implications for the economics of digitization, copyright policy, and the design and training of AI systems…(More)”.
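For readers unfamiliar with the evaluation, the sketch below shows the general shape of a name cloze test; it is a simplified editorial illustration, not the authors’ exact protocol or prompt, and `query_model` stands in for whichever LLM API is being tested. A character name is masked in a short passage and the model is scored on whether it restores the name exactly.

```python
def make_cloze(passage: str, name: str, mask: str = "[MASK]") -> str:
    """Hide a single occurrence of a character name in a passage."""
    return passage.replace(name, mask, 1)

def name_cloze_accuracy(examples, query_model) -> float:
    """examples: iterable of (passage, name) pairs drawn from one book."""
    hits, total = 0, 0
    for passage, name in examples:
        prompt = (
            "A proper name has been replaced by [MASK] in the passage below. "
            "Reply with the name only.\n\n" + make_cloze(passage, name)
        )
        guess = query_model(prompt).strip()
        hits += int(guess.lower() == name.lower())
        total += 1
    return hits / max(total, 1)

# Book-level accuracy can then be compared across titles that are inside
# versus outside the Books3 dataset.
```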
Getting the Public on Side: How to Make Reforms Acceptable by Design
OECD Report: “Public acceptability is a crucial condition for the successful implementation of reforms. The challenges raised by the green, digital and demographic transitions call for urgent and ambitious policy action. Despite this, governments often struggle to build sufficiently broad public support for the reforms needed to promote change. Better information and effective public communication have a key role to play. But policymakers cannot bring the public onto the side of reform without a proper understanding of people’s views and of how those views can help strengthen the policy process.
Perceptual and behavioural data provide an important source of insights on the perceptions, attitudes and preferences that constitute the “demand-side” of reform. The interdisciplinary OECD Expert Group on New Measures of the Public Acceptability of Reforms was set up in 2021 to take stock of these insights and explore their potential for improving policy. This report reflects the outcomes of the Expert Group’s work. It looks at and assesses (i) the available data and what they can tell policymakers about people’s views; (ii) the analytical frameworks through which these data are interpreted; and (iii) the policy tools through which considerations of public acceptability are integrated into the reform process…(More)”.
Should AGI-preppers embrace DOGE?
Blog by Henry Farrell: “…AGI-prepping is reshaping our politics. Wildly ambitious claims for AGI have not only shaped America’s grand strategy but are also plausibly among the justifying reasons for DOGE.
After the announcement of DOGE, but before it properly got going, I talked to someone who was not formally affiliated, but was very definitely DOGE-adjacent. I put it to this individual that tearing out the decision-making capacities of government would not be good for America’s ability to do things in the world. Their response (paraphrased slightly) was: so what? We’ll have AGI by late 2026. And indeed, one of DOGE’s major ambitions, as described in a new article in WIRED, appears to have been to pull as much government information as possible into a large model that could then provide useful information across the totality of government.
The point – which I don’t think is understood nearly widely enough – is that radical institutional revolutions such as DOGE follow naturally from the AGI-prepper framework. If AGI is right around the corner, we don’t need a massive federal government apparatus organizing funding for science via the National Science Foundation and the National Institutes of Health. After all, in Amodei and Pottinger’s prediction:
By 2027, AI developed by frontier labs will likely be smarter than Nobel Prize winners across most fields of science and engineering. … It will be able to … complete complex tasks that would take people months or years, such as designing new weapons or curing diseases.
Who needs expensive and cumbersome bureaucratic institutions for organizing funding for scientists in a near future where a “country of geniuses [will be] contained in a data center,” ready to solve whatever problems we ask them to? Indeed, if these bottled geniuses are cognitively superior to humans across most or all tasks, why do we need human expertise at all, beyond describing and explaining human wants? From this perspective, most human-based institutions are obsolescing assets that need to be ripped out, and DOGE is only the barest of beginnings…(More)”.
Bubble Trouble
Article by Bryan McMahon: “…Venture capital (VC) funds, drunk on a decade of “growth at all costs,” have poured about $200 billion into generative AI. Making matters worse, the stock market’s bull run is deeply dependent on the growth of the Big Tech companies fueling the AI bubble. In 2023, 71 percent of the total gains in the S&P 500 were attributable to the “Magnificent Seven”—Apple, Nvidia, Tesla, Alphabet, Meta, Amazon, and Microsoft—all of which are among the biggest spenders on AI. Just four—Microsoft, Alphabet, Amazon, and Meta—combined for $246 billion of capital expenditure in 2024 to support the AI build-out. Goldman Sachs expects Big Tech to spend over $1 trillion on chips and data centers to power AI over the next five years. Yet OpenAI, the current market leader, expects to lose $5 billion this year, and its annual losses to swell to $11 billion by 2026. If the AI bubble bursts, it threatens not only to wipe out VC firms in the Valley but also to blow a gaping hole in the public markets and cause an economy-wide meltdown…(More)”.
From Insights to Action: Amplifying Positive Deviance within Somali Rangelands
Article by Basma Albanna, Andreas Pawelke and Hodan Abdullahi: “In every community, some individuals or groups achieve significantly better outcomes than their peers, despite having similar challenges and resources. Finding these so-called positive deviants and working with them to diffuse their practices is referred to as the Positive Deviance approach. The Data-Powered Positive Deviance (DPPD) method follows the same logic as the Positive Deviance approach but leverages existing non-traditional data sources in conjunction with traditional ones to identify and scale the solutions of positive deviants. The UNDP Somalia Accelerator Lab was part of the first cohort of teams that piloted the application of DPPD, seeking to tackle the rangeland health problem in the West Golis region. In this blog post, we reflect on the process we designed and tested to go from identifying and validating successful practices to helping other communities adopt them.
Uncovering Rangeland Success
Three years ago we embarked on a journey to identify pastoral communities in Somaliland that demonstrated resilience in the face of adversity. Using a mix of traditional and non-traditional data sources, we wanted to explore and learn from communities that managed to have healthy rangelands despite the severe droughts of 2016 and 2017.
We engaged with government officials from various ministries, experts from the University of Hargeisa, international organizations like the FAO, and members of agro-pastoral communities to learn more about rangeland health. We then selected West Golis as our region of interest, given its majority pastoral community and relative ease of access. Applying the Soil-Adjusted Vegetation Index (SAVI) to geospatial and earth-observation data allowed us to identify an initial group of potential positive deviants, shown as green circles in Figure 1 below.
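For reference, SAVI is a standard remote-sensing index computed from the red and near-infrared (NIR) bands. The sketch below shows the computation and an illustrative screening step; the arrays and threshold are assumptions for the example, not the pilot’s actual pipeline.

```python
import numpy as np

def savi(nir: np.ndarray, red: np.ndarray, L: float = 0.5) -> np.ndarray:
    """Soil-Adjusted Vegetation Index: ((NIR - Red) / (NIR + Red + L)) * (1 + L).

    L = 0.5 is the conventional soil-brightness correction; inputs are assumed
    to be surface-reflectance bands scaled to [0, 1].
    """
    nir = nir.astype("float64")
    red = red.astype("float64")
    return (nir - red) / (nir + red + L) * (1.0 + L)

# Illustrative screening: flag pixels (or village buffers) whose SAVI sits well
# above the regional median as candidate positive deviants for field validation.
nir_band = np.random.rand(200, 200)   # stand-in for a real NIR band
red_band = np.random.rand(200, 200)   # stand-in for a real red band
index = savi(nir_band, red_band)
candidates = index > np.median(index) + 2 * index.std()
```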

Following the identification of potential positive deviants, we engaged with 18 pastoral communities from the Togdheer, Awdal, and Maroodijeex regions to validate whether the positive deviants we found using earth observation data were indeed doing better than the other communities.
The primary objective of the fieldwork was to uncover the existing practices and strategies that could explain the outperformance of positively deviant communities relative to their peers. The research team identified a range of strategies, including soil and water conservation techniques, locally produced pesticides, and reseeding practices, as summarized in Figure 2.

Data-Powered Positive Deviance is not just about identifying outperformers and their successful practices. The real value lies in the diffusion, adoption and adaptation of these practices by individuals, groups or communities facing similar challenges. For this to succeed, both the positive deviants and those learning about their practices must take ownership and drive the process. Merely presenting the uncommon but successful practices of positive deviants to others will not work. The secret to success is in empowering the community to take charge, overcome challenges, and leverage their own resources and capabilities to effect change…(More)”.
Integrating Social Media into Biodiversity Databases: The Next Big Step?
Article by Muhammad Osama: “Digital technologies and social media have transformed data collection in ecology and conservation biology. Traditional biodiversity monitoring often relies on field surveys, which can be time-consuming and biased toward rural habitats.
The Global Biodiversity Information Facility (GBIF) serves as a key repository for biodiversity data, but it faces challenges such as delayed data availability and underrepresentation of urban habitats.
Social media platforms have become valuable tools for rapid data collection, enabling users to share georeferenced observations instantly and reducing the time lags associated with traditional methods. The widespread use of smartphones with cameras allows individuals to document wildlife sightings in real time, enhancing biodiversity monitoring. Integrating social media data with traditional ecological datasets offers significant advancements, particularly in tracking species distributions in urban areas.
In this paper, the authors evaluated the habitat usage of the Jersey tiger moth (JTM) by comparing occurrence data from social media platforms (Instagram and Flickr) with traditional records from GBIF and iNaturalist. They hypothesized that social media data would reveal significant JTM occurrences in urban environments, which may be underrepresented in traditional datasets…(More)”.
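As a rough illustration of the kind of comparison involved (an editorial sketch, not the authors’ code), the snippet below pulls georeferenced Jersey tiger moth records from the public GBIF occurrence API; geotagged Instagram or Flickr observations would be normalized to the same schema and compared against them. The endpoint and field names follow GBIF’s documented occurrence search API; everything else is an assumption for the example.

```python
import requests

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def fetch_gbif_occurrences(species: str, limit: int = 100) -> list[dict]:
    """Fetch georeferenced occurrence records for a species from GBIF."""
    params = {"scientificName": species, "hasCoordinate": "true", "limit": limit}
    resp = requests.get(GBIF_SEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return [
        {
            "lat": rec.get("decimalLatitude"),
            "lon": rec.get("decimalLongitude"),
            "year": rec.get("year"),
            "source": "GBIF",
        }
        for rec in resp.json().get("results", [])
    ]

records = fetch_gbif_occurrences("Euplagia quadripunctaria")
print(f"Retrieved {len(records)} georeferenced GBIF records")
# A comparable list built from geotagged social media posts could then be
# contrasted, e.g. by the share of points falling on urban land cover.
```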
The Language Data Space (LDS)
European Commission: “… welcomes the launch of the Alliance for Language Technologies European Digital Infrastructure Consortium (ALT-EDIC) and the Language Data Space (LDS).
Aimed at addressing the shortage of European language data needed for training large language models, these projects are set to revolutionise multilingual Artificial Intelligence (AI) systems across the EU.
By offering services in all EU languages, the initiatives are designed to break down language barriers, providing better, more accessible solutions for smaller businesses within the EU. This effort aims not only to preserve the EU’s rich cultural and linguistic heritage in the digital age but also to strengthen Europe’s quest for tech sovereignty. Formed in February 2024, the ALT-EDIC includes 17 participating Member States and 9 observer Member States and regions, making it one of the pioneering European Digital Infrastructure Consortia.
The LDS, part of the Common European Data Spaces, is crucial for increasing data availability for AI development in Europe. Developed by the Commission and funded by the DIGITAL programme, this project aims to create a cohesive marketplace for language data. This will enhance the collection and sharing of multilingual data to support European large language models. Initially accessible to selected institutions and companies, the project aims to eventually involve all European public and private stakeholders.
Find more information about the Alliance for Language Technologies European Digital Infrastructure Consortium (ALT-EDIC) and the Language Data Space (LDS)…(More)”