DATA – Page 3 – The Living Library

Practitioner perspectives on informing decisions in One Health sectors with predictive models

Curated on June 23, 2025 (updated June 25, 2025) by Stefaan Verhulst

Paper by Kim M. Pepin: “Every decision a person makes is based on a model. A model is an idea about how a process works based on previous experience, observation, or other data. Models may not be explicit or stated (Johnson-Laird, 2010), but they serve to simplify a complex world. Models vary dramatically from conceptual (idea) to statistical (mathematical expression relating observed data to an assumed process and/or other data) or analytical/computational (quantitative algorithm describing a process). Predictive models of complex systems describe an understanding of how systems work, often in mathematical or statistical terms, using data, knowledge, and/or expert opinion. They provide means for predicting outcomes of interest, studying different management decision impacts, and quantifying decision risk and uncertainty (Berger et al. 2021; Li et al. 2017). They can help decision-makers assimilate how multiple pieces of information determine an outcome of interest about a complex system (Berger et al. 2021; Hemming et al. 2022).

People rely daily on system-level models to reach objectives. Choosing the fastest route to a destination is one example. Such a decision may be based on either a mental model of the road system developed from previous experience or a traffic prediction mapping application based on mathematical algorithms and current data. Either way, a system-level model has been applied and there is some uncertainty. In contrast, predicting outcomes for new and complex phenomena, such as emerging disease spread, a biological invasion risk (Chen et al. 2023; Elderd et al. 2006; Pepin et al. 2022), or climatic impacts on ecosystems is more uncertain. Here public service decision-makers may turn to mathematical models when expert opinion and experience do not resolve enough uncertainty about decision outcomes. But using models to guide decisions also relies on expert opinion and experience. Also, even technical experts need to make modeling choices regarding model structure and data inputs that have uncertainty (Elderd et al. 2006) and these might not be completely objective decisions (Bedson et al. 2021). Thus, using models for guiding decisions has subjectivity from both the developer and end-user, which can lead to apprehension or lack of trust about using models to inform decisions.

Models may be particularly advantageous to decision-making in One Health sectors, including health of humans, agriculture, wildlife, and the environment (hereafter called One Health sectors) and their interconnectedness (Adisasmito et al. 2022)…(More)”.

The Global A.I. Divide

Curated on June 23, 2025 by Stefaan Verhulst

Article by Adam Satariano and Paul Mozur: “Last month, Sam Altman, the chief executive of the artificial intelligence company OpenAI, donned a helmet, work boots and a luminescent high-visibility vest to visit the construction site of the company’s new data center project in Texas.

Bigger than New York’s Central Park, the estimated $60 billion project, which has its own natural gas plant, will be one of the most powerful computing hubs ever created when completed as soon as next year.

Around the same time as Mr. Altman’s visit to Texas, Nicolás Wolovick, a computer science professor at the National University of Córdoba in Argentina, was running what counts as one of his country’s most advanced A.I. computing hubs. It was in a converted room at the university, where wires snaked between aging A.I. chips and server computers.

“Everything is becoming more split,” Dr. Wolovick said. “We are losing.”

Artificial intelligence has created a new digital divide, fracturing the world between nations with the computing power for building cutting-edge A.I. systems and those without. The split is influencing geopolitics and global economics, creating new dependencies and prompting a desperate rush to not be excluded from a technology race that could reorder economies, drive scientific discovery and change the way that people live and work.

The biggest beneficiaries by far are the United States, China and the European Union. Those regions host more than half of the world’s most powerful data centers, which are used for developing the most complex A.I. systems, according to data compiled by Oxford University researchers. Only 32 countries, or about 16 percent of nations, have these large facilities filled with microchips and computers, giving them what is known in industry parlance as “compute power.”…(More)”.

Library Catalogues as Data: Research, Practice and Usage

Curated on June 23, 2025 by Stefaan Verhulst

Book by Paul Gooding, Melissa Terras, and Sarah Ames: “Through the web of library catalogues, library management systems and myriad digital resources, libraries have become repositories not only for physical and digital information resources but also for enormous amounts of data about the interactions between these resources and their users. Bringing together leading practitioners and academic voices, this book considers library catalogue data as a vital research resource.

Divided into four sections, each approaches library catalogues, collections and records from a different angle, from exploring methods for examining such data; to the politics of catalogues and library data; their interdisciplinary potential; and practical uses and applications of catalogues as data. Other topics the volume discusses include:

Practical routes to preparing library catalogue data for researchers
The ethics of library metadata privacy and reuse
Data-driven decision making
Data quality and collections bias
Preserving, resurrecting and restoring data
The uses and potential of historical library data
The intersection of catalogue data, AI and Large Language Models (LLMs)

This comprehensive book will be an essential read for practitioners in the GLAM sector, particularly those dealing with collections and catalogue data, and LIS academics and students…(More)”

Misinformation by Omission: The Need for More Environmental Transparency in AI

Curated on June 22, 2025 by Stefaan Verhulst

Paper by Sasha Luccioni, Boris Gamazaychikov, Theo Alves da Costa, and Emma Strubell: “In recent years, Artificial Intelligence (AI) models have grown in size and complexity, driving greater demand for computational power and natural resources. In parallel to this trend, transparency around the costs and impacts of these models has decreased, meaning that the users of these technologies have little to no information about their resource demands and subsequent impacts on the environment. Despite this dearth of adequate data, escalating demand for figures quantifying AI’s environmental impacts has led to numerous instances of misinformation evolving from inaccurate or de-contextualized best-effort estimates of greenhouse gas emissions. In this article, we explore pervasive myths and misconceptions shaping public understanding of AI’s environmental impacts, tracing their origins and their spread in both the media and scientific publications. We discuss the importance of data transparency in clarifying misconceptions and mitigating these harms, and conclude with a set of recommendations for how AI developers and policymakers can leverage this information to mitigate negative impacts in the future…(More)”.

ChatGPT Has Already Polluted the Internet So Badly That It’s Hobbling Future AI Development

Curated on June 22, 2025 by Stefaan Verhulst

Article by Frank Landymore: “The rapid rise of ChatGPT — and the cavalcade of competitors’ generative models that followed suit — has polluted the internet with so much useless slop that it’s already kneecapping the development of future AI models.

As the AI-generated data clouds the human creations that these models are so heavily dependent on amalgamating, it becomes inevitable that a greater share of what these so-called intelligences learn from and imitate is itself an ersatz AI creation.

Repeat this process enough, and AI development begins to resemble a maximalist game of telephone in which not only is the quality of the content being produced diminished, resembling less and less what it’s originally supposed to be replacing, but in which the participants actively become stupider. The industry likes to describe this scenario as AI “model collapse.”

As a consequence, the finite amount of data predating ChatGPT’s rise becomes extremely valuable. In a new feature, The Register likens this to the demand for “low-background steel,” or steel that was produced before the detonation of the first nuclear bombs, starting in July 1945 with the US’s Trinity test.

Just as the explosion of AI chatbots has irreversibly polluted the internet, so did the detonation of the atom bomb release radionuclides and other particulates that have seeped into virtually all steel produced thereafter. That makes modern metals unsuitable for use in some highly sensitive scientific and medical equipment. And so, what’s old is new: a major source of low-background steel, even today, is WW1 and WW2 era battleships, including a huge naval fleet that was scuttled by German Admiral Ludwig von Reuter in 1919…(More)”.
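A toy numerical sketch may make the telephone-game picture of “model collapse” in the excerpt above more concrete. It is an illustration under strong assumptions, not a description of how any real AI model is trained: a one-dimensional Gaussian stands in for the model, and each generation is refit only to samples drawn from the previous generation's fit. The function names (`expected_variance`, `simulate_refits`), the sample size, and the seed are hypothetical choices made for this sketch. Under these assumptions the maximum-likelihood variance estimate shrinks in expectation by a factor of (n - 1)/n per generation, so the diversity of the fitted distribution drains away as generations accumulate.

```python
import random
import statistics


def expected_variance(initial_variance, sample_size, generations):
    """Expected fitted variance after repeated refits on n samples per generation.

    Uses the fact that the maximum-likelihood variance estimate of a Gaussian
    has expectation (n - 1) / n times the true variance.
    """
    shrink = (sample_size - 1) / sample_size
    return initial_variance * shrink ** generations


def simulate_refits(sample_size, generations, seed=0):
    """Refit a Gaussian to its own samples; return the fitted variance per generation."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same history
    mean, variance = 0.0, 1.0  # generation 0: the stand-in for "human" data
    history = [variance]
    for _ in range(generations):
        # draw synthetic data from the current fit
        samples = [rng.gauss(mean, variance ** 0.5) for _ in range(sample_size)]
        # refit on synthetic data only
        mean = statistics.fmean(samples)
        variance = statistics.pvariance(samples)  # ML estimate, divides by n
        history.append(variance)
    return history


if __name__ == "__main__":
    # Expected variance left after 50 generations with 10 samples each
    print(expected_variance(1.0, sample_size=10, generations=50))
    # One simulated trajectory; individual runs fluctuate around the expectation
    print(simulate_refits(sample_size=10, generations=50)[-1])
```

Any single simulated run is noisy, which is why the sketch reports the closed-form expectation alongside one trajectory rather than relying on a single outcome.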

National engagement on public trust in data use for single patient record and GP health record published

Curated on June 21, 2025 (updated June 25, 2025) by Stefaan Verhulst

HTN Article: “A large-scale public engagement report commissioned by NHSE on building and maintaining public trust in data use across health and care has been published, focusing on the approach to creating a single patient record and the secondary use of GP data.

It noted “relief” and “enthusiasm” from participants around not having to repeat their health history when interacting with different parts of the health and care system, and highlighted concerns about data accuracy, privacy, and security.

120 participants were recruited for tier one, with 98 remaining by the end, for 15 hours of deliberation over three days in locations including Liverpool, Leicester, Portsmouth, and South London. Inclusive engagement for tier two recruited 76 people from “seldom heard groups” such as those with health needs or socially marginalised groups for interviews and small group sessions. A nationally representative ten-minute online survey with 2,000 people was also carried out in tier three.

“To start with, the concept of a single patient record was met with relief and enthusiasm across Tier 1 and Tier 2 participants,” according to the report….

When it comes to GP data, participants were “largely unaware” of secondary uses, but initially expressed comfort in the idea of it being used for saving lives, improving care, prevention, and efficiency in delivery of services. Concerns were broadly similar to those about the single patient record: concerns about data breaches, incorrect data, misuse, sensitivity of data being shared, bias against individuals, and the potential for re-identification. Some participants felt GP data should be treated differently because “it is likely to contain more intimate information”, offering greater risk to the individual patient if data were to be misused. Others felt it should be included alongside secondary care data to ensure a “comprehensive dataset”.

Participants were “reassured” overall by safeguards in place such as de-identification, staff training in data handling and security, and data regulation such as GDPR and the Data Protection Act. “There was a widespread feeling among Tier 1 and Tier 2 participants that the current model of the GP being the data controller for both direct care and secondary uses placed too much of a burden on GPs when it came to how data is used for secondary purposes,” findings show. “They wanted to see a new model which would allow for greater consistency of approach, transparency, and accountability.” Tier one participants suggested this could be a move to national or regional decision-making on secondary use. Tier three participants who only engaged with the topic online were “more resistant” to moving away from GPs as sole data controllers, with the report stating: “This greater reluctance to change demonstrates the need for careful communication with the public about this topic as changes are made, and continued involvement of the public.”…(More)”.

Disappearing people: A global demographic data crisis threatens public policy

Curated on June 20, 2025 by Stefaan Verhulst

Article by Jessica M. Espey, Andrew J. Tatem, and Dana R. Thomson: “Every day, decisions that affect our lives—such as where to locate hospitals and how to allocate resources for schools—depend on knowing how many people live where and who they are; for example, their ages, occupations, living conditions, and needs. Such core demographic data in most countries come from a census, a count of the population usually conducted every 10 years. But something alarming is happening to many of these critical data sources. As widely discussed at the United Nations (UN) Statistical Commission meeting in New York in March, fewer countries have managed to complete a census in recent years. And even when they are conducted, censuses have been shown to undercount members of certain groups in important ways. Redressing this predicament requires investment and technological solutions alongside extensive political outreach, citizen engagement, and new partnerships…(More)”

DeepSeek Inside: Origins, Technology, and Impact

Curated on June 18, 2025 (updated June 19, 2025) by Stefaan Verhulst

Article by Michael A. Cusumano: “The release of DeepSeek V3 and R1 in January 2025 caused steep declines in the stock prices of companies that provide generative artificial intelligence (GenAI) infrastructure technology and datacenter services. These two large language models (LLMs) came from a little-known Chinese startup with approximately 200 employees versus at least 3,500 for industry-leader OpenAI. DeepSeek seemed to have developed this powerful technology much more cheaply than previously thought possible. If true, DeepSeek had the potential to disrupt the economics of the entire GenAI ecosystem and the dominance of U.S. companies ranging from OpenAI to Nvidia.

DeepSeek-R1 defines itself as “an artificial intelligence language model developed by OpenAI, specifically based on the generative pre-trained transformer (GPT) architecture.” Here, DeepSeek acknowledges that the transformer researchers (who published their landmark paper while at Google in 2017) and OpenAI developed its basic technology. Nonetheless, V3 and R1 display impressive skills in neural-network system design, engineering, and optimization, and DeepSeek’s publications provide rare insights into how the technology actually works. This column reviews, for the non-expert reader, what we know about DeepSeek’s origins, technology, and impact so far…(More)”.

AI is supercharging war. Could it also help broker peace?

Curated on June 18, 2025 (updated June 19, 2025) by Stefaan Verhulst

Article by Tina Amirtha: “Can we measure what is in our hearts and minds, and could it help us end wars any sooner? These are the questions that consume entrepreneur Shawn Guttman, a Canadian émigré who recently gave up his yearslong teaching position in Israel to accelerate a path to peace—using an algorithm.

Living some 75 miles north of Tel Aviv, Guttman is no stranger to the uncertainties of conflict. Over the past few months, miscalculated drone strikes and imprecise missile targets—some intended for larger cities—have occasionally landed dangerously close to his town, sending him to bomb shelters more than once.

“When something big happens, we can point to it and say, ‘Right, that happened because five years ago we did A, B, and C, and look at its effect,’” he says over Google Meet from his office, following a recent trip to the shelter. Behind him, souvenirs from the 1979 Egypt-Israel and 1994 Israel-Jordan peace treaties are visible. “I’m tired of that perspective.”

The startup he cofounded, Didi, is taking a different approach. Its aim is to analyze data across news outlets, political discourse, and social media to identify opportune moments to broker peace. Inspired by political scientist I. William Zartman’s “ripeness” theory, the algorithm—called the Ripeness Index—is designed to tell negotiators, organizers, diplomats, and nongovernmental organizations (NGOs) exactly when conditions are “ripe” to initiate peace negotiations, build coalitions, or launch grassroots campaigns.

During ongoing U.S.-led negotiations over the war in Gaza, both Israel and Hamas have entrenched themselves in opposing bargaining positions. Meanwhile, Israel’s traditional allies, including the U.S., have expressed growing frustration over the war and the dire humanitarian conditions in the enclave, where the threat of famine looms.

In Israel, Didi’s data is already informing grassroots organizations as they strategize which media outlets to target and how to time public actions, such as protests, in coordination with coalition partners. Guttman and his collaborators hope that eventually negotiators will use the model’s insights to help broker lasting peace.

Guttman’s project is part of a rising wave of so-called PeaceTech—a movement using technology to make negotiations more inclusive and data-driven. This includes AI from Hala Systems, which uses satellite imagery and data fusion to monitor ceasefires in Yemen and Ukraine. Another AI startup, Remesh, has been active across the Middle East, helping organizations of all sizes canvass key stakeholders. Its algorithm clusters similar opinions, giving policymakers and mediators a clearer view of public sentiment and division.

A range of NGOs and academic researchers have also developed digital tools for peacebuilding. The nonprofit Computational Democracy Project created Pol.is, an open-source platform that enables citizens to crowdsource outcomes to public debates. Meanwhile, the Futures Lab at the Center for Strategic and International Studies built a peace agreement simulator, complete with a chart to track how well each stakeholder’s needs are met.

Guttman knows it’s an uphill battle. In addition to the ethical and privacy concerns of using AI to interpret public sentiment, PeaceTech also faces financial hurdles. These companies must find ways to sustain themselves amid shrinking public funding and a transatlantic surge in defense spending, which has pulled resources away from peacebuilding initiatives.

Still, Guttman and his investors remain undeterred. One way to view the opportunity for PeaceTech is by looking at the economic toll of war. In its Global Peace Index 2024, the Institute for Economics and Peace’s Vision of Humanity platform estimated that economic disruption due to violence and the fear of violence cost the world $19.1 trillion in 2023, or about 13 percent of global GDP. Guttman sees plenty of commercial potential in times of peace as well.

“Can we make billions of dollars,” Guttman asks, “and save the world—and create peace?”…(More)”. See also: Kluz Prize for PeaceTech (Applications Open)
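The excerpt above mentions tools that cluster similar opinions to show where public sentiment divides. The sketch below gives one generic way such grouping could work. It is plain k-means over vote vectors and is not claimed to be the method used by Remesh, Pol.is, or Didi. The vote encoding (1 = agree, -1 = disagree, 0 = pass), the function name `cluster_opinions`, and the sample data are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vote vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_opinions(votes, k=2, max_iter=20):
    """Plain k-means over vote vectors; returns one cluster label per participant."""
    # Deterministic start: use the first k distinct vote vectors as centroids.
    centroids = []
    for v in votes:
        if list(v) not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct vote vectors")

    labels = None
    for _ in range(max_iter):
        # Assignment step: each participant joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in votes
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(votes, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


if __name__ == "__main__":
    # Rows are participants, columns are statements (illustrative data only).
    votes = [
        [1, 1, -1, -1],
        [-1, -1, 1, 1],
        [1, 1, -1, 0],
        [-1, 0, 1, 1],
        [1, 0, -1, -1],
    ]
    print(cluster_opinions(votes, k=2))  # prints [0, 1, 0, 1, 0]
```

Seeding the centroids from the first distinct rows keeps the example reproducible. A production system would more likely use randomized or smarter initialization and would need to handle sparse, incomplete votes.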

Sharing trustworthy AI models with privacy-enhancing technologies

Curated on June 17, 2025 by Stefaan Verhulst

OECD Report: “Privacy-enhancing technologies (PETs) are critical tools for building trust in the collaborative development and sharing of artificial intelligence (AI) models while protecting privacy, intellectual property, and sensitive information. This report identifies two key types of PET use cases. The first is enhancing the performance of AI models through confidential and minimal use of input data, with technologies like trusted execution environments, federated learning, and secure multi-party computation. The second is enabling the confidential co-creation and sharing of AI models using tools such as differential privacy, trusted execution environments, and homomorphic encryption. PETs can reduce the need for additional data collection, facilitate data-sharing partnerships, and help address risks in AI governance. However, they are not silver bullets. While combining different PETs can help compensate for their individual limitations, balancing utility, efficiency, and usability remains challenging. Governments and regulators can encourage PET adoption through policies, including guidance, regulatory sandboxes, and R&D support, which would help build sustainable PET markets and promote trustworthy AI innovation…(More)”.
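One of the PETs named above, secure multi-party computation, can be illustrated with additive secret sharing. The sketch below is not taken from the OECD report. It assumes honest parties that follow the protocol, runs everything in one process instead of over a network, and uses hypothetical names (`MODULUS`, `make_shares`, `secure_sum`) and made-up inputs. Each party splits its private number into random shares that look meaningless individually, yet the total can still be reconstructed from the parties' partial sums.

```python
import secrets

# Hypothetical modulus for this sketch; it only needs to exceed the largest possible total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value modulo MODULUS."""
    if not 0 <= value < MODULUS:
        raise ValueError("value must be in [0, MODULUS)")
    # The first n - 1 shares are uniformly random; the last one makes the sum come out right.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Compute the total of all inputs while only combining shares, never raw inputs."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i would send to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; this partial sum reveals no single input.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Publishing and adding the partial sums yields the overall total.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    private_counts = [120, 98, 76]  # illustrative private inputs, one per party
    print(secure_sum(private_counts))  # prints 294
```

The design choice here is simplicity: additive sharing supports sums cheaply, but richer computations and protection against dishonest parties require more elaborate protocols, which is consistent with the report's caution that PETs are not silver bullets.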