ChatGPT Has Already Polluted the Internet So Badly That It’s Hobbling Future AI Development


Article by Frank Landymore: “The rapid rise of ChatGPT — and the cavalcade of competitors’ generative models that followed suit — has polluted the internet with so much useless slop that it’s already kneecapping the development of future AI models.

As the AI-generated data clouds the human creations that these models are so heavily dependent on amalgamating, it becomes inevitable that a greater share of what these so-called intelligences learn from and imitate is itself an ersatz AI creation. 

Repeat this process enough, and AI development begins to resemble a maximalist game of telephone in which not only is the quality of the content being produced diminished, resembling less and less what it’s originally supposed to be replacing, but in which the participants actively become stupider. The industry likes to describe this scenario as AI “model collapse.”

As a consequence, the finite amount of data predating ChatGPT’s rise becomes extremely valuable. In a new feature, The Register likens this to the demand for “low-background steel,” or steel that was produced before the detonation of the first nuclear bombs, starting in July 1945 with the US’s Trinity test.

Just as the explosion of AI chatbots has irreversibly polluted the internet, so did the detonation of the atom bomb release radionuclides and other particulates that have seeped into virtually all steel produced thereafter. That makes modern metals unsuitable for use in some highly sensitive scientific and medical equipment. And so, what’s old is new: a major source of low-background steel, even today, is WW1 and WW2 era battleships, including a huge naval fleet that was scuttled by German Admiral Ludwig von Reuter in 1919…(More)”.
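
The “game of telephone” behind model collapse in the excerpt above can be illustrated with a deliberately tiny toy: each “model” is just a Gaussian fitted to a handful of samples drawn from the previous model. This is a sketch under stated assumptions, not how the article or the industry measures collapse; the function name `simulate_collapse` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_collapse(generations=300, sample_size=10, seed=0):
    """Return the fitted standard deviation after each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: stands in for the "human" data
    history = [sigma]
    for _ in range(generations):
        # Each new model sees only samples produced by the previous model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # "Training" here is just re-estimating the mean and the spread.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 300):
        print(f"generation {gen:3d}: fitted std = {history[gen]:.6g}")
```

With small samples, the fitted spread tends to shrink from one generation to the next, so the later “models” cover less and less of the original distribution. Real language models are far more complex, but the feedback loop has the same shape.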

National engagement on public trust in data use for single patient record and GP health record published


HTN Article: “A large-scale public engagement report commissioned by NHSE on building and maintaining public trust in data use across health and care has been published, focusing on the approach to creating a single patient record and the secondary use of GP data.

It noted “relief” and “enthusiasm” from participants around not having to repeat their health history when interacting with different parts of the health and care system, and highlighted concerns about data accuracy, privacy, and security.

120 participants were recruited for tier one, with 98 remaining by the end, for 15 hours of deliberation over three days in locations including Liverpool, Leicester, Portsmouth, and South London. Inclusive engagement for tier two recruited 76 people from “seldom heard groups” such as those with health needs or socially marginalised groups for interviews and small group sessions. A nationally representative ten-minute online survey with 2,000 people was also carried out in tier three.

“To start with, the concept of a single patient record was met with relief and enthusiasm across Tier 1 and Tier 2 participants,” according to the report….

When it comes to GP data, participants were “largely unaware” of secondary uses, but initially expressed comfort in the idea of it being used for saving lives, improving care, prevention, and efficiency in delivery of services. Concerns were broadly similar to those about the single patient record: concerns about data breaches, incorrect data, misuse, sensitivity of data being shared, bias against individuals, and the potential for re-identification. Some participants felt GP data should be treated differently because “it is likely to contain more intimate information”, offering greater risk to the individual patient if data were to be misused. Others felt it should be included alongside secondary care data to ensure a “comprehensive dataset”.

Participants were “reassured” overall by safeguards in place such as de-identification, staff training in data handling and security, and data regulation such as GDPR and the Data Protection Act. “There was a widespread feeling among Tier 1 and Tier 2 participants that the current model of the GP being the data controller for both direct care and secondary uses placed too much of a burden on GPs when it came to how data is used for secondary purposes,” findings show. “They wanted to see a new model which would allow for greater consistency of approach, transparency, and accountability.” Tier one participants suggested this could be a move to national or regional decision-making on secondary use. Tier three participants who only engaged with the topic online were “more resistant” to moving away from GPs as sole data controllers, with the report stating: “This greater reluctance to change demonstrates the need for careful communication with the public about this topic as changes are made, and continued involvement of the public.”…(More)”.

Disappearing people: A global demographic data crisis threatens public policy


Article by Jessica M. Espey, Andrew J. Tatem, and Dana R. Thomson: “Every day, decisions that affect our lives—such as where to locate hospitals and how to allocate resources for schools—depend on knowing how many people live where and who they are; for example, their ages, occupations, living conditions, and needs. Such core demographic data in most countries come from a census, a count of the population usually conducted every 10 years. But something alarming is happening to many of these critical data sources. As widely discussed at the United Nations (UN) Statistical Commission meeting in New York in March, fewer countries have managed to complete a census in recent years. And even when they are conducted, censuses have been shown to undercount members of certain groups in important ways. Redressing this predicament requires investment and technological solutions alongside extensive political outreach, citizen engagement, and new partnerships…(More)”

DeepSeek Inside: Origins, Technology, and Impact


Article by Michael A. Cusumano: “The release of DeepSeek V3 and R1 in January 2025 caused steep declines in the stock prices of companies that provide generative artificial intelligence (GenAI) infrastructure technology and datacenter services. These two large language models (LLMs) came from a little-known Chinese startup with approximately 200 employees versus at least 3,500 for industry-leader OpenAI. DeepSeek seemed to have developed this powerful technology much more cheaply than previously thought possible. If true, DeepSeek had the potential to disrupt the economics of the entire GenAI ecosystem and the dominance of U.S. companies ranging from OpenAI to Nvidia.

DeepSeek-R1 defines itself as “an artificial intelligence language model developed by OpenAI, specifically based on the generative pre-trained transformer (GPT) architecture.” Here, DeepSeek acknowledges that the transformer researchers (who published their landmark paper while at Google in 2017) and OpenAI developed its basic technology. Nonetheless, V3 and R1 display impressive skills in neural-network system design, engineering, and optimization, and DeepSeek’s publications provide rare insights into how the technology actually works. This column reviews, for the non-expert reader, what we know about DeepSeek’s origins, technology, and impact so far…(More)”.

AI is supercharging war. Could it also help broker peace?


Article by Tina Amirtha: “Can we measure what is in our hearts and minds, and could it help us end wars any sooner? These are the questions that consume entrepreneur Shawn Guttman, a Canadian émigré who recently gave up his yearslong teaching position in Israel to accelerate a path to peace—using an algorithm.

Living some 75 miles north of Tel Aviv, Guttman is no stranger to the uncertainties of conflict. Over the past few months, miscalculated drone strikes and imprecise missile targets—some intended for larger cities—have occasionally landed dangerously close to his town, sending him to bomb shelters more than once.

“When something big happens, we can point to it and say, ‘Right, that happened because five years ago we did A, B, and C, and look at its effect,’” he says over Google Meet from his office, following a recent trip to the shelter. Behind him, souvenirs from the 1979 Egypt-Israel and 1994 Israel-Jordan peace treaties are visible. “I’m tired of that perspective.”

The startup he cofounded, Didi, is taking a different approach. Its aim is to analyze data across news outlets, political discourse, and social media to identify opportune moments to broker peace. Inspired by political scientist I. William Zartman’s “ripeness” theory, the algorithm—called the Ripeness Index—is designed to tell negotiators, organizers, diplomats, and nongovernmental organizations (NGOs) exactly when conditions are “ripe” to initiate peace negotiations, build coalitions, or launch grassroots campaigns.

During ongoing U.S.-led negotiations over the war in Gaza, both Israel and Hamas have entrenched themselves in opposing bargaining positions. Meanwhile, Israel’s traditional allies, including the U.S., have expressed growing frustration over the war and the dire humanitarian conditions in the enclave, where the threat of famine looms.

In Israel, Didi’s data is already informing grassroots organizations as they strategize which media outlets to target and how to time public actions, such as protests, in coordination with coalition partners. Guttman and his collaborators hope that eventually negotiators will use the model’s insights to help broker lasting peace.

Guttman’s project is part of a rising wave of so-called PeaceTech—a movement using technology to make negotiations more inclusive and data-driven. This includes AI from Hala Systems, which uses satellite imagery and data fusion to monitor ceasefires in Yemen and Ukraine. Another AI startup, Remesh, has been active across the Middle East, helping organizations of all sizes canvass key stakeholders. Its algorithm clusters similar opinions, giving policymakers and mediators a clearer view of public sentiment and division.

A range of NGOs and academic researchers have also developed digital tools for peacebuilding. The nonprofit Computational Democracy Project created Pol.is, an open-source platform that enables citizens to crowdsource outcomes to public debates. Meanwhile, the Futures Lab at the Center for Strategic and International Studies built a peace agreement simulator, complete with a chart to track how well each stakeholder’s needs are met.

Guttman knows it’s an uphill battle. In addition to the ethical and privacy concerns of using AI to interpret public sentiment, PeaceTech also faces financial hurdles. These companies must find ways to sustain themselves amid shrinking public funding and a transatlantic surge in defense spending, which has pulled resources away from peacebuilding initiatives.

Still, Guttman and his investors remain undeterred. One way to view the opportunity for PeaceTech is by looking at the economic toll of war. In its Global Peace Index 2024, the Institute for Economics and Peace’s Vision of Humanity platform estimated that economic disruption due to violence and the fear of violence cost the world $19.1 trillion in 2023, or about 13 percent of global GDP. Guttman sees plenty of commercial potential in times of peace as well.

“Can we make billions of dollars,” Guttman asks, “and save the world—and create peace?”…(More)”. See also: Kluz Prize for PeaceTech (Applications Open)

Sharing trustworthy AI models with privacy-enhancing technologies


OECD Report: “Privacy-enhancing technologies (PETs) are critical tools for building trust in the collaborative development and sharing of artificial intelligence (AI) models while protecting privacy, intellectual property, and sensitive information. This report identifies two key types of PET use cases. The first is enhancing the performance of AI models through confidential and minimal use of input data, with technologies like trusted execution environments, federated learning, and secure multi-party computation. The second is enabling the confidential co-creation and sharing of AI models using tools such as differential privacy, trusted execution environments, and homomorphic encryption. PETs can reduce the need for additional data collection, facilitate data-sharing partnerships, and help address risks in AI governance. However, they are not silver bullets. While combining different PETs can help compensate for their individual limitations, balancing utility, efficiency, and usability remains challenging. Governments and regulators can encourage PET adoption through policies, including guidance, regulatory sandboxes, and R&D support, which would help build sustainable PET markets and promote trustworthy AI innovation…(More)”.
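
Of the tools named in the excerpt above, differential privacy is perhaps the easiest to show in a few lines. The sketch below uses the textbook Laplace mechanism for a counting query; it is an illustration only, not taken from the OECD report. The helper names (`laplace_noise`, `private_count`), the example ages, and the choice of epsilon are all hypothetical, and a production system would need a vetted library rather than naive floating-point sampling.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one record
    changes the true answer by at most 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 35, 41, 67, 70, 72, 19, 88]  # hypothetical records
    noisy = private_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)
    print(f"noisy count of people aged 65+: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy, which is one concrete form of the utility trade-off the report mentions.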

Understanding the Impacts of Generative AI Use on Children


Primer by The Alan Turing Institute and LEGO Foundation: “There is a growing body of research looking at the potential positive and negative impacts of generative AI and its associated risks. However, there is a lack of research that considers the potential impacts of these technologies on children, even though generative AI is already being deployed within many products and systems that children engage with, from games to educational platforms. Children have particular needs and rights that must be accounted for when designing, developing, and rolling out new technologies, and more focus on children’s rights is needed. While children are the group that may be most impacted by the widespread deployment of generative AI, they are simultaneously the group least represented in decision-making processes relating to the design, development, deployment or governance of AI. The Alan Turing Institute’s Children and AI and AI for Public Services teams explored the perspectives of children, parents, carers and teachers on generative AI technologies. Their research is guided by the ‘Responsible Innovation in Technology for Children’ (RITEC) framework for digital technology, play and children’s wellbeing established by UNICEF and funded by the LEGO Foundation and seeks to examine the potential impacts of generative AI on children’s wellbeing. The utility of the RITEC framework is that it allows for the qualitative analysis of wellbeing to take place by foregrounding more specific factors such as identity and creativity, which are further explored in each of the work packages.

The project provides unique and much needed insights into impacts of generative AI on children through combining quantitative and qualitative research methods…(More)”.

Comparative evaluation of behavioral epidemic models using COVID-19 data


Paper by Nicolò Gozzi, Nicola Perra, and Alessandro Vespignani: “Characterizing the feedback linking human behavior and the transmission of infectious diseases (i.e., behavioral changes) remains a significant challenge in computational and mathematical epidemiology. Existing behavioral epidemic models often lack real-world data calibration and cross-model performance evaluation in both retrospective analysis and forecasting. In this study, we systematically compare the performance of three mechanistic behavioral epidemic models across nine geographies and two modeling tasks during the first wave of COVID-19, using various metrics. The first model, a Data-Driven Behavioral Feedback Model, incorporates behavioral changes by leveraging mobility data to capture variations in contact patterns. The second and third models are Analytical Behavioral Feedback Models, which simulate the feedback loop either through the explicit representation of different behavioral compartments within the population or by utilizing an effective nonlinear force of infection. Our results do not identify a single best model overall, as performance varies based on factors such as data availability, data quality, and the choice of performance metrics. While the Data-Driven Behavioral Feedback Model incorporates substantial real-time behavioral information, the Analytical Compartmental Behavioral Feedback Model often demonstrates superior or equivalent performance in both retrospective fitting and out-of-sample forecasts. Overall, our work offers guidance for future approaches and methodologies to better integrate behavioral changes into the modeling and projection of epidemic dynamics…(More)”.
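
To make the idea of an “effective nonlinear force of infection” in the abstract above concrete, here is a minimal sketch of an SIR model in which transmission is damped as prevalence rises. The saturating form `beta * i / (1 + k * i)` is one common textbook choice and is an assumption here, not necessarily the form used in the paper. The function names and all parameter values are hypothetical.

```python
def simulate_sir(beta=0.5, gamma=0.2, k=0.0, i0=0.001, days=200, dt=0.1):
    """Euler-integrate an SIR model with a saturating force of infection.

    s, i, r are population fractions (susceptible, infectious, recovered).
    k = 0 gives the standard SIR model; k > 0 damps transmission as
    prevalence rises, standing in for people reducing their contacts.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(s, i, r)]
    for _ in range(round(days / dt)):
        force = beta * i / (1.0 + k * i)  # assumed behavioral damping term
        new_infections = force * s * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        trajectory.append((s, i, r))
    return trajectory


def peak_prevalence(trajectory):
    """Largest infectious fraction reached during the simulation."""
    return max(i for _, i, _ in trajectory)


if __name__ == "__main__":
    for k in (0.0, 20.0):
        peak = peak_prevalence(simulate_sir(k=k))
        print(f"k = {k:4.1f}: peak infectious fraction = {peak:.4f}")
```

Comparing the run with `k = 0` against a run with `k > 0` shows how a behavioral feedback term flattens the epidemic peak without any external mobility data, which is the kind of analytical feedback loop the paper contrasts with its data-driven model.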

Fixing the US statistical infrastructure


Article by Nancy Potok and Erica L. Groshen: “Official government statistics are critical infrastructure for the information age. Reliable, relevant, statistical information helps businesses to invest and flourish; governments at the local, state, and national levels to make critical decisions on policy and public services; and individuals and families to invest in their futures. Yet surrounded by all manner of digitized data, one can still feel inadequately informed. A major driver of this disconnect in the US context is delayed modernization of the federal statistical system. The disconnect will likely worsen in coming months as the administration shrinks statistical agencies’ staffing, terminates programs (notably for health and education statistics), and eliminates unpaid external advisory groups. Amid this upheaval, might the administration’s appetite for disruption be harnessed to modernize federal statistics?

Federal statistics, one of the United States’ premier public goods, differ from privately provided data because they are privacy protected, aggregated to address relevant questions for decision-makers, constructed transparently, and widely available without a subscription. The private sector cannot be expected to adequately supply such statistical infrastructure. Yes, some companies collect and aggregate some economic data, such as credit card purchases and payroll information. But without strong underpinnings of a modern, federal information infrastructure, there would be large gaps in nationally consistent, transparent, trustworthy data. Furthermore, most private providers rely on public statistics for their internal analytics, to improve their products. They are among the many data users asking for more from statistical agencies…(More)”.

Generative AI Outlook Report


Outlook report, prepared by the European Commission’s Joint Research Centre (JRC): “…examines the transformative role of Generative AI (GenAI) with a specific emphasis on the European Union. It highlights the potential of GenAI for innovation, productivity, and societal change. GenAI is a disruptive technology due to its capability of producing human-like content at an unprecedented scale. As such, it holds multiple opportunities for advancements across various sectors, including healthcare, education, science, and creative industries. At the same time, GenAI also presents significant challenges, including the possibility to amplify misinformation, bias, labour disruption, and privacy concerns. All those issues are cross-cutting and therefore, the rapid development of GenAI requires a multidisciplinary approach to fully understand its implications. Against this context, the Outlook report begins with an overview of the technological aspects of GenAI, detailing their current capabilities and outlining emerging trends. It then focuses on economic implications, examining how GenAI can transform industry dynamics and necessitate adaptation of skills and strategies. The societal impact of GenAI is also addressed, with focus on both the opportunities for inclusivity and the risks of bias and over-reliance. Considering these challenges, the regulatory framework section outlines the EU’s current legislative framework, such as the AI Act and horizontal Data legislation to promote trustworthy and transparent AI practices. Finally, sector-specific ‘deep dives’ examine the opportunities and challenges that GenAI presents. This section underscores the need for careful management and strategic policy interventions to maximize its potential benefits while mitigating the risks. 
The report concludes that GenAI has the potential to bring significant social and economic impact in the EU, and that a comprehensive and nuanced policy approach is needed to navigate the challenges and opportunities while ensuring that technological developments are fully aligned with democratic values and EU legal framework…(More)”.