Can AI Agents Be Trusted?


Article by Blair Levin and Larry Downes: “Agentic AI has quickly become one of the most active areas of artificial intelligence development. AI agents are a level of programming on top of large language models (LLMs) that allow them to work towards specific goals. This extra layer of software can collect data, make decisions, take action, and adapt its behavior based on results. Agents can interact with other systems, apply reasoning, and work according to priorities and rules set by you as the principal.

Companies such as Salesforce, for example, have already deployed agents that can independently handle customer queries in a wide range of industries and applications, and recognize when human intervention is required.

But perhaps the most exciting future for agentic AI will come in the form of personal agents, which can take self-directed action on your behalf. These agents will act as your personal assistant, handling calendar management, performing directed research and analysis, finding, negotiating for, and purchasing goods and services, curating content and taking over basic communications, learning and optimizing themselves along the way.

The idea of personal AI agents goes back decades, but the technology finally appears ready for prime time. Already, leading companies are offering prototype personal AI agents to their customers, suppliers, and other stakeholders, raising challenging business and technical questions. Most pointedly: Can AI agents be trusted to act in our best interests? Will they work exclusively for us, or will their loyalty be split between users, developers, advertisers, and service providers? And how will we know?

The answers to these questions will determine whether and how quickly users embrace personal AI agents, and if their widespread deployment will enhance or damage business relationships and brand value…(More)”.
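
To make the layered architecture described above concrete, here is a minimal sketch of an agent loop: an LLM-backed decision step, tool calls that act on the world, and a memory of observations that feeds back into the next decision. It is an illustration only; the function and tool names are hypothetical stand-ins, not any vendor's API.

```python
# Minimal, illustrative agent loop: a software layer on top of an LLM that
# decides on an action, acts through tools, observes the result, and adapts.
# All names here are hypothetical placeholders, not a real product API.
from typing import Callable

def search_flights(query: str) -> str:
    return f"3 options found for {query}"        # stub tool

def send_email(instruction: str) -> str:
    return f"email drafted: {instruction}"       # stub tool

TOOLS: dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "send_email": send_email,
}

def run_agent(goal: str, rules: list[str], llm: Callable[[str], str],
              max_steps: int = 5) -> list[str]:
    history: list[str] = []                      # memory of past observations
    for _ in range(max_steps):
        # 1. Decide: ask the model for the next action given goal, rules, history.
        decision = llm(
            f"Goal: {goal}\nRules: {rules}\nHistory: {history}\n"
            "Reply as '<tool> | <argument>' or 'DONE | <summary>'."
        )
        action, _, argument = decision.partition(" | ")
        if action == "DONE":                     # goal met, or escalate to a human
            history.append(argument)
            break
        # 2. Act, then 3. Adapt: the observation shapes the next decision.
        observation = TOOLS[action](argument)
        history.append(f"{action} -> {observation}")
    return history

# Toy usage with a scripted "LLM" so the loop runs end to end.
script = iter(["search_flights | BOS to SFO, Friday", "DONE | booked cheapest fare"])
print(run_agent("book a flight", ["spend under $400"], lambda prompt: next(script)))
```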

AI-Ready Federal Statistical Data: An Extension of Communicating Data Quality


Article by Travis Hoppe et al.: “Generative Artificial Intelligence (AI) is redefining how people interact with public information and shaping how public data are consumed. Recent advances in large language models (LLMs) mean that more Americans are getting answers from AI chatbots and other AI systems, which increasingly draw on public datasets. The federal statistical community can take action to advance the use of federal statistics with generative AI to ensure that official statistics are front-and-center, powering these AI-driven experiences.
The Federal Committee on Statistical Methodology (FCSM) developed the Framework for Data Quality to help analysts and the public assess fitness for use of data sets. AI-based queries present new challenges, and the framework should be enhanced to meet them. Generative AI acts as an intermediary in the consumption of public statistical information, extracting and combining data with logical strategies that differ from the thought processes and judgments of analysts. For statistical data to be accurately represented and trustworthy, they need to be machine-understandable and able to support models that measure data quality and provide contextual information.
FCSM is working to ensure that federal statistics used in these AI-driven interactions meet the data quality dimensions of the Framework, including but not limited to accessibility, timeliness, accuracy, and credibility. We propose a new collaborative federal effort to establish best practices for optimizing APIs, metadata, and data accessibility to support accurate and trusted generative AI results…(More)”.
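
As a purely illustrative sketch of what machine-understandable statistics could look like (the field names below are hypothetical, not an FCSM or agency standard), each published series might carry structured metadata covering the Framework's quality dimensions so that an AI intermediary can surface provenance, vintage, and accuracy rather than inferring them:

```python
# Hypothetical machine-readable metadata for one statistical series. Field
# names are illustrative only, not an FCSM or agency standard.
series_metadata = {
    "series_id": "example.unemployment_rate.monthly",
    "title": "Unemployment rate, seasonally adjusted",
    "publisher": "Example Federal Statistical Agency",
    "api_endpoint": "https://api.example.gov/v1/series/unemployment_rate",
    "quality": {
        "accuracy": {"standard_error": 0.1, "units": "percentage points"},
        "timeliness": {"reference_period": "2025-04", "released": "2025-05-02"},
        "accessibility": {"license": "public domain", "formats": ["json", "csv"]},
        "credibility": {"methodology_url": "https://example.gov/methods"},
    },
}

# Dimensions an AI intermediary could check before citing the series.
REQUIRED_DIMENSIONS = {"accuracy", "timeliness", "accessibility", "credibility"}

def missing_quality_dimensions(metadata: dict) -> set:
    """Return any Framework dimensions the record fails to document."""
    return REQUIRED_DIMENSIONS - set(metadata.get("quality", {}))

print(missing_quality_dimensions(series_metadata))  # -> set()
```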

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity


Paper by Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar: “Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths, limitations, and ultimately raising crucial questions about their true reasoning capabilities…(More)”
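
The idea of a controllable puzzle environment can be illustrated with Tower of Hanoi, a classic puzzle of this kind: the number of disks dials up compositional complexity (the optimal solution has 2^n - 1 moves) while the rules stay fixed, and any candidate move sequence can be verified exactly. The sketch below is only an illustration of the setup, not the authors' evaluation harness.

```python
# Sketch of a controllable puzzle environment in the spirit of the paper:
# Tower of Hanoi, where the number of disks n sets compositional complexity
# (optimal solution length is 2**n - 1) while the rules stay fixed.
# Illustrative only; not the authors' actual evaluation code.

def is_valid_solution(n: int, moves: list) -> bool:
    """Check a sequence of (source_peg, target_peg) moves for n disks."""
    pegs = [list(range(n, 0, -1)), [], []]    # peg 0 holds disks n..1 (top = 1)
    for src, dst in moves:
        if not pegs[src]:
            return False                       # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                       # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))    # all disks on the final peg

def optimal_moves(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> list:
    """Reference solver: 2**n - 1 moves, usable to score a model's output exactly."""
    if n == 0:
        return []
    return (optimal_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + optimal_moves(n - 1, aux, src, dst))

for n in (3, 5, 7):                            # complexity sweep
    moves = optimal_moves(n)
    print(n, len(moves), is_valid_solution(n, moves))
```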

The path for AI in poor nations does not need to be paved with billions


Editorial in Nature: “Coinciding with US President Donald Trump’s tour of Gulf states last week, Saudi Arabia announced that it is embarking on a large-scale artificial intelligence (AI) initiative. The proposed venture will have state backing and considerable involvement from US technology firms. It is the latest move in a global expansion of AI ambitions beyond the existing heartlands of the United States, China and Europe. However, as Nature India, Nature Africa and Nature Middle East report in a series of articles on AI in low- and middle-income countries (LMICs) published on 21 May (see go.nature.com/45jy3qq), the path to home-grown AI doesn’t need to be paved with billions, or even hundreds of millions, of dollars, or depend exclusively on partners in Western nations or China… As a News Feature that appears in the series makes plain (see go.nature.com/3yrd3u2), many initiatives in LMICs aren’t focusing on scaling up, but on ‘scaling right’. They are “building models that work for local users, in their languages, and within their social and economic realities”.

More such local initiatives are needed. Some of the most popular AI applications, such as OpenAI’s ChatGPT and Google Gemini, are trained mainly on data in European languages, which makes them less effective for users who speak Hindi, Arabic, Swahili, Xhosa and countless other languages. Countries are boosting home-grown apps by funding start-up companies, establishing AI education programmes, building AI research and regulatory capacity, and engaging the public.

Those LMICs that have started investing in AI began by establishing an AI strategy, including policies for AI research. However, as things stand, most of the 55 member states of the African Union and of the 22 members of the League of Arab States have not produced an AI strategy. That must change…(More)”.

The AI Policy Playbook


Playbook by AI Policymaker Network & Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH: “It moves away from talking about AI ethics in abstract terms and instead focuses on building policies that work right away in emerging economies and respond to immediate development priorities. The Playbook emphasises that a one-size-fits-all solution doesn’t work. Rather, it illustrates shared challenges—like limited research capacity, fragmented data ecosystems, and compounding AI risks—while spotlighting national innovations and success stories. From drafting AI strategies to engaging communities and safeguarding rights, it lays out a roadmap grounded in local realities… What you can expect to find in the AI Policy Playbook:

  1. Policymaker Interviews
    Real-world insights from policymakers to understand their challenges and best practices.
  2. Policy Process Analysis
    Key elements from existing policies to extract effective strategies for AI governance, as well as policy mapping.
  3. Case Studies
    Examples of successes and lessons learnt from various countries to provide practical guidance.
  4. Recommendations
    Concrete solutions and recommendations from actors in the field to improve the policy development process, including quick tips for implementation and handling challenges.

What distinguishes this initiative is its commitment to peer learning and co-creation. The Africa-Asia AI Policymaker Network comprises over 30 high-level government partners who anchor the Playbook in real-world policy contexts. This ensures that the frameworks are not only theoretically sound but politically and socially implementable…(More)”

Hamburg Declaration on Responsible AI


Declaration by the United Nations Development Programme (UNDP), in partnership with the German Federal Ministry for Economic Cooperation and Development (BMZ): “We are at a crossroads. Despite the progress made in recent years, we need renewed commitment and engagement to advance toward and achieve the Sustainable Development Goals (SDGs). Digital technologies, such as Artificial Intelligence (AI), can play a significant role in this regard. AI presents opportunities and risks in a world of rapid social, political, economic, ecological, and technological shifts. If developed and deployed responsibly, AI can drive sustainable development and benefit society, the economy, and the planet. Yet, without safeguards throughout the AI value chain, it may widen inequalities within and between countries and contribute to direct harm through inappropriate, illegal, or deliberate misuse. It can also contribute to human rights violations, fuel disinformation, homogenize creative and cultural expression, and harm the environment. These risks are likely to disproportionately affect low-income countries, vulnerable groups, and future generations. Geopolitical competition and market dependencies further amplify these risks…(More)”.

Silicon Valley Is at an Inflection Point


Article by Karen Hao: “…In the decade that I have observed Silicon Valley — first as an engineer, then as a journalist — I’ve watched the industry shift to a new paradigm. Tech companies have long reaped the benefits of a friendly U.S. government, but the Trump administration has made clear that it will now grant new firepower to the industry’s ambitions. The Stargate announcement was just one signal. Another was the Republican tax bill that the House passed last week, which would prohibit states from regulating A.I. for the next 10 years.

The leading A.I. giants are no longer merely multinational corporations; they are growing into modern-day empires. With the full support of the federal government, soon they will be able to reshape most spheres of society as they please, from the political to the economic to the production of science…(More)”.

Collective Bargaining in the Information Economy Can Address AI-Driven Power Concentration


Position paper by Nicholas Vincent, Matthew Prewitt and Hanlin Li: “…argues that there is an urgent need to restructure markets for the information that goes into AI systems. Specifically, producers of information goods (such as journalists, researchers, and creative professionals) need to be able to collectively bargain with AI product builders in order to receive reasonable terms and a sustainable return on the informational value they contribute. We argue that without increased market coordination or collective bargaining on the side of these primary information producers, AI will exacerbate a large-scale “information market failure” that will lead not only to undesirable concentration of capital, but also to a potential “ecological collapse” in the informational commons. On the other hand, collective bargaining in the information economy can create market frictions and aligned incentives necessary for a pro-social, sustainable AI future. We provide concrete actions that can be taken to support a coalition-based approach to achieve this goal. For example, researchers and developers can establish technical mechanisms such as federated data management tools and explainable data value estimations, to inform and facilitate collective bargaining in the information economy. Additionally, regulatory and policy interventions may be introduced to support trusted data intermediary organizations representing guilds or syndicates of information producers…(More)”.
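
One of the technical mechanisms named above, explainable data value estimation, can be sketched with a simple leave-one-out baseline: a contributor's value is the drop in model or product utility when their data is withheld. This is an illustrative starting point under assumed names (Shapley-style methods refine it), not the authors' specific proposal.

```python
# Illustrative leave-one-out data valuation: a contributor's value is the drop
# in utility when their data is withheld. A simple baseline for the
# "explainable data value estimation" idea; names and utility are hypothetical.
from typing import Callable, Dict, List

def leave_one_out_values(
    contributions: Dict[str, List],                 # contributor -> their records
    utility: Callable[[List], float],               # scores a training dataset
) -> Dict[str, float]:
    full_data = [r for records in contributions.values() for r in records]
    baseline = utility(full_data)
    values = {}
    for name in contributions:
        without = [r for other, recs in contributions.items() if other != name
                   for r in recs]
        values[name] = baseline - utility(without)  # marginal contribution
    return values

# Toy usage: utility = number of distinct topics covered by a news corpus.
corpus = {
    "wire_service": [("econ", "..."), ("sports", "...")],
    "local_paper":  [("courts", "...")],
    "blogger":      [("econ", "...")],
}
coverage = lambda data: len({topic for topic, _ in data})
print(leave_one_out_values(corpus, coverage))
# -> {'wire_service': 1, 'local_paper': 1, 'blogger': 0}
```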

Some signs of AI model collapse begin to reveal themselves


Article by Steven J. Vaughan-Nichols: “I use AI a lot, but not to write stories. I use AI for search. When it comes to search, AI, especially Perplexity, is simply better than Google.

Ordinary search has gone to the dogs. Maybe as Google goes gaga for AI, its search engine will get better again, but I doubt it. In just the last few months, I’ve noticed that AI-enabled search, too, has been getting crappier.

In particular, I’m finding that when I search for hard data such as market-share statistics or other business numbers, the results often come from bad sources. Instead of stats from 10-Ks, the US Securities and Exchange Commission’s (SEC) mandated annual business financial reports for public companies, I get numbers from sites purporting to be summaries of business reports. These bear some resemblance to reality, but they’re never quite right. If I specify I want only 10-K results, it works. If I just ask for financial results, the answers get… interesting.

This isn’t just Perplexity. I’ve done the exact same searches on all the major AI search bots, and they all give me “questionable” results.

Welcome to Garbage In/Garbage Out (GIGO). Formally, in AI circles, this is known as AI model collapse. In AI model collapse, AI systems trained on their own outputs gradually lose accuracy, diversity, and reliability. This occurs because errors compound across successive model generations, leading to distorted data distributions and “irreversible defects” in performance. The final result? A 2024 Nature paper stated, “The model becomes poisoned with its own projection of reality.”

Model collapse is the result of three different factors. The first is error accumulation, in which each model generation inherits and amplifies flaws from previous versions, causing outputs to drift from the original data patterns. Next is the loss of tail data, in which rare events are erased from the training data until, eventually, entire concepts are blurred. Finally, feedback loops reinforce narrow patterns, creating repetitive text or biased recommendations…(More)”.
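
The loss-of-tail-data mechanism can be shown with a toy simulation, offered purely as an illustration of the dynamic rather than as the cited paper's experiment: each generation refits category frequencies to samples drawn from the previous generation's model, and once a rare category happens to draw zero samples it can never reappear.

```python
# Toy illustration of the "loss of tail data" mechanism behind model collapse:
# each generation's "model" (a simple frequency fit) is trained only on samples
# produced by the previous generation. Rare categories that happen to draw
# zero samples vanish permanently, so tail concepts are typically erased
# within a few generations. Not the cited paper's actual experiment.
import random
from collections import Counter

random.seed(42)
categories = ["common_a", "common_b", "rare_c", "rare_d"]
weights = [0.48, 0.48, 0.02, 0.02]   # the original, real-world distribution
n_samples = 50                       # each generation's training-set size

for generation in range(1, 11):
    samples = random.choices(categories, weights=weights, k=n_samples)
    counts = Counter(samples)
    # The next generation is fit only to its predecessor's outputs.
    weights = [counts[c] / n_samples for c in categories]
    surviving = [c for c, w in zip(categories, weights) if w > 0]
    print(f"generation {generation}: surviving categories = {surviving}")
```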

Ethical implications related to processing of personal data and artificial intelligence in humanitarian crises: a scoping review


Paper by Tino Kreutzer et al.: “Humanitarian organizations are rapidly expanding their use of data in the pursuit of operational gains in effectiveness and efficiency. Ethical risks, particularly from artificial intelligence (AI) data processing, are increasingly recognized yet inadequately addressed by current humanitarian data protection guidelines. This study reports on a scoping review that maps the range of ethical issues that have been raised in the academic literature regarding data processing of people affected by humanitarian crises….

We identified 16,200 unique records and retained 218 relevant studies. Nearly one in three (n = 66) discussed technologies related to AI. Seventeen studies included an author from a lower-middle-income country, while four included an author from a low-income country. We identified 22 ethical issues, which were then grouped along the four ethical value categories of autonomy, beneficence, non-maleficence, and justice. Slightly over half of the included studies (n = 113) identified ethical issues based on real-world examples. The most-cited ethical issue (n = 134) was a concern for privacy in cases where personal or sensitive data might be inadvertently shared with third parties. Aside from AI, the technologies most frequently discussed in these studies included social media, crowdsourcing, and mapping tools.

Studies highlight significant concerns that data processing in humanitarian contexts can cause additional harm, may not provide direct benefits, may limit affected populations’ autonomy, and can lead to the unfair distribution of scarce resources. The increase in AI tool deployment for humanitarian assistance amplifies these concerns. Urgent development of specific, comprehensive guidelines, training, and auditing methods is required to address these ethical challenges. Moreover, empirical research from low- and middle-income countries, disproportionately affected by humanitarian crises, is vital to ensure inclusive and diverse perspectives. This research should focus on the ethical implications of both emerging AI systems and established humanitarian data management practices…(More)”.