Article by Tripp Mickle, Cade Metz, Dylan Freedman, Teresa Mondría Terol and Keith Collins: “A recent analysis of AI Overviews found that they were accurate approximately nine out of 10 times. But with Google processing more than five trillion searches a year, this means that it provides tens of millions of erroneous answers every hour (or hundreds of thousands of inaccuracies every minute), according to an analysis done by an A.I. start-up called Oumi.
More than half of the accurate responses were “ungrounded,” meaning they linked to websites that did not completely support the information they provided. This makes it challenging to check AI Overviews’ accuracy.
Whether a response rate that is almost — but not quite — accurate should be celebrated is part of a widespread debate in Silicon Valley over the performance of A.I. systems. It speaks to the fundamental core of what we can trust online.
Some technologists argue that Google’s AI Overviews are reasonably accurate and that they have improved in recent months. But others worry that the average person may not realize those results need double-checking.
At the request of The New York Times, Oumi analyzed the accuracy of Google’s AI Overviews using a benchmark test called SimpleQA, which is widely used across the industry to measure the accuracy of A.I. systems. The start-up tested Google’s system in October, when the most complex questions were answered using an A.I. technology called Gemini 2, and then again in February, after it was upgraded to Gemini 3, a more powerful A.I. technology.
In both cases, Oumi’s analysis focused on 4,326 Google searches. The company found that the results were accurate 85 percent of the time with Gemini 2 and 91 percent of the time with Gemini 3…(More)”.
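The per-hour and per-minute figures quoted above follow from simple arithmetic, which the short sketch below reproduces. It is an illustration, not Oumi's method: the function names and the `overview_share` parameter are hypothetical. The default assumes every search returns an AI Overview, which the excerpt does not state, so the results are best read as upper bounds. The counts derived from the 4,326-search sample are also approximate, because the published percentages are rounded.

```python
# Back-of-the-envelope check of the figures quoted in the excerpt.
# Inputs come from the article; the helper names are illustrative only.

SEARCHES_PER_YEAR = 5e12   # "more than five trillion searches a year"
ERROR_RATE = 0.10          # "accurate approximately nine out of 10 times"
HOURS_PER_YEAR = 365 * 24
SAMPLE_SIZE = 4326         # searches in Oumi's analysis


def errors_per_hour(searches_per_year, error_rate, overview_share=1.0):
    """Estimate erroneous answers per hour.

    overview_share is an ASSUMPTION: the fraction of searches that show an
    AI Overview. The excerpt gives no such figure, so 1.0 is an upper bound.
    """
    return searches_per_year * overview_share * error_rate / HOURS_PER_YEAR


def implied_counts(sample_size, accuracy):
    """Return approximate (correct, incorrect) counts for a rounded accuracy."""
    correct = round(sample_size * accuracy)
    return correct, sample_size - correct


per_hour = errors_per_hour(SEARCHES_PER_YEAR, ERROR_RATE)
per_minute = per_hour / 60

print(f"Errors per hour:   {per_hour:,.0f}")
print(f"Errors per minute: {per_minute:,.0f}")

for label, accuracy in [("Gemini 2", 0.85), ("Gemini 3", 0.91)]:
    correct, incorrect = implied_counts(SAMPLE_SIZE, accuracy)
    print(f"{label}: ~{correct} correct, ~{incorrect} incorrect of {SAMPLE_SIZE}")
```

Under these assumptions the estimate comes to about 57 million errors per hour and roughly 950,000 per minute, consistent with the article's "tens of millions" and "hundreds of thousands." Passing a smaller `overview_share` scales both figures down proportionally.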