Particle.news

Study Says Google’s AI Overviews Miss About 1 in 10 Answers; Google Disputes Findings

Placement at the top of search magnifies the impact of subtle, weakly sourced errors.

Overview

  • Oumi, analyzing 4,326 queries for The New York Times using the SimpleQA benchmark, found Google’s AI Overviews were 85% accurate with Gemini 2 and 91% accurate with Gemini 3.
  • Extrapolating that error rate to roughly five trillion yearly searches suggests tens of millions of wrong answers every hour.
  • Google called the methodology flawed and pointed to issues in the OpenAI-built benchmark, while noting that its ranking and safety systems screen out low-quality sources.
  • Separate reporting cites a Google internal test that found Gemini 3 outputs were incorrect 28% of the time, though Google says AI Overviews perform better than the raw model.
  • Oumi reported weak grounding in the citations, with unsubstantiated links in 37% of Gemini 2 responses and 56% of Gemini 3 responses, and frequent references to Facebook and Reddit.