Overview
- Reviewers in 18 countries assessed more than 3,000 answers from ChatGPT, Copilot, Gemini, and Perplexity, finding significant issues in 45% of responses.
- Serious sourcing problems appeared in 31% of outputs and major accuracy errors in 20%, including hallucinations and outdated information.
- Google’s Gemini was the weakest performer, with significant issues in 76% of its replies and sourcing problems in 72%, far above the rates for ChatGPT (24%) and Perplexity and Copilot (15%).
- The report includes concrete failures, such as false claims about Pope Francis and incorrect statements about astronauts being stranded on the International Space Station.
- The study was released alongside a developer and newsroom toolkit, plus Ipsos polling showing that 42% of respondents trust AI news summaries but that trust drops sharply after errors, findings that reinforce calls for independent monitoring and improved sourcing practices.