Particle.news

New Audits Find AI Chatbots Often Mislead on Health Advice

Researchers urge human oversight to keep medical decisions with clinicians.

Overview

  • The BMJ Open audit, published Tuesday, found that about half of 250 answers from five popular chatbots were problematic, with Grok producing the most high‑risk replies and Gemini the fewest; many answers were delivered in a confident tone but with poor or incomplete references.
  • In Monday’s JAMA Network Open study from Mass General Brigham, researchers reported that chatbots failed more than 80% of the time to list likely causes from limited symptoms, though final diagnoses improved once full exam and lab data were provided, reinforcing the need for supervision.
  • A separate review in The Annals of the Royal College of Surgeons of England documented fabricated or unverifiable medical citations, including a 34% rate in Grok 3, making it hard for users to check whether advice is backed by real evidence.
  • A Nature‑reported experiment showed that fake preprints about a made‑up condition, seeded online, were later echoed by AI tools and even cited by researchers, illustrating how contaminated sources can flow into model outputs.
  • As patients increasingly consult AI for health questions, clinicians and study authors warn that chatbots are not licensed to give medical advice and call for public education, tighter governance, and guarded deployments, such as intake bots that route patients to real clinicians.