Overview
- A peer-reviewed Science paper tested OpenAI’s o1-preview on real emergency room records and found that it produced the exact diagnosis, or a close match, about 67% of the time.
- Two attending physicians given the same text reached roughly 50% to 55%, and blinded reviewers scored all outputs without knowing whether each came from a physician or the AI.
- The evaluation used records from 76 Beth Israel patients, presented at three stages (triage, first physician contact, and hospital admission) to reflect messy, unedited clinical notes.
- Researchers stressed that the study used text alone and warned that the model can suggest unnecessary tests or state incorrect details with confidence.
- The team plans prospective clinical trials and is pushing for accountability rules as more clinicians try AI tools to support diagnosis.