Overview
- Using real emergency-room records, the o1-preview model produced an exact or near-exact diagnosis in about 67% of cases, versus roughly 50–55% for two attending physicians.
- The head-to-head tests covered multiple ER stages, including triage, first doctor interaction, and admission, with blinded attending physicians grading the outputs.
- OpenAI’s o1-preview also beat earlier systems on curated New England Journal of Medicine case sets and surpassed GPT-4 and GPT-4o in several reasoning tasks.
- Study authors and outside experts cautioned that the evaluation was limited to written electronic health records and that the model can recommend unnecessary tests, so physicians must stay in the loop.
- The team called for prospective clinical trials and tightly supervised pilots to see how AI can safely support bedside decisions in real workflows.