Particle.news

Harvard Study Finds OpenAI’s o1 Model Outdiagnoses ER Physicians on Text-Only Cases

Researchers urge supervised trials before any clinical use.

Overview

  • Using real emergency-room records, the o1-preview model produced an exact or near-correct diagnosis in about 67% of cases, compared with roughly 50–55% for two attending physicians.
  • The head-to-head tests covered multiple ER stages, including triage, first doctor interaction, and admission, with blinded attending physicians grading the outputs.
  • OpenAI’s o1-preview also beat earlier systems on curated New England Journal of Medicine case sets and surpassed GPT-4 and GPT-4o on several medical-reasoning tasks.
  • Study authors and outside experts cautioned that the evaluation was limited to written electronic health records and that the model can recommend unnecessary tests, so physicians must stay in the loop.
  • The team called for prospective clinical trials and tightly supervised pilots to see how AI can safely support bedside decisions in real workflows.