Particle News: Stanford’s Merlin AI Delivers Specialist-Level Performance Across CT Tasks in Multi-Site Tests

Overview

Researchers evaluated Merlin on more than 50,000 previously unseen abdominal CT scans from multiple hospitals, covering more than 750 tasks in six categories, and found that it matched or exceeded specialist models.
Despite having no chest CT scans in its training data, the model generalized to chest imaging and performed as well as or better than systems trained exclusively on chest scans.
Across 692 diagnostic codes, Merlin correctly ranked which scan corresponded to a given code over 81% of the time, rising to 90% accuracy for a subset of 102 codes.
In five-year disease risk assessments, Merlin identified higher-risk patients 75% of the time compared with 68% for a comparator model, suggesting sensitivity to imaging biomarkers.
The team positions Merlin and its dataset as a community backbone and says clinical use will require local fine-tuning, additional validation, and regulatory pathways that start with simpler tasks.
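
For readers curious how a figure such as "correctly ranked which scan corresponded to a given code over 81% of the time" can be computed, the sketch below shows one common reading of it: the share of (scan with the code, scan without the code) pairs in which the model scores the scan with the code higher. The article does not state how the researchers calculated the number, so this interpretation, the function name, and the toy scores are all assumptions made for illustration.

```python
from itertools import product


def pairwise_ranking_accuracy(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper, not taken from the Merlin work.
    scores: model scores, one per scan (higher = code more likely).
    labels: 1 if the scan truly carries the diagnostic code, else 0.
    Tied scores count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative scan")

    correct = 0.0
    for pos_score, neg_score in product(positives, negatives):
        if pos_score > neg_score:
            correct += 1.0
        elif pos_score == neg_score:
            correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (made up): five scans, two of which carry the code.
toy_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
toy_labels = [1, 0, 0, 0, 1]
print(round(pairwise_ranking_accuracy(toy_scores, toy_labels), 3))  # 0.833
```

In the toy data, five of the six positive/negative pairs are ordered correctly, so the score is about 0.833. Under this reading, a value of 0.5 corresponds to random guessing and 1.0 to perfect ranking.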