Overview
- The study, published in Nature, evaluated Merlin across six categories spanning more than 750 diagnostic, prognostic, and quality tasks.
- The model was pretrained on a linked Stanford corpus of more than 15,000 3D abdominal CT scans paired with radiology reports and nearly one million diagnosis codes.
- In external testing on more than 50,000 unseen abdominal CTs from four hospitals, Merlin delivered stronger off-the-shelf performance than other vision–language models and matched or beat task-specific systems.
- On diagnosis-code prediction, Merlin reached roughly 81% average accuracy across 692 codes, rising to 90% on a 102-code subset, and it ranked five-year disease risk with 75% accuracy, versus 68% for a comparator model.
- Despite being trained only on abdominal scans, the model generalized to chest CT interpretation, and the team encourages local fine-tuning as NIH- and MIDRC-backed work advances toward approvals for simpler tasks.
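
The "81% average accuracy across 692 codes" figure implies a per-code accuracy averaged over all codes. A minimal sketch of how such a macro-averaged metric could be computed for multi-label diagnosis-code prediction — the function name and toy data below are illustrative assumptions, not the study's actual evaluation code:

```python
# Hypothetical sketch: macro-averaged per-code accuracy for multi-label
# diagnosis-code prediction. Data shapes and names are made up for illustration.

def per_code_accuracy(y_true, y_pred):
    """Accuracy for each diagnosis code, averaged across all codes.

    y_true, y_pred: lists of equal-length binary tuples — one tuple per scan,
    one position per diagnosis code (1 = code present, 0 = absent).
    """
    n_scans = len(y_true)
    n_codes = len(y_true[0])
    accuracies = []
    for c in range(n_codes):
        correct = sum(1 for t, p in zip(y_true, y_pred) if t[c] == p[c])
        accuracies.append(correct / n_scans)
    return sum(accuracies) / n_codes

# Toy example: 4 scans, 3 diagnosis codes.
truth = [(1, 0, 1), (0, 0, 1), (1, 1, 0), (0, 1, 1)]
preds = [(1, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(per_code_accuracy(truth, preds))  # → 0.8333...
```

Macro-averaging weights every code equally, so rare codes count as much as common ones; a micro-averaged variant would instead pool all scan-code decisions before dividing.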
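
The 75%-versus-68% risk-ranking comparison suggests a pairwise ranking metric: how often a patient who develops disease within five years is scored as higher risk than one who does not. Assuming a concordance-style measure (the study's exact metric is not specified here), a self-contained sketch:

```python
# Hypothetical sketch: pairwise concordance for five-year risk ranking.
# Counts the fraction of (event, non-event) pairs where the event patient
# received the higher risk score; ties count half. Equivalent to AUROC for
# binary outcomes. The metric choice is an assumption for illustration.

def pairwise_concordance(risk_scores, outcomes):
    """Fraction of event/non-event pairs ranked correctly by risk score."""
    pairs = concordant = 0.0
    for i, (s_i, o_i) in enumerate(zip(risk_scores, outcomes)):
        for s_j, o_j in zip(risk_scores[i + 1:], outcomes[i + 1:]):
            if o_i == o_j:
                continue  # only compare an event with a non-event
            pairs += 1
            event_score, other_score = (s_i, s_j) if o_i == 1 else (s_j, s_i)
            if event_score > other_score:
                concordant += 1
            elif event_score == other_score:
                concordant += 0.5
    return concordant / pairs

# Toy example: 1 = disease within five years, 0 = disease-free.
scores = [0.9, 0.2, 0.6, 0.4]
events = [1, 0, 1, 0]
print(pairwise_concordance(scores, events))  # → 1.0 (every pair ranked correctly)
```

Under this reading, "75% versus 68%" would mean Merlin correctly ordered three in four such patient pairs, against roughly two in three for the comparator.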