Overview
- Researchers evaluated Merlin on more than 50,000 previously unseen abdominal CT scans from multiple hospitals. Across six task categories spanning more than 750 tasks, it matched or exceeded specialist models.
- Although it was trained without any chest CT data, the model generalized to chest imaging and performed as well as or better than systems trained exclusively on chest scans.
- Across 692 diagnostic codes, Merlin correctly identified which scan corresponded to a given code more than 81% of the time, rising to 90% for a subset of 102 codes.
- In five-year disease risk assessments, Merlin identified higher-risk patients 75% of the time, compared with 68% for a comparator model, which suggests the model is sensitive to imaging biomarkers.
- The team positions Merlin and its dataset as a backbone for the community. It says clinical use will require local fine-tuning, additional validation, and a regulatory pathway, starting with simpler tasks.
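
One way to read the diagnostic-code result above is as a pairwise ranking accuracy: take a scan that carries a code and a scan that does not, and check whether the model scores the first one higher. The summary does not spell out the exact metric, so the sketch below is an illustration under that assumption, not the study's evaluation code. The function name `pairwise_ranking_accuracy` and all scores are hypothetical.

```python
from itertools import product


def pairwise_ranking_accuracy(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win, which makes this value equal to the area
    under the ROC curve for the same scores.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for one diagnostic code (made-up numbers).
with_code = [0.9, 0.7, 0.4]          # scans that carry the code
without_code = [0.8, 0.3, 0.2, 0.1]  # scans that do not

accuracy = pairwise_ranking_accuracy(with_code, without_code)
print(f"{accuracy:.3f}")  # 10 of 12 pairs ranked correctly -> 0.833
```

Under this reading, a score of 0.5 means random ordering and 1.0 means every pair is ordered correctly. The five-year risk comparison may follow similar pairwise logic, but that is also an assumption here.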