Particle.news

Stanford’s Merlin AI Delivers Specialist-Level Performance Across CT Tasks in Multi-Site Tests

The NIH-funded 3D vision–language model trained on a large linked abdominal CT dataset showed strong off-the-shelf generalization across external hospitals.

Overview

  • Researchers evaluated Merlin on more than 50,000 previously unseen abdominal CT scans from multiple hospitals. Across six task categories spanning more than 750 tasks, it matched or exceeded specialist models.
  • Despite having no chest CT scans in its training data, the model generalized to chest imaging and performed as well as or better than systems trained exclusively on chest scans.
  • Across 692 diagnostic codes, Merlin correctly ranked which scan corresponded to a given code over 81% of the time, rising to 90% accuracy for a subset of 102 codes.
  • In five-year disease risk assessments, Merlin identified higher-risk patients 75% of the time, compared with 68% for a comparator model, suggesting sensitivity to imaging biomarkers.
  • The team positions Merlin and its dataset as a community backbone and says clinical use will require local fine-tuning, additional validation, and regulatory pathways that start with simpler tasks.
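
One way to read the "correctly ranked" figures above is as a pairwise comparison: take one scan that carries a diagnostic code and one that does not, and check whether the model scores the first higher. The summary does not specify the exact metric, so the sketch below is only an assumed interpretation. The function name `pairwise_ranking_accuracy` and all scores are hypothetical and are not taken from Merlin or its dataset.

```python
from itertools import product


def pairwise_ranking_accuracy(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This is an illustrative metric, assumed here
    to resemble the "correctly ranked X% of the time" figures in the summary.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up model scores for illustration only; not data from the study.
scans_with_code = [0.9, 0.7, 0.4]
scans_without_code = [0.8, 0.3, 0.2, 0.1]

accuracy = pairwise_ranking_accuracy(scans_with_code, scans_without_code)
print(f"{accuracy:.2f}")  # prints 0.83 (10 of 12 pairs ranked correctly)
```

Under this reading, a score of 0.5 corresponds to chance and 1.0 to perfect ranking, which gives a rough scale for figures such as 81% or 90%.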