Overview
- HI-MoE introduces a DETR-style detector with two-step routing: a scene-level router first selects a small pool of experts, and an instance-level router then assigns each object query to a few experts within that pool.
- The authors report higher accuracy than both a dense DINO baseline and simpler routing variants, with the largest gains on small objects in the COCO benchmark.
- The current draft concentrates its experiments on COCO, with preliminary specialization analysis on LVIS, and includes ablations and visualizations of how the experts specialize.
- The paper frames this design around Mixture-of-Experts, where a gating network activates only a subset of specialized subnetworks to cut compute while keeping model capacity high.
- Background coverage explains that MoE systems commonly use top-k routing (often with two experts per input) and noisy top-k gating to spread load across experts, citing Mixtral as a real-world example of this approach.
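The two-step routing described above can be sketched in a few lines. This is a minimal illustration under assumed shapes and hypothetical names (`hierarchical_route`, random logits standing in for learned router outputs), not the paper's actual implementation: a scene router keeps a pool of experts, then each object query picks its top-k experts from inside that pool.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_indices(xs, k):
    # indices of the k largest entries
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]

def hierarchical_route(scene_logits, query_logits, pool_size, k):
    """Two-step routing: the scene router keeps a pool of experts,
    then each query selects its top-k experts within that pool."""
    pool = set(top_indices(scene_logits, pool_size))
    assignments = []
    for logits in query_logits:  # one row of routing logits per object query
        # experts outside the scene-level pool are masked out
        masked = [l if i in pool else -math.inf for i, l in enumerate(logits)]
        chosen = top_indices(masked, k)
        weights = softmax([masked[i] for i in chosen])
        assignments.append(list(zip(chosen, weights)))
    return pool, assignments

# toy example: 8 experts, 3 object queries, random stand-in logits
random.seed(0)
num_experts, num_queries = 8, 3
scene_logits = [random.gauss(0, 1) for _ in range(num_experts)]
query_logits = [[random.gauss(0, 1) for _ in range(num_experts)]
                for _ in range(num_queries)]
pool, assignments = hierarchical_route(scene_logits, query_logits,
                                       pool_size=4, k=2)
```

Each query ends up with two (expert index, weight) pairs drawn only from the scene-selected pool, which is the property that distinguishes this from flat per-query routing.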
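The noisy top-k load-balancing idea mentioned above can also be sketched directly. This is a simplified, assumed version of noisy top-k gating (Gaussian noise added to routing logits before selecting the k best, with a fixed noise scale rather than a learned one); the function name and parameters are illustrative:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def noisy_top_k_gate(clean_logits, k, noise_std=1.0, seed=0):
    """Perturb the routing logits with Gaussian noise so load spreads
    across experts over time, then keep only the k best experts."""
    rng = random.Random(seed)
    noisy = [l + rng.gauss(0, noise_std) for l in clean_logits]
    kept = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    # renormalize over the kept experts; all others get weight 0
    weights = softmax([noisy[i] for i in kept])
    gates = [0.0] * len(noisy)
    for i, w in zip(kept, weights):
        gates[i] = w
    return gates

# with k=2, only two experts receive nonzero gate weight per input
gates = noisy_top_k_gate([2.0, 1.0, 0.5, -1.0], k=2)
active = [i for i, g in enumerate(gates) if g > 0.0]
```

Because the noise occasionally flips which experts land in the top k, inputs with similar logits are not always sent to the same experts, which is what spreads the load.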