Overview
- HI-MoE introduces a DETR-style detector with two-step routing: a scene-level router first selects a small pool of experts, and an instance-level router then assigns each object query to a few experts within that pool.
- The authors report higher accuracy than both a dense DINO baseline and simpler routing variants, with the largest gains on small objects in the COCO benchmark.
- The current draft concentrates its experiments on COCO, with preliminary specialization analysis on LVIS, and includes ablations and visualizations of how the experts specialize.
- The paper frames this design around Mixture-of-Experts, where a gating network activates only a subset of specialized subnetworks to cut compute while keeping model capacity high.
- Background coverage explains that MoE systems commonly use top-k routing (often with two experts per input) and noisy top-k gating to spread load across experts, citing Mixtral as a real-world example of this approach.
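The two-step routing described above can be sketched in a few lines. This is a minimal illustration under assumed shapes and hypothetical names (`hierarchical_route`, random logits standing in for learned router outputs), not the paper's actual implementation: a scene router keeps a pool of experts, then each object query picks its top-k experts from inside that pool.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_indices(xs, k):
    # indices of the k largest entries
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]

def hierarchical_route(scene_logits, query_logits, pool_size, k):
    """Two-step routing: the scene router keeps a pool of experts,
    then each query selects its top-k experts within that pool."""
    pool = set(top_indices(scene_logits, pool_size))
    assignments = []
    for logits in query_logits:  # one row of routing logits per object query
        # experts outside the scene-level pool are masked out
        masked = [l if i in pool else -math.inf for i, l in enumerate(logits)]
        chosen = top_indices(masked, k)
        weights = softmax([masked[i] for i in chosen])
        assignments.append(list(zip(chosen, weights)))
    return pool, assignments

# toy example: 8 experts, 3 object queries, random stand-in logits
random.seed(0)
num_experts, num_queries = 8, 3
scene_logits = [random.gauss(0, 1) for _ in range(num_experts)]
query_logits = [[random.gauss(0, 1) for _ in range(num_experts)]
                for _ in range(num_queries)]
pool, assignments = hierarchical_route(scene_logits, query_logits,
                                       pool_size=4, k=2)
```

Each query ends up with two (expert index, weight) pairs drawn only from the scene-selected pool, which is the property that distinguishes this from flat per-query routing.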
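The noisy top-k load-balancing idea mentioned above can also be sketched directly. This is a simplified, assumed version of noisy top-k gating (Gaussian noise added to routing logits before selecting the k best, with a fixed noise scale rather than a learned one); the function name and parameters are illustrative:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def noisy_top_k_gate(clean_logits, k, noise_std=1.0, seed=0):
    """Perturb the routing logits with Gaussian noise so load spreads
    across experts over time, then keep only the k best experts."""
    rng = random.Random(seed)
    noisy = [l + rng.gauss(0, noise_std) for l in clean_logits]
    kept = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    # renormalize over the kept experts; all others get weight 0
    weights = softmax([noisy[i] for i in kept])
    gates = [0.0] * len(noisy)
    for i, w in zip(kept, weights):
        gates[i] = w
    return gates

# with k=2, only two experts receive nonzero gate weight per input
gates = noisy_top_k_gate([2.0, 1.0, 0.5, -1.0], k=2)
active = [i for i, g in enumerate(gates) if g > 0.0]
```

Because the noise occasionally flips which experts land in the top k, inputs with similar logits are not always sent to the same experts, which is what spreads the load.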