Overview
- On Monday, June 1, 2026, JetBrains published a technical report and released Mellum2’s base, Instruct, and Thinking checkpoints, with the weights available on the Hugging Face Hub under the permissive Apache 2.0 license.
- Mellum2 is a 12-billion-parameter Mixture-of-Experts model with 64 experts, of which 8 are active for each token, so only about 2.5 billion parameters are used per token. It also uses Grouped-Query Attention, Sliding Window Attention, and a YaRN-based extension to a 128K context window.
- JetBrains trained Mellum2 from scratch on roughly 10.6 trillion tokens and ships two post-trained variants: an Instruct model that replies directly and a Thinking model that emits an explicit reasoning trace before its answer.
- JetBrains’ benchmarks report that Mellum2 is competitive on code and reasoning tests while running inference more than twice as fast in many concurrent, production-like workloads. Those speed and accuracy claims come from the developer and lack independent third-party validation, so performance in real deployments remains unproven.
- Because Mellum2 is open and licensed for commercial use, organizations can self-host it to reduce latency and keep code internal. Doing so requires MoE routing infrastructure, ops expertise, and compatible hardware, and those requirements will determine whether teams adopt it over hosted API models.
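
To make the "64 experts, 8 active per token" figure in the overview concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Mellum2's actual router: the function names, the random router scores, and the choice to renormalize the softmax over only the selected experts are all assumptions made for illustration. Only the two constants come from the overview.

```python
import math
import random

NUM_EXPERTS = 64  # expert count, from the overview
TOP_K = 8         # active experts per token, from the overview


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts for one token.

    Returns (expert_index, mixing_weight) pairs. Renormalizing over the
    selected experts only is an assumption, not a documented Mellum2 detail.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


# Hypothetical router scores for a single token (random, for illustration).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route_token(logits)
for expert, weight in assignment:
    print(f"expert {expert:2d} weight {weight:.3f}")

# Fraction of the experts that run for this token.
print(TOP_K / NUM_EXPERTS)  # 0.125
```

Because only the selected experts run for a given token, per-token compute tracks the active parameter count rather than the full 12 billion. The full set of expert weights still has to be loaded and served, which is part of the infrastructure cost that self-hosting teams take on.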