Overview
- The flagship Mistral Large 3 uses a sparse mixture‑of‑experts architecture with 675B total parameters (41B active per token) and a 256,000‑token context window.
- Mistral says the model was trained from scratch on about 3,000 NVIDIA H200 GPUs and is built for enterprise workloads requiring long‑context processing.
- Company benchmarks claim parity with leading instruction‑tuned open models and strong multilingual dialogue and image understanding; LMArena lists it second among open‑weight non‑reasoning models and sixth on the overall open‑weight leaderboard.
- Mistral and NVIDIA report optimized deployment through the TensorRT‑LLM, SGLang, and vLLM inference stacks, combined with NVLink memory coherence, NVFP4 quantization, and NVIDIA Dynamo, including a claimed 10× throughput gain on GB200 NVL72 versus H200.
- Smaller dense Ministral 3 models at 14B, 8B, and 3B target NVIDIA DGX Spark, RTX desktops and laptops, and Jetson, with support through llama.cpp and Ollama and availability on mainstream open model platforms; NIM microservices are promised soon.
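The sparsity figures above (675B total, 41B active) imply that only about 6% of the model's weights participate in any single forward pass. A quick back‑of‑the‑envelope check in Python, using only the parameter counts reported above (not independently verified):

```python
# Reported parameter counts for Mistral Large 3 (figures from the summary above).
TOTAL_PARAMS = 675e9   # total parameters across all experts
ACTIVE_PARAMS = 41e9   # parameters activated per token

# Fraction of the model engaged on each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction: {active_fraction:.1%}")  # → 6.1%

# Rough memory intuition: storage scales with total parameters, while
# per-token compute scales with the active subset — the usual trade-off
# that lets sparse MoE models pair large capacity with lower latency.
BF16_BYTES = 2  # bytes per parameter in bfloat16
print(f"Weights at bf16: {TOTAL_PARAMS * BF16_BYTES / 1e12:.2f} TB total, "
      f"{ACTIVE_PARAMS * BF16_BYTES / 1e9:.0f} GB touched per token")
```

These are order‑of‑magnitude estimates only; actual serving footprints depend on quantization (e.g. the NVFP4 path mentioned above), KV‑cache size, and parallelism strategy.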