Particle News: Mistral Releases Open-Weight Voxtral TTS for Faster, On-Device Voice AI

Overview

Voxtral TTS, released Thursday, is an open-weight text-to-speech model of roughly 4 billion parameters that supports nine languages and is built on Mistral’s Ministral 3B.
Mistral pitches the model as a low-latency option for phones, laptops and other edge devices, though published figures for time to first audio and overall speed differ.
Time to first audio (the delay before speech starts) is cited at 90 milliseconds by TechCrunch and 70 milliseconds by Forbes, while claims for generation speed range from 6x to 9.7x faster than real time.
Mistral says the system can adapt to a custom voice from a very short reference sample (reports cite anywhere from about three seconds to 5–25 seconds of audio) and can preserve a speaker’s vocal traits across languages.
The weights are open but released under a noncommercial license, which limits production use. That restriction may push enterprises toward rival APIs from ElevenLabs, OpenAI and Deepgram, even as the availability of an open option puts pressure on pricing and gives developers more control.