Particle.news

Mistral Releases Open-Weight Voxtral TTS for Faster, On-Device Voice AI

Noncommercial licensing plus disputed speed figures cloud near-term enterprise use.

Overview

  • Voxtral TTS, released Thursday, is an open-weight text-to-speech model with roughly 4 billion parameters. It supports nine languages and builds on Mistral’s Ministral 3B.
  • The company pitches the model as a low-latency option for phones, laptops and other edge devices, though reported figures for time to first audio and overall speed differ.
  • Time to first audio, the delay before speech starts, is cited as 90 milliseconds by TechCrunch and 70 milliseconds by Forbes. Claims about generation speed range from 6x to 9.7x faster than real time.
  • Mistral says the system can adapt to a custom voice from very short samples, with reported sample lengths ranging from about three seconds to 5–25 seconds, and that it can preserve a speaker’s traits across languages.
  • The weights are open but carry a noncommercial license, which limits production use and may push enterprises toward rival APIs from ElevenLabs, OpenAI and Deepgram, even as the availability of an open option puts pressure on pricing and gives users more control.