Particle News: Qwen3-TTS Launches as Open-Source, Low-Latency Multilingual TTS With 3-Second Voice Cloning

Overview

A newly posted arXiv report on Qwen3-TTS describes a dual-track language-model design paired with two speech tokenizers, intended to enable real-time synthesis and fine-grained control.
The 12 Hz tokenizer is built for ultra-low latency; the report cites a first-packet latency of about 97 milliseconds.
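This sketch assumes "12 Hz" means the tokenizer emits 12 token frames per second of audio, which is the usual meaning of a tokenizer's frame rate but is not spelled out in the report summary. Under that assumption, each frame covers roughly 83 ms of audio. The minimal Python sketch below does that arithmetic. The function names are hypothetical and are not part of any Qwen3-TTS API.

```python
TOKEN_RATE_HZ = 12  # assumption: "12Hz" = 12 token frames per second of audio


def frame_duration_ms(rate_hz: int = TOKEN_RATE_HZ) -> float:
    """Milliseconds of audio covered by a single token frame."""
    return 1000.0 / rate_hz


def frames_for_ms(duration_ms: int, rate_hz: int = TOKEN_RATE_HZ) -> int:
    """Token frames needed to cover `duration_ms` of audio, rounded up.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    return (duration_ms * rate_hz + 999) // 1000


if __name__ == "__main__":
    print(f"{frame_duration_ms():.1f} ms per frame")  # 83.3 ms per frame
    print(frames_for_ms(10_000), "frames for a 10-second clip")  # 120
```

The 97 ms figure is a first-packet latency measured by the authors, not an audio duration. This arithmetic therefore only conveys the tokenizer's time resolution. It does not explain how that latency is achieved.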
Training data covers more than five million hours of speech across ten languages, supporting multilingual and robust speech generation.
The authors report state-of-the-art results on a multilingual TTS test set, InstructTTSEval, and a long-speech benchmark.
Tongyi Lab says the full model family is open-sourced under the Apache 2.0 license, including weights, code, and demos. Variants include VoiceDesign, CustomVoice, and Base, released at 0.6B and 1.7B parameters, and the models support fine-tuning.