Alibaba Launches Qwen3.5-Omni Multimodal AI With Audio and Video Claims
The launch underscores Alibaba's bid for leadership in real-time, multimodal AI.
Overview
- The system accepts text, images, audio and video in a single model and generates fine-grained, time-stamped captions that turn long clips into searchable notes.
- Alibaba says the Omni-Plus variant sets 215 state-of-the-art results in audio and video tasks and beats Gemini 3.1 Pro on many audio measures.
- Live voice gains interruption handling, voice cloning and voice control, which keep real-time dialogue smooth.
- The model includes web search and function calling so it can pull live information and use tools to complete tasks.
- Developers can access the API on Alibaba Cloud’s Bailian in Plus, Flash and Light sizes, and a separate Qwen 3.6 Plus preview is now available on OpenRouter.
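To make the "searchable notes" idea concrete, here is a minimal Python sketch of how time-stamped captions could be searched once a model has produced them. The article does not describe Qwen3.5-Omni's output format, so the `Caption` structure, the helper names and the sample captions are all hypothetical and are not part of Alibaba's API.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    # Hypothetical caption record; the real output schema is not given in the article.
    start_s: float  # caption start time, in seconds
    end_s: float    # caption end time, in seconds
    text: str


def search_captions(captions, query):
    """Return captions whose text contains every query word (case-insensitive)."""
    words = query.lower().split()
    return [c for c in captions if all(w in c.text.lower() for w in words)]


def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Made-up sample data standing in for model-generated captions.
captions = [
    Caption(0.0, 12.5, "Presenter introduces the quarterly roadmap"),
    Caption(12.5, 47.0, "Slide shows revenue chart by region"),
    Caption(47.0, 95.0, "Audience question about the revenue forecast"),
]

# Print the start time and text of every caption that mentions "revenue".
for hit in search_captions(captions, "revenue"):
    print(format_timestamp(hit.start_s), hit.text)
```

Because each caption carries its own start time, a keyword match can point a viewer straight to the relevant moment in a long clip.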