Overview
- OpenAI introduced three speech models for real-time apps: GPT-Realtime-2, Realtime-Translate, and Realtime-Whisper. The company describes Realtime-2 as having “GPT-5-class reasoning.”
- Realtime-2 expands the context window to 128,000 tokens to support longer, more complex conversations, and it posts an 11% performance gain over version 1.5.
- New voice-agent controls include short preambles like “let me check that,” parallel tool calls during a chat, and selectable reasoning effort from minimal to xhigh.
- Pricing holds for Realtime-2 at $32 per 1 million audio input tokens and $64 per 1 million output tokens, while Translate is $0.034 per minute and Whisper is $0.017 per minute.
- Microsoft said these models are rolling out in Foundry to power live translation, low-latency transcription, and voice assistants that reason through multi-step tasks.
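
The pricing figures above can be turned into a rough cost estimate. The sketch below is illustrative only: the rates are the ones quoted in this overview, while the function names, the token counts, and the session length are hypothetical values chosen for the example. It also assumes billing is linear in tokens or minutes, with no rounding, minimums, or other charges.

```python
from decimal import Decimal

# Rates quoted in the overview (USD). Everything else here is assumed.
REALTIME_2_INPUT_PER_MILLION = Decimal("32")    # per 1M audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = Decimal("64")   # per 1M output tokens
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

MILLION = Decimal(1_000_000)


def realtime_2_cost(audio_input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate Realtime-2 cost in USD, assuming purely linear token billing."""
    input_cost = Decimal(audio_input_tokens) / MILLION * REALTIME_2_INPUT_PER_MILLION
    output_cost = Decimal(output_tokens) / MILLION * REALTIME_2_OUTPUT_PER_MILLION
    return input_cost + output_cost


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate a per-minute service cost in USD, assuming linear billing."""
    return Decimal(minutes) * rate_per_minute


if __name__ == "__main__":
    # Hypothetical session: 50,000 audio input tokens and 20,000 output tokens.
    print("Realtime-2:", realtime_2_cost(50_000, 20_000))
    # Hypothetical one-hour stream on each per-minute model.
    print("Translate:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

At the quoted rates, Translate costs twice as much per minute as Whisper, and Realtime-2 output tokens cost twice as much as audio input tokens, so response length tends to dominate the Realtime-2 estimate when input and output volumes are similar.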