Overview
- OpenAI introduced three speech models in its Realtime API for developers, covering reasoning voice assistants, live translation, and low‑latency transcription, all with usage-based pricing.
- GPT‑Realtime‑2 targets live, back‑and‑forth voice use and, according to OpenAI, delivers GPT‑5‑level reasoning with tool use and interruption handling. It is priced at $32 per million input tokens and $64 per million output tokens, with cached input at $0.40 per million tokens.
- GPT‑Realtime‑Translate translates speech from 70 input languages into 13 output languages, producing output that keeps pace with the speaker, priced at $0.034 per minute.
- GPT‑Realtime‑Whisper focuses on streaming transcription that keeps up with conversation for captions and meeting notes, priced at $0.017 per minute.
- The launch extends OpenAI’s real‑time, multimodal push as rivals also advance. China’s StepAudio 2.5 Realtime is now live and, according to its vendor, offers emotion sensing from tone and pauses along with deep persona controls.
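
The per-token and per-minute prices quoted above can be turned into a rough cost estimate. The following is a minimal sketch, not an official calculator: the function names and the example workloads are hypothetical, the cached-input rate is assumed to be per million tokens, and cached tokens are assumed to be counted separately from regular input tokens.

```python
# Prices in USD as quoted in the overview above.
REALTIME_2_INPUT_PER_M = 32.00          # per million input tokens
REALTIME_2_OUTPUT_PER_M = 64.00         # per million output tokens
REALTIME_2_CACHED_INPUT_PER_M = 0.40    # assumption: per million cached input tokens
TRANSLATE_PER_MINUTE = 0.034            # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017              # GPT-Realtime-Whisper


def realtime_2_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate GPT-Realtime-2 cost in USD (hypothetical helper).

    Assumes cached_input_tokens are NOT included in input_tokens.
    """
    total = (
        input_tokens * REALTIME_2_INPUT_PER_M
        + output_tokens * REALTIME_2_OUTPUT_PER_M
        + cached_input_tokens * REALTIME_2_CACHED_INPUT_PER_M
    )
    return total / 1_000_000


def per_minute_cost(minutes, rate_per_minute):
    """Estimate cost in USD for a per-minute-priced model (hypothetical helper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 200k input, 50k output, 100k cached input tokens.
    print(round(realtime_2_cost(200_000, 50_000, 100_000), 2))
    # Hypothetical 60-minute meeting, translated and transcribed.
    print(round(per_minute_cost(60, TRANSLATE_PER_MINUTE), 2))
    print(round(per_minute_cost(60, WHISPER_PER_MINUTE), 2))
```

At these rates, per-minute pricing makes translation and transcription costs scale linearly with audio length, while GPT‑Realtime‑2 costs depend on how many tokens a conversation consumes and how much of the input is served from cache.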