Overview
- Thinking Machines announced a research preview of “interaction models” that keep listening, seeing, and speaking in one continuous flow across audio, video, and text.
- The design uses 200‑millisecond micro‑turns for rapid back‑and‑forth and hands slower planning and tool use to a separate background model.
- The lab introduced TML‑Interaction‑Small, a mixture‑of‑experts system with 276 billion total parameters, of which 12 billion are active per step. It reported large gains on new timing and temporal tests versus OpenAI’s GPT Realtime‑2 minimal.
- Claims include 64.7% accuracy on a time‑aware speech test called TimeSpeak and 35.4% on temporal action counting, though the article does not provide independent verification.
- Serving changes include streaming 200‑millisecond chunks via SGLang and a training‑to‑inference “bitwise” match for deterministic outputs, with the lab saying longer sessions and scale remain open work for 2026.
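
To make the figures above concrete, here is a minimal Python sketch of the arithmetic behind 200-millisecond micro-turns and the active-parameter share of the mixture-of-experts model. It is an illustration only, not the lab's implementation: the 16 kHz sample rate and the helper names `samples_per_chunk`, `micro_turns`, and `active_fraction` are assumptions introduced here. Only the 200 ms chunk length and the 276 billion and 12 billion parameter counts come from the article.

```python
from typing import Iterator, List

CHUNK_MS = 200  # micro-turn length reported in the article


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Number of audio samples covered by one micro-turn."""
    return sample_rate_hz * chunk_ms // 1000


def micro_turns(
    samples: List[float], sample_rate_hz: int, chunk_ms: int = CHUNK_MS
) -> Iterator[List[float]]:
    """Yield consecutive fixed-length chunks; the last one may be shorter."""
    step = samples_per_chunk(sample_rate_hz, chunk_ms)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per step in a mixture-of-experts model."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Assumed sample rate of 16 kHz (hypothetical; the article gives none).
    one_second = [0.0] * 16_000
    chunks = list(micro_turns(one_second, 16_000))
    print(len(chunks), len(chunks[0]))        # 5 chunks of 3200 samples
    print(f"{active_fraction(12, 276):.1%}")  # about 4.3% of parameters active
```

Under these assumptions, one second of audio splits into five micro-turns, and roughly 4.3% of the model's parameters are active on any given step.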