Particle.news

OpenAI Debuts GPT-5.3 Codex Spark for Real-Time Coding on Cerebras Chips

The research preview prioritizes ultra‑low latency on Cerebras’ Wafer-Scale Engine and supports OpenAI’s push to broaden its compute suppliers.

Overview

  • GPT-5.3-Codex-Spark is live in research preview for ChatGPT Pro users across the Codex app, CLI, and VS Code, with limited API access for select partners under separate rate limits and possible queuing.
  • The lightweight, text-only model targets interactive in‑editor work with a 128,000‑token context window, generating over 1,000 tokens per second and defaulting to minimal, interruptible edits without auto‑running tests.
  • OpenAI rewrote parts of its serving stack to cut latency, citing 80% lower client‑server roundtrip overhead, 30% lower per‑token overhead, and a 50% faster time‑to‑first‑token, with persistent WebSockets and a tuned Responses API.
  • Spark marks OpenAI’s first production deployment on Cerebras hardware and a concrete step in its multi‑vendor strategy alongside Nvidia, under a multi‑year deal reported at up to $10B and a planned 750 MW capacity build‑out.
  • OpenAI says Spark trades some capability relative to GPT‑5.3‑Codex but completes tasks far faster, posting strong results on SWE‑Bench Pro and Terminal‑Bench 2.0; GPUs remain foundational for the company’s broader training and inference workloads.
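OpenAI has not published its serving-stack internals, but the roundtrip savings it cites from persistent WebSockets rest on a familiar principle: one connection handshake amortized across many exchanges instead of paid on every request. The hypothetical sketch below illustrates only that general principle with a plain stdlib TCP echo server (all names and the setup are this example’s own, not OpenAI’s implementation), counting how many connections each strategy forces the server to accept.

```python
import socket
import threading

def run_echo_server(host="127.0.0.1"):
    """Start a sequential echo server on an ephemeral port; count accepted connections."""
    srv = socket.socket()
    srv.bind((host, 0))
    srv.listen()
    port = srv.getsockname()[1]
    stats = {"accepts": 0}

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:          # listener closed; shut down
                return
            stats["accepts"] += 1
            with conn:
                # Echo until the client closes its end of the connection.
                while (data := conn.recv(1024)):
                    conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return srv, port, stats

def per_request_connections(port, messages):
    """Naive client: a fresh connection (and handshake) for every message."""
    out = []
    for msg in messages:
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(msg)
            out.append(s.recv(1024))
    return out

def persistent_connection(port, messages):
    """Persistent client: one handshake, reused for every message."""
    out = []
    with socket.create_connection(("127.0.0.1", port)) as s:
        for msg in messages:
            s.sendall(msg)
            out.append(s.recv(1024))
    return out
```

With three messages, the persistent client costs the server a single accepted connection while the naive client costs three; a persistent WebSocket applies the same amortization to an interactive coding session, where per-request setup would otherwise dominate at Spark’s token rates.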