Particle.news

OpenAI and Cerebras Launch GPT-5.3‑Codex‑Spark for Real‑Time Coding at 1,000+ Tokens per Second

Cerebras’ wafer‑scale compute enables ultra‑low‑latency responses in a research preview now rolling out to ChatGPT Pro users.

Overview

  • The release marks the first publicly available product of the OpenAI‑Cerebras collaboration and targets highly responsive, developer‑guided coding.
  • OpenAI positions Codex‑Spark as a small yet capable model designed for precise edits, plan adjustments, codebase Q&A, and rapid UI or style iteration.
  • The companies claim throughput above 1,000 tokens per second, crediting Cerebras’ Wafer‑Scale Engine, whose large on‑chip memory enables multi‑thousand‑token‑per‑second inference.
  • OpenAI reports shorter task times and stronger results than GPT‑5.1‑Codex‑mini on SWE‑Bench Pro and Terminal‑Bench 2.0, based on internal benchmarks.
  • Access begins as a research preview in the Codex apps, the CLI, and a VS Code extension; API access opens gradually to select partners, with plans to extend the high‑speed capability to larger models later in 2026.