Particle.news

Google Unveils Split TPU 8 Line for Training and Inference

The move signals a bid to cut the price of serving AI by tuning hardware to specific tasks.

Overview

  • Google unveiled the eighth‑generation chips Wednesday at Cloud Next in Las Vegas, splitting its accelerator line into TPU 8t for training and TPU 8i for inference.
  • TPU 8t scales to 9,600 chips per superpod with 121 exaflops of compute and 2 petabytes of shared high‑bandwidth memory, and Google says reliability upgrades keep useful work above 97% of run time.
  • TPU 8i triples on‑chip SRAM to 384 MB, adds a low‑hop Boardfly fabric to cut chip‑to‑chip delay, and delivers about 80% better performance per dollar than the prior Ironwood generation for model serving.
  • New network designs, Virgo for training and Boardfly for inference, plus Axion Arm CPU hosts and liquid cooling, aim to lower latency, raise utilization, and improve power efficiency for agentic and mixture‑of‑experts (MoE) workloads.
  • Both chips are due on Google Cloud later this year, were co‑developed with Broadcom with input from DeepMind, and will sit alongside Nvidia options that include the upcoming Vera Rubin GPUs.