
AWS and Cerebras Strike Multiyear Deal for Disaggregated AI Inference on Bedrock

The forthcoming service pairs Trainium3 prefill with Cerebras CS-3 decode to reduce latency for interactive generative AI.

Overview

  • AWS says the Bedrock offering will roll out in the coming months, with some reports describing availability in the second half of 2026.
  • The architecture splits inference so Trainium3 handles prefill while Cerebras CS-3 accelerates token-by-token decode over low-latency Elastic Fabric Adapter networking (a toy sketch of the split follows this list).
  • Amazon describes the arrangement as a multiyear partnership with Cerebras, and both companies declined to disclose financial terms.
  • Cerebras and AWS cite large speed gains for the decode stage versus GPUs, though vendor claims of up to roughly 25x have not yet been independently verified.
  • AWS is the first cloud provider to offer Cerebras's disaggregated inference configuration, available exclusively through Bedrock, a move coverage says adds pressure on Nvidia's inference business.
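
To make the prefill/decode split concrete, below is a minimal NumPy sketch of disaggregated inference. It is illustrative only: the weights, dimensions, and function names are invented for the example, and it does not reflect the actual Bedrock API or the Trainium3/CS-3 implementation. The key idea is that prefill processes the whole prompt in one parallel pass and emits a key/value (KV) cache, which is then handed to a separate decode stage that generates tokens one at a time.

```python
# Illustrative sketch of disaggregated LLM inference (not AWS/Cerebras code).
# Prefill builds the KV cache for the whole prompt at once; decode reuses and
# extends that cache token by token. In the reported design, only the cache
# and the last hidden state would cross the interconnect (EFA) between devices.

import numpy as np

D = 16          # toy model/head dimension
VOCAB = 50      # toy vocabulary size
rng = np.random.default_rng(0)

# Hypothetical shared weights: a single attention head plus output projection.
W_qkv = rng.standard_normal((D, 3 * D)) * 0.1
W_out = rng.standard_normal((D, VOCAB)) * 0.1
embed = rng.standard_normal((VOCAB, D)) * 0.1


def attention(q, K, V):
    """Single-query attention over the cached keys/values."""
    scores = K @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V


def prefill(prompt_ids):
    """'Prefill device': process the full prompt in parallel, build the KV cache."""
    x = embed[prompt_ids]                    # (T, D) -- all prompt tokens at once
    Q, K, V = np.split(x @ W_qkv, 3, axis=-1)
    last_hidden = attention(Q[-1], K, V)     # state needed to start decoding
    return K, V, last_hidden


def decode(K, V, hidden, n_tokens):
    """'Decode device': generate one token at a time, extending the cache."""
    out = []
    for _ in range(n_tokens):
        next_id = int(np.argmax(hidden @ W_out))   # greedy sampling
        out.append(next_id)
        q, k, v = np.split(embed[next_id] @ W_qkv, 3)
        K = np.vstack([K, k])                      # grow the KV cache by one row
        V = np.vstack([V, v])
        hidden = attention(q, K, V)
    return out


prompt = np.array([3, 17, 42, 8])
K, V, hidden = prefill(prompt)       # parallel, compute-bound stage
print(decode(K, V, hidden, 5))       # sequential, latency-bound stage
```

The split matters because the two stages stress hardware differently: prefill is parallel and throughput-bound, while decode is sequential and latency-bound, so each can run on the accelerator best suited to it, at the cost of shipping the KV cache over a fast link such as EFA.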