
AWS and Cerebras Strike Multiyear Deal for Disaggregated AI Inference on Bedrock

The forthcoming service pairs Trainium3 prefill with Cerebras CS-3 decode to reduce latency for interactive generative AI.

Overview

  • AWS says the Bedrock offering will roll out in the coming months, with some reports describing availability in the second half of 2026.
  • The architecture splits inference so Trainium3 handles prefill while Cerebras CS-3 accelerates token-by-token decode over low-latency Elastic Fabric Adapter networking (a toy sketch of the split follows this list).
  • Amazon describes the arrangement as a multiyear partnership with Cerebras, and both companies declined to disclose financial terms.
  • Cerebras and AWS cite large speed gains for the decode stage versus GPUs, though vendor claims of up to roughly 25x have not yet been independently verified.
  • AWS is the first cloud provider to offer Cerebras's disaggregated inference configuration, available exclusively through Bedrock, a move coverage says adds pressure on Nvidia's inference business.
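
To make the prefill/decode split concrete, below is a minimal NumPy sketch of disaggregated inference. It is illustrative only: the weights, dimensions, and function names are invented for the example, and it does not reflect the actual Bedrock API or the Trainium3/CS-3 implementation. The key idea is that prefill processes the whole prompt in one parallel pass and emits a key/value (KV) cache, which is then handed to a separate decode stage that generates tokens one at a time.

```python
# Illustrative sketch of disaggregated LLM inference (not AWS/Cerebras code).
# Prefill builds the KV cache for the whole prompt at once; decode reuses and
# extends that cache token by token. In the reported design, only the cache
# and the last hidden state would cross the interconnect (EFA) between devices.

import numpy as np

D = 16          # toy model/head dimension
VOCAB = 50      # toy vocabulary size
rng = np.random.default_rng(0)

# Hypothetical shared weights: a single attention head plus output projection.
W_qkv = rng.standard_normal((D, 3 * D)) * 0.1
W_out = rng.standard_normal((D, VOCAB)) * 0.1
embed = rng.standard_normal((VOCAB, D)) * 0.1


def attention(q, K, V):
    """Single-query attention over the cached keys/values."""
    scores = K @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V


def prefill(prompt_ids):
    """'Prefill device': process the full prompt in parallel, build the KV cache."""
    x = embed[prompt_ids]                    # (T, D) -- all prompt tokens at once
    Q, K, V = np.split(x @ W_qkv, 3, axis=-1)
    last_hidden = attention(Q[-1], K, V)     # state needed to start decoding
    return K, V, last_hidden


def decode(K, V, hidden, n_tokens):
    """'Decode device': generate one token at a time, extending the cache."""
    out = []
    for _ in range(n_tokens):
        next_id = int(np.argmax(hidden @ W_out))   # greedy sampling
        out.append(next_id)
        q, k, v = np.split(embed[next_id] @ W_qkv, 3)
        K = np.vstack([K, k])                      # grow the KV cache by one row
        V = np.vstack([V, v])
        hidden = attention(q, K, V)
    return out


prompt = np.array([3, 17, 42, 8])
K, V, hidden = prefill(prompt)       # parallel, compute-bound stage
print(decode(K, V, hidden, 5))       # sequential, latency-bound stage
```

The split matters because the two stages stress hardware differently: prefill is parallel and throughput-bound, while decode is sequential and latency-bound, so each can run on the accelerator best suited to it, at the cost of shipping the KV cache over a fast link such as EFA.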