Overview
- AWS says the Bedrock offering will roll out in the coming months, with some reports describing availability in the second half of 2026.
- The architecture disaggregates inference: Trainium3 handles the compute-heavy prefill stage, while Cerebras CS-3 accelerates the latency-sensitive token-by-token decode stage, with the two connected over low-latency Elastic Fabric Adapter (EFA) networking.
- Amazon describes the arrangement as a multiyear partnership with Cerebras, and both companies declined to disclose financial terms.
- Cerebras and AWS cite large speed gains for the decode stage versus GPUs, with vendor claims of up to about 25x not yet independently verified.
- AWS is the first cloud provider to offer this Cerebras disaggregated inference configuration, available exclusively through Bedrock; coverage frames the deal as added pressure on Nvidia's inference business.
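To make the prefill/decode split concrete, here is a minimal conceptual sketch of disaggregated inference. All function names and the toy "model" are illustrative assumptions, not AWS, Trainium, or Cerebras APIs: the point is only that prefill is one large parallel pass that builds a KV cache, while decode is a serial loop of small steps that can run on different hardware.

```python
# Conceptual sketch of disaggregated inference (illustrative only; not a
# real AWS/Cerebras API). Prefill and decode are separated so each stage
# can run on the hardware best suited to it.

def prefill(prompt_tokens):
    """Process the whole prompt in one pass and return a KV cache.
    In the setup described above, this stage would map to Trainium3."""
    # Toy "cache": just remember the prompt context.
    return {"context": list(prompt_tokens)}

def decode_step(kv_cache, last_token):
    """Generate the next token from the cache and the previous token.
    This serial, latency-sensitive loop is the stage Cerebras accelerates."""
    kv_cache["context"].append(last_token)
    # Toy next-token rule: context length stands in for real model output.
    return len(kv_cache["context"])

def generate(prompt_tokens, max_new_tokens):
    cache = prefill(prompt_tokens)        # one large, parallel pass
    token = prompt_tokens[-1]
    out = []
    for _ in range(max_new_tokens):       # many small, serial passes;
        token = decode_step(cache, token)  # in practice the cache crosses
        out.append(token)                  # the low-latency EFA link
    return out

print(generate([1, 2, 3], 4))  # → [4, 5, 6, 7]
```

The design point the sketch captures: prefill throughput and decode latency have very different hardware profiles, which is why splitting them across accelerators can pay off despite the cost of moving the KV cache between them.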