Particle News: Companies Slash AI Use After 'Token Shock' Drives Soaring Bills

Overview

Engineering teams and CIOs reported unexpectedly large inference costs that made internal AI programs unaffordable, with AMD’s CIO warning that employee usage could expose very large firms to nearly $900 million in hidden annual bills.
After weeks of heavy internal adoption, firms are reversing incentives and adding controls such as per‑user token caps, budget dashboards, usage trackers, and limits on AI leaderboards to curb runaway spend.
Developers have deployed rapid cost fixes that do not require major product rewrites, including API proxies that route requests to cheaper models, aggressive caching and batching, and selective model routing that cuts bills by 50–90% in examples reported this week.
Market pressure is forcing providers to rethink their economics as cheaper Chinese and open‑source models (for example, Qwen variants) and platforms offering flat per‑request pricing gain ground as alternatives to per‑token billing.
The likely next steps are experiments with more predictable pricing and price competition, which could include token‑rate cuts or more request‑based plans. Companies should watch for shifts that trade lower bills for added engineering complexity, compliance checks, or quality differences.
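
The cost controls described above (per‑user token caps, caching, and routing requests to cheaper models) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any company's actual implementation: the model names, prices, the four-characters-per-token estimate, the `CostProxy` class, and the `backend` callable are all hypothetical stand-ins for a real provider client and tokenizer.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical model names and per-1K-token prices, for illustration only.
PRICE_PER_1K = {"small-model": 0.0005, "large-model": 0.01}


def estimate_tokens(text: str) -> int:
    """Rough assumption: about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class CostProxy:
    """Sits between callers and a model backend to limit spending."""
    backend: Callable[[str, str], str]   # hypothetical: (model, prompt) -> reply
    user_cap: int = 10_000               # max estimated tokens per user
    route_threshold: int = 200           # prompts above this go to the large model
    used: Dict[str, int] = field(default_factory=dict)
    cache: Dict[str, str] = field(default_factory=dict)
    spend: float = 0.0

    def complete(self, user: str, prompt: str) -> str:
        # Caching: an identical prompt is answered without a new model call
        # and does not count against the user's cap.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]

        # Per-user cap: refuse the request if it would exceed the budget.
        tokens = estimate_tokens(prompt)
        if self.used.get(user, 0) + tokens > self.user_cap:
            raise RuntimeError(f"token cap exceeded for user {user!r}")

        # Routing: short prompts go to the cheaper model.
        model = "small-model" if tokens <= self.route_threshold else "large-model"
        reply = self.backend(model, prompt)

        # Bookkeeping for a usage tracker or budget dashboard.
        self.used[user] = self.used.get(user, 0) + tokens
        self.spend += tokens / 1000 * PRICE_PER_1K[model]
        self.cache[key] = reply
        return reply
```

In this sketch the cap counts only estimated prompt tokens; a real deployment would presumably also count output tokens, reset the counters on a schedule, and route on task type rather than prompt length alone.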