Particle News: Chinese AI Models Undercut Western APIs by More Than 90%

Overview

Chinese firms have cut API prices and built models that cost roughly 90–97% less per token to run than Western APIs, relying on sparse mixture-of-experts architectures and lower-precision training.
DeepSeek reports training its V3 model for about $5.58 million. It has also permanently cut V4-Pro prices by roughly 75%, and cached-input costs have fallen to near zero in local-currency terms.
Mixture-of-experts (MoE) reduces the number of active model parameters per token, which sharply cuts the compute needed for each inference and drives the bulk of the cost gap.
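
As a rough illustration of the MoE point above, the sketch below compares per-token compute for a dense model and a sparse one of the same total size. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter, and it ignores attention overhead, memory bandwidth and routing costs. The parameter counts are hypothetical and are not figures from this article.

```python
# Back-of-envelope estimate of per-token compute for dense vs. sparse MoE.
# Assumption: forward-pass FLOPs per token ~= 2 * (parameters actually used).


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2.0 * active_params


def moe_compute_ratio(total_params: float, active_params: float) -> float:
    """Compute of the sparse model relative to a dense model of equal size."""
    dense = forward_flops_per_token(total_params)
    sparse = forward_flops_per_token(active_params)
    return sparse / dense


# Hypothetical sizes chosen only for illustration.
TOTAL_PARAMS = 600e9   # all experts combined
ACTIVE_PARAMS = 40e9   # parameters routed to for any single token

ratio = moe_compute_ratio(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Sparse model uses about {ratio:.1%} of the dense per-token compute")
```

Under these assumed sizes the sparse model needs only a small fraction of the dense model's arithmetic per token, which is the mechanism the paragraph above credits for most of the cost gap.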
Developers are moving production traffic to these models through OpenAI-compatible endpoints, routing layers, multi-key proxies and aggressive caching, and some report large drops in their monthly bills.
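
The routing, multi-key and caching pattern described above might look something like the following minimal sketch. It is not any vendor's implementation: `CachingRouter`, `fake_backend` and the key names are hypothetical, the backend is a stand-in function rather than a real API call, and the cache is a simple client-side exact-match cache, which is a different mechanism from provider-side cached-input pricing.

```python
import hashlib
import itertools
from typing import Callable, Dict, List

# A backend takes (api_key, prompt) and returns a completion string.
# In practice this would wrap a call to an OpenAI-compatible endpoint.
Backend = Callable[[str, str], str]


class CachingRouter:
    """Hypothetical router: exact-match cache plus round-robin over API keys."""

    def __init__(self, backend: Backend, api_keys: List[str]) -> None:
        if not api_keys:
            raise ValueError("at least one API key is required")
        self._backend = backend
        self._keys = itertools.cycle(api_keys)  # spreads load across keys
        self._cache: Dict[str, str] = {}
        self.backend_calls = 0

    def complete(self, prompt: str) -> str:
        cache_key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if cache_key in self._cache:
            return self._cache[cache_key]  # no backend call, no token cost
        api_key = next(self._keys)
        self.backend_calls += 1
        result = self._backend(api_key, prompt)
        self._cache[cache_key] = result
        return result


def fake_backend(api_key: str, prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return f"[{api_key}] echo: {prompt}"


router = CachingRouter(fake_backend, ["key-a", "key-b"])
first = router.complete("hello")
second = router.complete("hello")   # served from the cache
third = router.complete("world")    # goes to the next key
print(router.backend_calls)
```

Repeated prompts never reach the backend, and distinct prompts rotate across keys, which is one way to stay under per-key rate limits.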
Adoption is limited by practical frictions such as per-key rate limits, peak-hour latency, content filters and data-residency concerns, and these limits have already pushed companies to add caps and other cost controls.
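
One simple form a spending cap could take is sketched below. The `SpendCap` class, the monthly budget and the per-token price are all hypothetical values chosen for illustration; the article does not describe how any particular company implements its controls.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push spend past the configured cap."""


class SpendCap:
    """Hypothetical guard that tracks spend and refuses requests over budget."""

    def __init__(self, monthly_cap_usd: float, usd_per_million_tokens: float) -> None:
        self.cap = monthly_cap_usd
        self.price = usd_per_million_tokens
        self.spent_usd = 0.0

    def charge(self, tokens: int) -> float:
        """Record the cost of `tokens`, or raise if it would exceed the cap."""
        cost = tokens / 1_000_000 * self.price
        if self.spent_usd + cost > self.cap:
            raise BudgetExceeded(
                f"request of ${cost:.6f} would exceed cap of ${self.cap:.2f}"
            )
        self.spent_usd += cost
        return cost


# Illustrative numbers only: a $1.00 cap at $0.50 per million tokens.
cap = SpendCap(monthly_cap_usd=1.0, usd_per_million_tokens=0.5)
cap.charge(1_000_000)
cap.charge(1_000_000)

try:
    cap.charge(1)       # even one more token is over budget
    blocked = False
except BudgetExceeded:
    blocked = True

print(cap.spent_usd, blocked)
```

Checking the cap before the request is sent, rather than after the bill arrives, is the design choice that makes this a control rather than a report.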