Overview
- Memory makers including SanDisk, SK Hynix, and Samsung fell sharply in Thursday’s selloff, with SanDisk down about 11%, after Google disclosed TurboQuant.
- TurboQuant compresses an AI model’s key‑value cache by at least 6x during inference, operating at roughly 3‑bit precision without retraining; it builds on two algorithms, PolarQuant and Quantized Johnson‑Lindenstrauss.
- Morgan Stanley said investors misread the impact because the method targets inference caches rather than the high‑bandwidth memory used to train large models, and the bank kept Overweight ratings on Micron and SanDisk.
- KAIST, whose professor Han In‑su co‑developed two of the algorithms, said the efficiency gains will broaden AI adoption and could ultimately expand total memory demand by making deployments cheaper and more practical.
- Google published the techniques openly, and developers ported them to local inference frameworks within a day; the company plans to present full results at ICLR and AISTATS, as researchers and buyers watch for real‑world performance and rollout.
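To illustrate the idea behind the second bullet, here is a minimal sketch of KV‑cache compression via a random Johnson‑Lindenstrauss‑style projection followed by 3‑bit uniform quantization. This is not Google’s TurboQuant implementation; the cache shape, projection dimensions, and quantization scheme are illustrative assumptions chosen only to show how a >6x compression ratio can arise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy key-value cache: 128 cached tokens, 64-dim head activations (assumed sizes).
kv = rng.standard_normal((128, 64)).astype(np.float32)

# Johnson-Lindenstrauss-style random projection: reduce dimension 64 -> 32.
# (Hypothetical transform; TurboQuant's actual projection is more sophisticated.)
d_in, d_out = kv.shape[1], 32
proj = rng.standard_normal((d_in, d_out)).astype(np.float32) / np.sqrt(d_out)
kv_proj = kv @ proj

# Uniform 3-bit quantization: map each float to one of 2**3 = 8 levels.
bits = 3
levels = 2 ** bits
lo, hi = float(kv_proj.min()), float(kv_proj.max())
scale = (hi - lo) / (levels - 1)
q = np.clip(np.round((kv_proj - lo) / scale), 0, levels - 1).astype(np.uint8)

# Dequantize when the cache is read back during inference.
kv_hat = q.astype(np.float32) * scale + lo

# Storage vs. an fp16 cache: (16 bits * 64 dims) / (3 bits * 32 dims).
ratio = (16 * d_in) / (bits * d_out)
print(f"compression ratio ~= {ratio:.1f}x")
```

The compression here comes from two multiplicative factors, fewer dimensions and fewer bits per value, which is how a method can exceed 6x while staying near 3‑bit precision.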