Overview
- Tenstorrent introduced its Networked AI design and Galaxy Blackhole servers for running large language and video models, announcing general availability on Tuesday.
- Each 6U system links 32 Blackhole accelerators in a 100 Tbps Ethernet mesh and provides 1 TB of GDDR6 memory with 16 TB/s of memory bandwidth. Tenstorrent states 23 petaFLOPS of FP8 performance at a price of $110,000.
- A base four-node Galaxy Supercluster lists at $440,000, and the architecture scales to 32 nodes (1,024 chips at 32 accelerators per node) for bigger models and higher user loads.
- The company reports a four-node setup processes a 100,000‑token DeepSeek V3 prompt in under four seconds and reaches up to 300 tokens per second per user, though batch size was not disclosed and independent tests are still pending.
- Early deployments include Equinix’s Distributed AI Hub in Ashburn, Cirrascale, and Japan’s ai&. Tenstorrent says most Hugging Face models run on its stack, with more details slated for a May 1 TT-Deploy briefing.
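
The figures above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration built from the vendor-stated numbers in this overview. The constant and function names are made up for the example. The prefill rate is a lower bound derived from "under four seconds", not a measured result.

```python
# Vendor-stated figures from the overview (not independently verified).
CHIPS_PER_NODE = 32        # Blackhole accelerators per 6U system
NODE_PRICE_USD = 110_000   # list price per system

def chip_count(nodes: int) -> int:
    """Total accelerators in a cluster of `nodes` systems."""
    return nodes * CHIPS_PER_NODE

def list_price(nodes: int) -> int:
    """Sum of per-node list prices; assumes no cluster-level extras or discounts."""
    return nodes * NODE_PRICE_USD

base_chips = chip_count(4)    # four-node Galaxy Supercluster
base_price = list_price(4)    # matches the quoted $440,000
max_chips = chip_count(32)    # "more than a thousand chips"

# 100,000-token prompt in under four seconds implies at least this prefill rate.
prefill_tokens_per_s_floor = 100_000 / 4

print(base_chips, base_price, max_chips, prefill_tokens_per_s_floor)
```

The four-node price is exactly four times the single-system price, so the quoted Supercluster figure appears to include no separate charge for interconnect. That is an inference from the numbers, not something the company has stated.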