Particle News: Ollama Previews MLX-Powered Apple Silicon Build With Big Speed Gains

Overview

Ollama 0.19 arrived in preview for macOS with a new Apple MLX–based build for Apple Silicon.
The release is tuned for Alibaba’s Qwen3.5‑35B‑A3B coding model and targets Macs with more than 32 GB of unified memory.
On M5‑series chips, the build uses new GPU Neural Accelerators to cut time to first token and raise generation speed.
Company tests reported about 1,810 prefill tokens per second and 112 decode tokens per second, up from 1,154 and 58 on version 0.18.
Support for NVFP4, NVIDIA's 4-bit floating-point format, and upgraded caching aim to keep outputs accurate while reducing memory use and speeding responses in coding agents.
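
To put the reported throughput figures in perspective, the short Python sketch below computes the implied speedup for each phase and a rough end-to-end latency estimate. The four throughput numbers come from the company tests cited above. The 4,000-token prompt and 500-token reply are hypothetical values chosen only for illustration, and the latency model (prompt tokens divided by prefill rate, plus output tokens divided by decode rate) is a simplifying assumption that ignores model loading, caching, and other overhead.

```python
# Throughput figures as reported in the company tests (tokens per second).
REPORTED_TPS = {
    "prefill": {"0.18": 1154, "0.19": 1810},
    "decode": {"0.18": 58, "0.19": 112},
}


def speedup(old_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to old throughput."""
    return new_tps / old_tps


def estimated_seconds(prompt_tokens: int, output_tokens: int,
                      prefill_tps: float, decode_tps: float) -> float:
    """Simplified latency model (an assumption, not a measurement):
    time to read the prompt plus time to generate the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


if __name__ == "__main__":
    for phase, tps in REPORTED_TPS.items():
        print(f"{phase}: {speedup(tps['0.18'], tps['0.19']):.2f}x faster")

    # Hypothetical request size, chosen only for illustration.
    prompt_tokens, output_tokens = 4000, 500
    for version in ("0.18", "0.19"):
        seconds = estimated_seconds(
            prompt_tokens,
            output_tokens,
            REPORTED_TPS["prefill"][version],
            REPORTED_TPS["decode"][version],
        )
        print(f"Ollama {version}: about {seconds:.1f} s (estimate)")
```

On these figures, decode improves proportionally more than prefill, so in this simplified model the gain is largest for requests that generate long replies.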