Overview
- Ollama 0.19 arrived in preview for macOS with a new Apple MLX–based build for Apple Silicon.
- The release is tuned for Alibaba’s Qwen3.5‑35B‑A3B coding model and targets Macs with more than 32 GB of unified memory.
- On M5‑series chips, the build uses new GPU Neural Accelerators to cut time to first token and raise generation speed.
- Company tests reported about 1,810 prefill tokens per second and 112 decode tokens per second, up from 1,154 and 58 on version 0.18 — roughly 1.6× faster prefill and 1.9× faster decode.
- NVFP4 support and upgraded caching aim to keep outputs accurate while reducing memory use and speeding responses in coding agents.
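The speedup implied by the reported throughput figures can be checked with a few lines of arithmetic (the numbers are the article's; the script is just a sketch for verifying the ratios):

```python
# Throughput figures reported for Ollama 0.18 vs 0.19 on an M5-series Mac
# (prefill and decode tokens per second, as quoted in the summary above).
v018 = {"prefill": 1154, "decode": 58}
v019 = {"prefill": 1810, "decode": 112}

# Speedup factor per stage: new throughput divided by old throughput.
speedup = {stage: v019[stage] / v018[stage] for stage in v018}
for stage, factor in speedup.items():
    print(f"{stage}: {factor:.2f}x")
# → prefill: 1.57x
# → decode: 1.93x
```

The decode-side gain being larger than the prefill-side gain is consistent with the release notes' emphasis on faster token generation during interactive coding sessions.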