Particle.news
Ollama Previews MLX-Powered Apple Silicon Build With Big Speed Gains

NVFP4 quantization aims to match cloud outputs on local machines.

Overview

  • Ollama 0.19 arrived in preview for macOS with a new Apple MLX–based build for Apple Silicon.
  • The release is tuned for Alibaba’s Qwen3.5‑35B‑A3B coding model and targets Macs with more than 32 GB of unified memory.
  • On M5‑series chips, the build uses new GPU Neural Accelerators to cut time to first token and raise generation speed.
  • Company tests reported about 1,810 prefill tokens per second and 112 decode tokens per second, up from 1,154 and 58, respectively, on version 0.18.
  • NVFP4 support and upgraded caching aim to keep outputs accurate while reducing memory use and speeding responses in coding agents.
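
To put the reported throughput figures in perspective, the short sketch below works out the implied speedup from version 0.18 to 0.19. It uses only the four numbers quoted in the overview; the names `REPORTED`, `speedup`, and `summarize` are illustrative and do not come from Ollama or the company's benchmarks.

```python
# Throughput figures as quoted in the overview (tokens per second).
# The dictionary layout and all names here are illustrative assumptions.
REPORTED = {
    "prefill": {"0.18": 1154, "0.19": 1810},
    "decode": {"0.18": 58, "0.19": 112},
}


def speedup(old: float, new: float) -> float:
    """Return how many times faster `new` is than `old`."""
    return new / old


def summarize(figures: dict) -> dict:
    """Map each phase to its 0.19-over-0.18 speedup, rounded to two decimals."""
    return {
        phase: round(speedup(values["0.18"], values["0.19"]), 2)
        for phase, values in figures.items()
    }


if __name__ == "__main__":
    for phase, ratio in summarize(REPORTED).items():
        print(f"{phase}: {ratio:.2f}x")
```

On these figures, prefill comes out roughly 1.57 times faster and decode roughly 1.93 times faster, which is consistent with the headline claim of large speed gains.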