Overview
- Google released Gemma 4 as an open‑weight model family under the Apache 2.0 license that permits commercial use and redistribution.
- The lineup includes 2B and 4B edge models, a 26B Mixture‑of‑Experts variant that activates about 3.8B parameters per token, and a 31B dense model.
- Hands‑on tests on Apple Silicon report roughly 85 tokens per second on an M3 Ultra using Rapid‑MLX, which exposes an OpenAI‑compatible local API and supports working tool calls.
- The models were trained for agent workflows, with dedicated function‑calling tokens and a configurable thinking mode that supports step‑by‑step planning.
- Developers have run the 2B and 4B variants fully offline on iPhones via Google’s Edge AI Gallery, cutting round‑trip latency and keeping sensitive data on device.
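Because the local server mentioned above speaks the OpenAI-compatible chat protocol, it can be exercised with a plain HTTP request. The sketch below builds a chat payload that advertises one tool and posts it to a local endpoint; the port, model name, and `get_weather` tool are illustrative assumptions, not details confirmed by the source.

```python
import json
import urllib.request

# Hypothetical local endpoint and model name -- adjust to your setup.
BASE_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "gemma-4-26b-moe"


def build_tool_call_request(user_message: str) -> dict:
    """Build an OpenAI-style chat payload that advertises one tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Illustrative tool schema, not part of the model itself.
                    "name": "get_weather",
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


def post_chat(payload: dict) -> dict:
    """Send the request to the local server and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = post_chat(build_tool_call_request("What's the weather in Oslo?"))
    # If the model decides to call the tool, the message will carry a
    # "tool_calls" entry instead of plain text content.
    print(reply["choices"][0]["message"])
```

Any OpenAI-compatible client library would work here as well; the raw-`urllib` form just makes explicit what goes over the wire.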