Particle.news

Gemma 4 Goes Local With Tool Calling on Phones and Apple Silicon

The Apache 2.0 release enables private, low-cost agents on consumer hardware.

Overview

  • Google released Gemma 4 as an open‑weight model family under the Apache 2.0 license that permits commercial use and redistribution.
  • The lineup includes 2B and 4B edge models, a 26B Mixture‑of‑Experts variant that activates about 3.8B parameters per token, and a 31B dense model.
  • Hands‑on tests on Apple Silicon report roughly 85 tokens per second on an M3 Ultra using Rapid‑MLX with an OpenAI‑compatible local API and working tool calls.
  • The models were trained for agent workflows with dedicated function‑calling tokens and a configurable thinking mode that supports step‑by‑step planning.
  • Developers have run the 2B and 4B variants fully offline on iPhones via Google’s Edge AI Gallery, cutting round‑trip latency and keeping sensitive data on device.
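Since the local server reported above exposes an OpenAI-compatible API with working tool calls, an agent loop can be driven with a standard chat-completions payload. The sketch below builds such a request and routes a model-emitted tool call to a local handler. The endpoint URL, model id, and the `get_battery_level` tool are illustrative assumptions for this sketch, not details from the release.

```python
import json

# Assumed local endpoint for an OpenAI-compatible server (e.g. one started
# by a runner such as Rapid-MLX); the address and model id are placeholders.
LOCAL_API = "http://localhost:8080/v1/chat/completions"


def build_request(user_prompt: str) -> dict:
    """Build an OpenAI-style chat request that advertises one tool."""
    return {
        "model": "gemma-4-4b",  # placeholder model id
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_battery_level",  # hypothetical example tool
                    "description": "Read the device battery percentage.",
                    "parameters": {"type": "object", "properties": {}},
                },
            }
        ],
    }


def dispatch_tool_call(tool_call: dict) -> str:
    """Route a tool call from the model's response to a local handler."""
    handlers = {"get_battery_level": lambda **kw: "87%"}  # stub handler
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"].get("arguments") or "{}")
    return handlers[name](**args)


# A response fragment shaped like the OpenAI tool-calling format,
# standing in for what the local server would return.
fake_call = {"function": {"name": "get_battery_level", "arguments": "{}"}}
print(dispatch_tool_call(fake_call))  # 87%
```

In a real loop, the handler's return value would be appended as a `tool` role message and the conversation re-sent to the local endpoint, keeping the entire round trip on device.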