Overview
- Google released Gemma 4 as an open‑weight model family under the Apache 2.0 license that permits commercial use and redistribution.
- The lineup includes 2B and 4B edge models, a 26B Mixture‑of‑Experts variant that activates about 3.8B parameters per token, and a 31B dense model.
- Hands‑on tests on Apple Silicon report roughly 85 tokens per second on an M3 Ultra using Rapid‑MLX, which exposes an OpenAI‑compatible local API and supports working tool calls.
- The models were trained for agent workflows, with dedicated function‑calling tokens and a configurable thinking mode that supports step‑by‑step planning.
- Developers have run the 2B and 4B variants fully offline on iPhones via Google’s Edge AI Gallery, cutting round‑trip latency and keeping sensitive data on device.
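Because the local server mentioned above speaks the OpenAI-compatible chat protocol, it can be exercised with a plain HTTP request. The sketch below builds a chat payload that advertises one tool and posts it to a local endpoint; the port, model name, and `get_weather` tool are illustrative assumptions, not details confirmed by the source.

```python
import json
import urllib.request

# Hypothetical local endpoint and model name -- adjust to your setup.
BASE_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "gemma-4-26b-moe"


def build_tool_call_request(user_message: str) -> dict:
    """Build an OpenAI-style chat payload that advertises one tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Illustrative tool schema, not part of the model itself.
                    "name": "get_weather",
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


def post_chat(payload: dict) -> dict:
    """Send the request to the local server and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = post_chat(build_tool_call_request("What's the weather in Oslo?"))
    # If the model decides to call the tool, the message will carry a
    # "tool_calls" entry instead of plain text content.
    print(reply["choices"][0]["message"])
```

Any OpenAI-compatible client library would work here as well; the raw-`urllib` form just makes explicit what goes over the wire.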