Overview
- Google published the four-model Gemma 4 family Thursday under the Apache 2.0 license, enabling free commercial use, modification and redistribution.
- Sizes span E2B, E4B, a 26B Mixture-of-Experts model that activates about 3.8B parameters at inference, and a 31B dense model whose bfloat16 weights fit on a single 80GB NVIDIA H100.
- The E2B and E4B models run fully offline with near-zero latency on phones, Raspberry Pi, and NVIDIA Jetson hardware, following co-engineering with Google’s Pixel team, Qualcomm, and MediaTek.
- Capabilities include multi-step reasoning, native function calling for agentic workflows, offline code generation, image and video understanding (with audio understanding on the edge models), context windows up to 256K tokens, and training data spanning 140+ languages.
- Google says the 31B ranks #3 and the 26B ranks #6 on Arena AI’s open-model leaderboard; the weights are available now via Google AI Studio, AI Edge Gallery, Hugging Face, Kaggle, and Ollama.
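The single-GPU claim for the 31B dense model can be sanity-checked with back-of-envelope arithmetic (parameter count and precision from the bullets above; KV cache and activation memory are ignored, so the real serving footprint is somewhat higher):

```python
# Weights-only memory for a 31B-parameter dense model in bfloat16.
PARAMS = 31e9          # 31B parameters (per the announcement)
BYTES_PER_PARAM = 2    # bfloat16 = 16 bits = 2 bytes

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9  # decimal gigabytes
print(f"bf16 weights: {weights_gb:.0f} GB")  # -> 62 GB, under the H100's 80 GB
```

At roughly 62 GB of weights, the model fits on one 80GB H100 with headroom left for the KV cache at moderate context lengths, which is consistent with the single-GPU positioning.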