Overview
- Google DeepMind released Gemma 4 as four open‑weight models under the Apache 2.0 license, removing prior restrictions on commercial use.
- Smaller 4B and 2B variants target laptops and edge devices, while the 31B model runs on a single 80GB Nvidia H100 and fits on 24GB GPUs when quantized.
- All models accept image and video input and support native function calling and JSON output; the 31B posts top‑tier results on Arena AI and AIME 2026.
- Day‑zero support spans Nvidia and AMD hardware as well as popular tools such as Ollama, LM Studio, and Hugging Face, lowering friction for local setup.
- Early testers report slower throughput on the 26B mixture‑of‑experts model and uneven fine‑tuning compatibility, which could delay production rollouts.
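For readers who want to try the JSON‑output support locally, a minimal sketch follows. It assumes a model has already been pulled into a running Ollama server and queries it through Ollama's REST endpoint with `"format": "json"`; the model tag `"gemma"` is a placeholder, not the official tag for this release — substitute whatever `ollama list` shows on your machine.

```python
import json
import urllib.request

# Assumption: an Ollama server is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma") -> dict:
    """Build an Ollama /api/generate payload requesting strict JSON output."""
    return {
        "model": model,      # placeholder tag; use the tag shown by `ollama list`
        "prompt": prompt,
        "format": "json",    # constrains the model's reply to valid JSON
        "stream": False,     # return one complete response object, not chunks
    }


def query(prompt: str, model: str = "gemma") -> dict:
    """POST the request and parse the model's JSON reply.

    Requires a live Ollama server; will raise URLError otherwise.
    """
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Ollama wraps the model's text in a "response" field; with JSON mode
    # enabled, that text should itself parse as JSON.
    return json.loads(body["response"])
```

Usage would look like `query('Return {"colors": [...]} with three colors')`, which yields a Python dict rather than free‑form text — the property JSON mode is meant to guarantee.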