Particle.news
Download on the App Store

Google Rolls Out Gemini 3.1 Flash‑Lite in Preview for High‑Volume AI Workloads

The release targets cost‑sensitive developer tasks with selectable thinking levels for tighter control over latency and reasoning.

Overview

  • Access begins in preview via the Gemini API in Google AI Studio and Vertex AI, with no availability in the consumer Gemini app.
  • Pricing is set at $0.25 per million input tokens and $1.50 per million output tokens, making it the most affordable model in the Gemini 3 lineup.
  • Google reports a 2.5× faster time to first token and 45% faster output versus Gemini 2.5 Flash, with generation speeds up to 363 tokens per second and coverage noting this is slightly slower than 2.5 Flash‑Lite but still faster than rivals.
  • Vendor‑cited benchmarks include an Arena.ai Elo score of 1432, plus 86.9% on GPQA Diamond and 76.8% on MMMU Pro for reasoning and multimodal tasks.
  • The model adds adjustable thinking levels in AI Studio and Vertex AI and is positioned for translation, content moderation, UI generation and simulations, with early testing from Latitude, Cartwheel and Whering.