Overview
- Google moved Gemini to a compute-based quota system at I/O in mid-May. On Friday, it announced targeted fixes after subscribers reported that a few heavy requests could exhaust multi-hour allowances.
- The compute-based system deducts quota according to each request's processing cost, so simple text prompts use far less quota than video generation, complex coding, or Deep Research tasks.
- To keep one task from wiping out an entire allowance, Google capped how much quota a single prompt can consume on Gemini 3.1 Pro and said failed requests will not count against user quotas.
- The company made Gemini 3.1 Flash-Lite prompts free, patched a bug that let one or two Omni video generations drain an account's quota, and doubled Omni generation allotments for AI Ultra subscribers.
- Google pledged clearer usage breakdowns, notifications before heavy tasks, and pay-as-you-go top-up credits in the future, so users can see what consumes their quota and buy more compute when needed.
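The charging rules described above (cost-based deduction, a per-prompt cap, free failed requests, and a free model tier) can be illustrated with a small quota ledger. This is a minimal sketch under stated assumptions, not Google's implementation. The unit costs, allowance, cap, and the model and task labels (`pro`, `flash-lite`, `video`, and so on) are all hypothetical, and for simplicity the cap applies to every charged model, even though Google described it only for Gemini 3.1 Pro.

```python
from dataclasses import dataclass

# Hypothetical per-request costs in abstract "compute units". Google has not
# published figures like these; they only illustrate the relative ordering.
HYPOTHETICAL_COSTS = {
    "text": 1,
    "coding": 20,
    "deep_research": 150,
    "video": 400,
}

# Hypothetical model label whose prompts consume no quota (a free tier).
FREE_MODELS = {"flash-lite"}


@dataclass
class QuotaLedger:
    allowance: int       # total units per quota window (hypothetical)
    per_prompt_cap: int  # most units a single prompt may deduct (hypothetical)
    used: int = 0

    def charge(self, model: str, task: str, succeeded: bool) -> int:
        """Deduct quota for one request and return the units charged."""
        # Failed requests and free-tier prompts cost nothing.
        if not succeeded or model in FREE_MODELS:
            return 0
        # Charge by processing cost, but never more than the per-prompt cap,
        # so one heavy request cannot consume the whole allowance.
        cost = min(HYPOTHETICAL_COSTS[task], self.per_prompt_cap)
        self.used += cost
        return cost

    @property
    def remaining(self) -> int:
        """Units left in the current window, never below zero."""
        return max(self.allowance - self.used, 0)


if __name__ == "__main__":
    ledger = QuotaLedger(allowance=500, per_prompt_cap=100)
    print(ledger.charge("pro", "text", succeeded=True))             # 1
    print(ledger.charge("pro", "video", succeeded=True))            # 100 (capped)
    print(ledger.charge("pro", "video", succeeded=False))           # 0 (failed)
    print(ledger.charge("flash-lite", "deep_research", True))       # 0 (free tier)
    print(ledger.remaining)                                         # 399
```

With the cap in place, a single video request in this sketch deducts at most 100 of the 500 hypothetical units instead of its nominal 400, which mirrors the stated goal of keeping one task from exhausting an allowance.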