Overview
- Multiple reports say Apple plans a Gemini-powered Siri that runs smaller models on the iPhone and hands harder requests to larger Gemini models in the cloud.
- Apple is reportedly distilling Google's large Gemini models into much smaller, quantized variants so they can run within iPhone memory and Neural Engine limits.
- For the heaviest inference, Apple would reportedly route queries to Google Cloud servers running on Nvidia hardware, using confidential computing to protect data while it is being processed.
- Apple is said to be exploring acquisitions, including talks with Liquid AI, to speed development of on-device model techniques and close performance gaps.
- The change marks a shift from Apple’s prior local-only privacy pitch and could affect Siri’s speed and accuracy, as well as how Apple explains data handling at WWDC 2026.
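
To make the quantization point above concrete, here is a minimal sketch of why lower-precision weights help a model fit in phone memory. It is illustrative only: the 3-billion-parameter count is a hypothetical figure, not a reported detail of any Apple or Google model, and the symmetric per-tensor int8 scheme shown is a generic textbook technique, not the method either company is confirmed to use. The function names are made up for this example.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Returns the integer codes and the scale needed to recover
    approximate float values.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    hypothetical_params = 3_000_000_000  # assumption, for illustration only
    for bits in (16, 8, 4):
        gib = weight_memory_gib(hypothetical_params, bits)
        print(f"{bits}-bit weights: about {gib:.2f} GiB")

    original = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(original)
    restored = dequantize(codes, scale)
    print("int8 codes:", codes)
    print("restored:  ", restored)
```

Halving the bits per weight halves the weight footprint, at the cost of a rounding error of at most about half the scale per weight. Real deployments also need memory for activations and caches, which this sketch ignores.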