Overview
- Developers can access the model now through Microsoft Foundry and Hugging Face with sample notebooks, API examples, a model card, and a technical paper.
- The model offers three selectable thinking modes (hybrid, think, and nothink), letting teams trade deeper reasoning against lower latency at runtime.
- Targeted uses include computer-use agents that interpret UI screens and return actionable coordinates, as well as visual math and science problems and document, chart, and table understanding.
- Microsoft published internal benchmark results spanning multimodal reasoning, mathematics, and GUI tasks, presented as comparative analysis rather than formal leaderboard claims.
- Safety alignment draws on public safety datasets and internally generated refusal examples under Microsoft’s Responsible AI principles, with documented limitations and deployment guidance.
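
As a rough illustration of the reasoning-versus-latency trade-off behind the three thinking modes, the sketch below picks a mode per request. Only the mode names (hybrid, think, nothink) come from the summary above. The `choose_mode` helper, the one-second threshold, and the reading of hybrid as a middle-ground default are all assumptions made for illustration, and how a mode is actually passed to the model is not covered here, so the model card remains the reference for that.

```python
from enum import Enum


class ThinkingMode(str, Enum):
    """The three mode names listed in the overview."""
    HYBRID = "hybrid"
    THINK = "think"
    NOTHINK = "nothink"


def choose_mode(latency_budget_ms: float, needs_multistep_reasoning: bool) -> ThinkingMode:
    """Hypothetical policy: pick a thinking mode for one request.

    The 1000 ms threshold is an illustrative assumption, not a documented value.
    """
    # Tight latency budget: skip extended reasoning entirely.
    if latency_budget_ms < 1000:
        return ThinkingMode.NOTHINK
    # Enough time and a hard task: ask for deeper reasoning.
    if needs_multistep_reasoning:
        return ThinkingMode.THINK
    # Otherwise fall back to the mixed mode (assumed to be a sensible default).
    return ThinkingMode.HYBRID


if __name__ == "__main__":
    print(choose_mode(300, True).value)
    print(choose_mode(5000, True).value)
    print(choose_mode(5000, False).value)
```

A policy like this would normally live in the calling application, so the threshold can be tuned against measured latency without touching the model integration.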