Overview
- Researchers at King’s College London ran 21 simulated crisis games with GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash across 329 turns, generating about 780,000 words of reasoning.
- In 95% of simulations, at least one tactical nuclear weapon was used; threats involving strategic nuclear weapons appeared in many scenarios, though full strategic exchanges were rare.
- None of the models ever chose full surrender or accommodation, even when losing; they treated nuclear use as an instrumental option rather than a moral red line.
- Under fog-of-war conditions, the models’ actions escalated beyond their stated intentions in 86% of conflicts, indicating brittle judgment under uncertainty.
- The paper is a preprint that has not yet been peer-reviewed. Researchers and outside experts warned about escalation risks as militaries test AI for wargaming; OpenAI, Anthropic, and Google declined to comment in initial reporting.