Overview
- The agent, Aletheia, iteratively generates, verifies, and revises natural‑language proofs using an advanced Gemini Deep Think variant, a novel inference‑time scaling law, and integrated tools.
- The authors cite three milestones: an AI‑only paper on eigenweights (Feng26), a human–AI collaboration on bounds for independent sets (LeeSeo26), and a semi‑autonomous run on 700 Erdős problems that yielded four autonomous solutions.
- Secondary reports state Aletheia scored 91.9% on the IMO‑Proofbench Advanced benchmark, surpassing the standalone advanced Gemini Deep Think while using less compute.
- The announcement builds on prior Gemini results that reached gold‑medal‑level performance at the International Mathematical Olympiad, shifting focus from contest problems to longer‑horizon research proofs.
- The paper proposes standards to label autonomy and novelty in AI‑assisted mathematics, and researchers highlight recurring risks including confirmation bias, technical hallucinations, and alignment‑related friction.
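
The generate, verify, revise cycle mentioned in the overview can be illustrated with a minimal sketch. This is not the system's actual implementation, which the summary does not describe. The function `refine_proof`, its parameters, and the toy stand-ins for the model calls are all hypothetical names introduced here for illustration only.

```python
from typing import Callable, Optional, Tuple

# Hypothetical sketch of a generate/verify/revise loop.
# The three callables stand in for model or tool calls; none of these
# names come from the source.
def refine_proof(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a proof that passes verification, or None if the budget runs out."""
    proof = generate(problem)
    for _ in range(max_rounds):
        ok, feedback = verify(problem, proof)
        if ok:
            return proof
        # Feed the verifier's critique back into the next draft.
        proof = revise(problem, proof, feedback)
    return None


# Toy stand-ins (assumptions): a keyword check plays the role of the verifier.
def toy_generate(problem: str) -> str:
    return "Let a = 2m and b = 2n."

def toy_verify(problem: str, proof: str) -> Tuple[bool, str]:
    if "2(m + n)" in proof:
        return True, ""
    return False, "missing conclusion: show a + b = 2(m + n)"

def toy_revise(problem: str, proof: str, feedback: str) -> str:
    return proof + " Then a + b = 2(m + n), which is even."

result = refine_proof(
    "The sum of two even integers is even.",
    toy_generate, toy_verify, toy_revise,
)
print(result)
```

In this toy run the first draft fails the check, the revision adds the missing step, and the second verification accepts it. The `max_rounds` cap is a simple stand-in for an inference-time compute budget: a larger budget permits more verification and revision rounds before giving up.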