Particle.news

AI Produces Passing Solutions for Seven of Ten Unpublished Math Problems, Harvard Panel Finds

The result shows models can reach research-grade solutions, exposing an urgent need for raw-reasoning disclosure, formal proof checks, and clearer verification standards

Overview

  • A First Proof team convened at Harvard in early June to blind-grade AI-generated solutions to ten original, unpublished research problems and found passing answers for seven of them.
  • Organizers used unpublished questions to prevent training-data leakage so the test measured genuine problem-solving rather than memorized solutions.
  • Submitted solutions varied: some were flawless, some needed small human fixes, and some were incorrect, showing uneven reliability in model outputs.
  • Judges said later success often relied on multiple attempts, advanced prompting, or surrounding software tools called "AI harnesses," which helped models extend or check their steps.
  • The results add to recent high-profile claims like OpenAI’s reported disproof of an Erdős conjecture and have prompted calls for full reasoning disclosure, independent peer review, and wider use of formal proof assistants such as Lean.