Our First Proof submissions

OpenAI Blog

February 20, 2026

AI-Generated Deep Dive Summary

OpenAI has shared its proof attempts for the First Proof math challenge, a rigorous test designed to evaluate whether AI can produce correct, checkable proofs for complex, domain-specific problems. The challenge, created by leading experts, includes problems that have remained unsolved for years. OpenAI's internal model attempted all 10 problems, and expert feedback suggests that at least five of its attempts may be correct. Some proofs were refined or found to be incorrect upon further review, but the effort highlights AI's growing capability for sustained, research-grade reasoning.

The experiment involved limited human oversight: after receiving feedback, the model was prompted to expand or clarify parts of its proofs. In one notable instance, the model solved problem #9 and later cracked #4, demonstrating its ability to improve over successive attempts. This iterative process shows the potential for AI models to tackle increasingly complex problems through repeated refinement.

The significance of this work lies in its contribution to advancing AI's reasoning capabilities, particularly in areas that require long chains of logical thought and domain-specific expertise. Traditional benchmarks often overlook these aspects, whereas challenges like First Proof offer insight into AI's ability to handle ambiguity, choose appropriate abstractions, and produce arguments that withstand expert scrutiny.

OpenAI's approach also underscores the importance of developing frameworks for rigorous evaluation in future iterations. By engaging with experts and iterating on their process, they aim to build a more robust methodology for testing AI's research capabilities. This benefits AI development and also opens new possibilities for collaboration between AI systems and human experts on complex mathematical and scientific problems.

The project builds on earlier successes, such as OpenAI's gold-medal-level performance at the International Mathematical Olympiad. These milestones show that AI is making strides toward frontier challenges in math and science, offering a glimpse of its potential to drive innovation across these fields.

Originally published on OpenAI Blog on 2/20/2026