PaperBench: Evaluating AI’s Ability to Replicate AI Research

OpenAI Blog
April 2, 2025
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.
Originally published on OpenAI Blog on 4/2/2025