Introducing SWE-bench Verified
OpenAI Blog
August 13, 2024
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
Originally published on OpenAI Blog on 8/13/2024