MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

OpenAI Blog
October 10, 2024
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
Originally published on OpenAI Blog on 10/10/2024