ML-Master

Towards AI-for-AI via Integration of Exploration and Reasoning

ML-Master is an AI-for-AI agent that integrates exploration and reasoning through an adaptive memory mechanism to automate machine learning development. It ranks #1 on MLE-Bench.

29.3% Medal Rate
2x Faster
93.3% Valid Submissions

Revolutionary AI4AI Agent

ML-Master seamlessly integrates exploration and reasoning through adaptive memory mechanisms

🔍

Multi-trajectory Exploration

MCTS-inspired parallel exploration that efficiently navigates solution spaces while maintaining optimal balance between exploitation and exploration.
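
The exploitation/exploration balance described here can be illustrated with the standard UCT selection rule from Monte Carlo tree search. This is a minimal sketch, not ML-Master's actual implementation: the Node class, the reward scale, and the constant c are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.414) -> float:
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try an unvisited child first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.414) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


# Example: "a" has the better mean reward, "b" has been tried less often.
root = Node("root", visits=10)
root.children = [
    Node("a", visits=8, total_reward=6.0),
    Node("b", visits=2, total_reward=1.0),
]
print(select_child(root).name)          # exploration bonus favors "b"
print(select_child(root, c=0.0).name)   # pure exploitation favors "a"
```

Raising c pushes the search toward rarely visited branches, while c = 0 reduces it to greedy selection on the mean reward.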

🧠

Steerable Reasoning

Enhanced reasoning capabilities with adaptive memory integration, reducing hallucinations and improving reliability through contextual grounding.

Adaptive Memory

Selectively captures and summarizes insights from exploration trajectories, enabling continuous learning without overwhelming the reasoning process.
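
One way to picture selective capture is a small, bounded memory that stores short summaries and keeps only the most useful ones. The sketch below is an assumption-laden illustration, not the project's mechanism: the AdaptiveMemory class, the score-based retention rule, and the character-limit "summarizer" are all hypothetical stand-ins (a real system would likely summarize with a language model).

```python
from dataclasses import dataclass


@dataclass
class Insight:
    summary: str
    score: float  # assumed: higher means a more useful trajectory


class AdaptiveMemory:
    """Bounded store of summarized insights (hypothetical design)."""

    def __init__(self, capacity: int = 3, max_chars: int = 80):
        self.capacity = capacity
        self.max_chars = max_chars
        self.items: list[Insight] = []

    def add(self, text: str, score: float) -> None:
        # Placeholder summarizer: truncate instead of calling a model.
        summary = text.strip()[: self.max_chars]
        self.items.append(Insight(summary, score))
        # Keep only the highest-scoring insights so context stays small.
        self.items.sort(key=lambda item: item.score, reverse=True)
        del self.items[self.capacity:]

    def as_context(self) -> str:
        """Render the retained insights as text for the reasoning step."""
        return "\n".join(f"- ({i.score:.2f}) {i.summary}" for i in self.items)


memory = AdaptiveMemory(capacity=2)
memory.add("Gradient boosting with target encoding improved validation.", 0.81)
memory.add("Dropping the ID column had no effect.", 0.40)
memory.add("Stacking two boosted models gave the best fold score.", 0.86)
print(memory.as_context())
```

The fixed capacity is what keeps the reasoning context from growing with every explored trajectory.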

State-of-the-Art Performance

ML-Master achieves superior results across all difficulty levels on MLE-Bench

[Figure: Medal Rate Comparison Across Task Complexities (%). Bar chart of medal rate (%) by task complexity level (Low, Medium, High, Average) for OpenHands, AIDE-r1, AIDE-o1-preview, R&D-Agent, and ML-Master.]

ML-Master excels across all complexity levels

🎯 Particularly strong on Medium (20.2% medal rate) and High (24.4% medal rate) complexity tasks

⚡ More than doubles previous best results in medium-difficulty challenges

Detailed Results on MLE-Bench

Agent      Model              Valid Submission (%)  Above Median (%)  Bronze (%)  Silver (%)  Gold (%)    Any Medal (%)
MLAB       gpt-4o-2024-08-06  44.3 ± 2.6            1.9 ± 0.7         0.0 ± 0.0   0.0 ± 0.0   0.8 ± 0.5   0.8 ± 0.5
OpenHands  gpt-4o-2024-08-06  52.0 ± 3.3            7.1 ± 1.7         0.4 ± 0.4   1.3 ± 0.8   2.7 ± 1.1   4.4 ± 1.4
AIDE       gpt-4o-2024-08-06  54.9 ± 1.0            14.4 ± 0.7        1.6 ± 0.2   2.2 ± 0.3   5.0 ± 0.4   8.7 ± 0.5
AIDE       o1-preview         82.8 ± 1.1            29.4 ± 1.3        3.4 ± 0.5   4.1 ± 0.6   9.4 ± 0.8   16.9 ± 1.1
AIDE       Deepseek-R1*       78.6 ± 0.0            34.6 ± 0.0        2.7 ± 0.0   4.0 ± 0.0   8.0 ± 0.0   14.7 ± 0.0
R&D-Agent  o1-preview         86.1 ± 1.1            32.8 ± 1.2        3.5 ± 0.5   4.5 ± 0.5   14.4 ± 0.5  22.4 ± 0.5
ML-Master  Deepseek-R1        93.3 ± 1.3            44.9 ± 1.2        4.4 ± 0.9   7.6 ± 0.4   17.3 ± 0.8  29.3 ± 0.8

* Single run due to resource constraints
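
As a quick reading aid for the table above, the following sketch checks that the Bronze, Silver, and Gold means add up to the Any Medal column (within rounding) and computes ML-Master's relative gain over R&D-Agent. The numbers are copied from the table; the helper names are illustrative only.

```python
# Mean percentages from the results table: (Bronze, Silver, Gold, Any Medal).
rows = {
    "MLAB (gpt-4o-2024-08-06)": (0.0, 0.0, 0.8, 0.8),
    "OpenHands (gpt-4o-2024-08-06)": (0.4, 1.3, 2.7, 4.4),
    "AIDE (gpt-4o-2024-08-06)": (1.6, 2.2, 5.0, 8.7),
    "AIDE (o1-preview)": (3.4, 4.1, 9.4, 16.9),
    "AIDE (Deepseek-R1)": (2.7, 4.0, 8.0, 14.7),
    "R&D-Agent (o1-preview)": (3.5, 4.5, 14.4, 22.4),
    "ML-Master (Deepseek-R1)": (4.4, 7.6, 17.3, 29.3),
}


def medal_sum_gap(bronze, silver, gold, any_medal):
    """Absolute difference between the medal sum and the Any Medal column."""
    return abs((bronze + silver + gold) - any_medal)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old


# Largest mismatch across rows; small values are rounding in the source.
max_gap = max(medal_sum_gap(*values) for values in rows.values())

any_medal = {name: values[3] for name, values in rows.items()}
gain = relative_gain(
    any_medal["ML-Master (Deepseek-R1)"],
    any_medal["R&D-Agent (o1-preview)"],
)

print(f"largest medal-sum mismatch: {max_gap:.1f} points")
print(f"ML-Master vs R&D-Agent, any-medal rate: +{gain:.1%}")
```

The absolute gap in any-medal rate is 6.9 percentage points (29.3 versus 22.4), which corresponds to a relative gain of roughly 31%.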

Live Demo

Watch ML-Master solve Kaggle competitions in real-time
