AUTOMIND: The Next Leap in Automated Data Science with Adaptive LLM Agents
The Promise and Pitfalls of LLM-Driven Data Science
Large Language Models (LLMs) have revolutionized how we approach data science, promising to automate everything from data preprocessing to model deployment. But despite their potential, existing frameworks often fall short when faced with real-world complexity. Rigid workflows and inflexible coding strategies limit their effectiveness, leaving them unable to match the nuanced expertise of human practitioners.
Enter AUTOMIND, a new framework from researchers at Zhejiang University and Ant Group that aims to bridge this gap. By combining an expert knowledge base, a novel tree search algorithm, and a self-adaptive coding strategy, AUTOMIND outperforms current state-of-the-art solutions—even surpassing human performance in some cases.
What Makes AUTOMIND Different?
1. Expert Knowledge Base: Grounding LLMs in Domain Expertise
One of AUTOMIND’s key innovations is its curated knowledge base, which includes:
- Top Kaggle solutions (3,237 forum posts from 455 competitions)
- Peer-reviewed papers from conferences like NeurIPS, ICML, and KDD
This repository allows AUTOMIND to retrieve and apply domain-specific tricks and cutting-edge techniques dynamically, rather than relying solely on the LLM’s pre-trained knowledge.
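To make this concrete, here is a minimal sketch of what embedding-based retrieval over such a knowledge base could look like. The KnowledgeEntry and KnowledgeBase classes, the cosine-similarity scoring, and the embed helper mentioned in the comment are illustrative assumptions, not AUTOMIND's actual interfaces.

```python
# Illustrative sketch: embedding-based retrieval over a curated knowledge base.
# Class and function names here are hypothetical, not AUTOMIND's actual API.
from dataclasses import dataclass

import numpy as np


@dataclass
class KnowledgeEntry:
    source: str            # e.g. "kaggle-forum" or "paper"
    text: str              # the trick or technique description
    embedding: np.ndarray  # precomputed embedding of the text


class KnowledgeBase:
    def __init__(self, entries: list[KnowledgeEntry]):
        self.entries = entries

    def retrieve(self, query_embedding: np.ndarray, k: int = 5) -> list[KnowledgeEntry]:
        """Return the k entries most similar to the query by cosine similarity."""
        def score(entry: KnowledgeEntry) -> float:
            denom = np.linalg.norm(entry.embedding) * np.linalg.norm(query_embedding)
            return float(entry.embedding @ query_embedding / denom)
        return sorted(self.entries, key=score, reverse=True)[:k]


# Example usage: kb.retrieve(embed("tabular regression with heavy class imbalance"), k=3),
# where embed(...) is whatever sentence-embedding model was used to build the index.
```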
2. Agentic Knowledgeable Tree Search: Smarter Exploration
Instead of brute-forcing solutions, AUTOMIND uses a tree search algorithm to explore possible approaches strategically. Each node in the tree represents a potential solution (plan + code + validation metric), and the agent iteratively refines these solutions through:
- Drafting: Generating initial plans using retrieved knowledge.
- Debugging: Fixing errors in failed solutions.
- Improving: Enhancing valid solutions with new tricks.
This search process helps the agent avoid getting stuck in local optima and lets exploration adapt to the complexity of the task.
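As a rough illustration, the sketch below grows a tree of solution nodes using the three actions above. The Node fields follow the description (plan, code, validation metric), while the selection policy and the draft/debug/improve/execute callables are placeholders rather than AUTOMIND's published implementation.

```python
# A minimal sketch of the draft / debug / improve search loop described above.
# The selection policy and the callables passed into search() are placeholders.
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    plan: str
    code: str
    metric: float | None = None          # validation score; None until executed
    buggy: bool = False
    children: list[Node] = field(default_factory=list)


def select(tree: list[Node]) -> Node | None:
    """Placeholder policy: draft new roots until a few exist, then expand at random."""
    return None if len(tree) < 3 else random.choice(tree)


def search(task: str, budget: int, draft, debug, improve, execute) -> Node | None:
    """Grow the solution tree for a fixed budget and return the best valid node."""
    tree: list[Node] = []
    for _ in range(budget):
        parent = select(tree)
        if parent is None:
            child = draft(task)           # draft: initial plan + code from retrieved knowledge
        elif parent.buggy:
            child = debug(parent)         # debug: fix errors in a failed solution
        else:
            child = improve(parent)       # improve: enhance a valid solution with new tricks
        child.metric, child.buggy = execute(child.code)
        if parent is not None:
            parent.children.append(child)
        tree.append(child)
    valid = [n for n in tree if not n.buggy and n.metric is not None]
    return max(valid, key=lambda n: n.metric, default=None)
```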
3. Self-Adaptive Coding: Dynamic Code Generation
Not all tasks are created equal. AUTOMIND dynamically adjusts its coding strategy based on complexity:
- One-pass generation for simple tasks (e.g., basic feature engineering).
- Stepwise decomposition for complex tasks (e.g., multi-stage neural architectures), where each substep is verified before proceeding.
This flexibility reduces error accumulation and improves efficiency.
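The sketch below shows one way such a complexity gate could be wired up. The complexity score, the threshold, and the llm and passes_checks helpers are assumptions for illustration; AUTOMIND's actual criteria and verification steps may differ.

```python
# Illustrative sketch of complexity-gated code generation.
# The scoring, threshold, and helpers are assumptions, not AUTOMIND's actual logic.
def passes_checks(snippet: str) -> bool:
    """Stand-in for per-step verification: does the snippet at least parse?"""
    try:
        compile(snippet, "<step>", "exec")
        return True
    except SyntaxError:
        return False


def generate_code(plan: str, llm, complexity_score, threshold: float = 0.5) -> str:
    """One-pass generation for simple plans, stepwise decomposition for complex ones."""
    if complexity_score(plan) < threshold:
        # Simple task: emit the whole script in a single pass.
        return llm(f"Write complete, runnable code for this plan:\n{plan}")

    # Complex task: break the plan into substeps and verify each before moving on.
    steps = llm(f"List ordered implementation steps for this plan:\n{plan}").splitlines()
    code_so_far = ""
    for step in steps:
        snippet = llm(f"Existing code:\n{code_so_far}\nImplement the next step: {step}")
        for _ in range(3):                # bounded repair attempts per substep
            if passes_checks(snippet):
                break
            snippet = llm(f"Fix this snippet so it parses:\n{snippet}")
        code_so_far += "\n" + snippet
    return code_so_far
```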
Performance: Beating Humans and SOTA Models
AUTOMIND was evaluated on two benchmarks:
- MLE-Bench (Lite Version)
  - Results: AUTOMIND (using deepseek-v3) outperformed 56.8% of human participants, a 13.5% improvement over the previous SOTA (AIDE).
  - Efficiency: Achieved AIDE's 24-hour performance in just 6 hours, with 63% lower token costs.
- Top AI Competitions (BELKA & OAG)
  - BELKA Challenge: AUTOMIND achieved an average precision of 0.44, a 0.35 absolute gain over AIDE.
  - OAG Challenge: Improved AUC by 0.06.
Why This Matters
AUTOMIND isn’t just another incremental improvement—it’s a paradigm shift in how LLM agents approach data science:
- Human-like adaptability: By integrating expert knowledge, it mimics the iterative, creative problem-solving of human practitioners.
- Scalability: The tree search and adaptive coding make it efficient even for cutting-edge tasks.
- Real-world readiness: Unlike rigid pipelines, AUTOMIND can handle the diversity of Kaggle competitions and research challenges.
Limitations and Future Work
- Benchmark constraints: Due to computational limits, only 16 of 75 MLE-Bench tasks were tested.
- Coding capability dependence: Performance hinges on the underlying LLM’s coding skills.
Future iterations could explore fine-tuning for specific domains or integrating reinforcement learning for even better exploration.
The Bottom Line
AUTOMIND represents a major step toward fully automated data science. By combining expert knowledge, strategic exploration, and adaptive coding, it sets a new standard for what LLM agents can achieve—and hints at a future where AI doesn’t just assist humans but rivals their expertise.
For more details, check out the full paper on arXiv and the GitHub repository.