CoMind: The AI Agent That’s Beating Humans at Kaggle Competitions
The Rise of Community-Driven AI Agents in Machine Learning
Imagine an AI that doesn’t just solve machine learning problems in isolation but actively participates in research communities, shares insights, and iteratively improves solutions based on collective knowledge. That’s exactly what CoMind, a new AI agent developed by researchers from Carnegie Mellon University and Peking University, is doing—and it’s outperforming 79.2% of human competitors in live Kaggle competitions.
The Problem with Isolated AI Agents
Most AI agents today operate in a vacuum. They’re given a problem, they churn through possible solutions, and they spit out an answer. But real-world research—especially in fields like machine learning—isn’t done in isolation. Human researchers collaborate, share ideas, and build on each other’s work. Current AI agents miss out on this critical aspect of scientific progress, often plateauing in performance because they can’t tap into the broader community’s knowledge.
Enter MLE-Live, a new evaluation framework designed to simulate a Kaggle-style research community. Unlike traditional benchmarks, MLE-Live includes time-stamped discussions, shared code, and other resources that human competitors would normally access. This setup allows researchers to test how well AI agents can leverage collective intelligence—just like humans do.
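To make that concrete, here’s a minimal sketch of how such a time-stamped resource pool could be modeled. The names (`CommunityResource`, `visible_at`) are illustrative, not MLE-Live’s actual API; the point is the timestamp cutoff, which ensures an agent sees only what a human competitor could have seen at that moment.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CommunityResource:
    """A time-stamped artifact an agent may consult: a discussion post,
    a shared notebook, or a dataset description (hypothetical schema)."""
    kind: str            # e.g. "discussion", "code", "dataset"
    posted_at: datetime
    content: str

def visible_at(pool: list[CommunityResource],
               cutoff: datetime) -> list[CommunityResource]:
    """Return only resources posted before `cutoff`, so the agent sees
    exactly what a human competitor could have read at that time."""
    return [r for r in pool if r.posted_at <= cutoff]
```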
How CoMind Works
CoMind is built to thrive in this community-driven environment. It operates in four iterative stages, sketched in code after this list:
- Idea Selection: The agent scans a pool of curated ideas from past solutions, discussions, and public code, ranking them based on relevance and performance.
- Idea Generation: Using the selected ideas, CoMind drafts high-level solutions, ensuring diversity and avoiding simple replication of existing methods.
- Implementation and Improvement: The agent enters a ReAct-style loop, writing code, testing it, and refining based on feedback—all within a constrained runtime environment.
- Report Generation: Finally, CoMind compiles a detailed report of its solution, including performance metrics and limitations, and shares it back into the community pool for future iterations.
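A single iteration could be sketched like this in Python. Everything here is a hypothetical stand-in (the `Idea` and `Feedback` types, the stub `execute`), not CoMind’s actual code: the real agent prompts an LLM for planning and revision, and runs candidates in a constrained sandbox.

```python
import random
from dataclasses import dataclass

@dataclass
class Idea:
    summary: str
    reported_score: float   # score claimed in the source post or notebook

@dataclass
class Feedback:
    score: float
    error: str | None = None

def rank_ideas(pool: list[Idea], k: int = 5) -> list[Idea]:
    # Stage 1 -- Idea Selection: rank the curated pool by performance.
    return sorted(pool, key=lambda i: i.reported_score, reverse=True)[:k]

def draft_plan(ideas: list[Idea]) -> str:
    # Stage 2 -- Idea Generation: combine selected ideas into a new plan
    # (a real agent would prompt an LLM and enforce diversity here).
    return " + ".join(i.summary for i in ideas)

def execute(code: str) -> Feedback:
    # Stand-in for running the candidate in a constrained sandbox and
    # reading back errors and a validation score.
    return Feedback(score=random.random())

def comind_iteration(pool: list[Idea], max_steps: int = 10) -> tuple[str, float]:
    plan = draft_plan(rank_ideas(pool))
    code, best = f"# implements: {plan}", float("-inf")

    # Stage 3 -- Implementation and Improvement: a ReAct-style loop that
    # writes code, tests it, and refines it based on execution feedback.
    for step in range(max_steps):
        fb = execute(code)
        if fb.error is None and fb.score <= best:
            break   # no further improvement observed; stop refining
        best = max(best, fb.score)
        code += f"\n# revision {step}: refined from feedback"

    # Stage 4 -- Report Generation: publish the result back into the
    # shared pool so later iterations (and other agents) can build on it.
    pool.append(Idea(summary=f"report on '{plan}'", reported_score=best))
    return code, best
```

The key structural point is that stage 4 feeds stage 1 of the next iteration: every report enlarges the pool that future idea selection draws from.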
What sets CoMind apart is its ability to run multiple agents in parallel, each contributing to and learning from a shared knowledge base, as in the sketch below. This mimics how human teams collaborate, with each member building on others’ work.
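Building on the single-agent sketch above (and reusing its `Idea` type and `comind_iteration`), the parallel setup might look like the following; the scheduling and synchronization details are assumptions, not the authors’ implementation.

```python
import threading

def run_agents(seed_pool: list[Idea], n_agents: int = 4) -> list[Idea]:
    """Hypothetical orchestration: several agents iterate in parallel
    against one shared idea pool, each publishing its report for the
    others to reuse."""
    shared_pool = list(seed_pool)
    lock = threading.Lock()

    def worker() -> None:
        with lock:
            snapshot = list(shared_pool)        # read what others shared
        comind_iteration(snapshot)              # appends its own report
        with lock:
            shared_pool.append(snapshot[-1])    # publish the new report

    threads = [threading.Thread(target=worker) for _ in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_pool
```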
Real-World Performance
In tests across 20 past Kaggle competitions, CoMind achieved a 66.8% average win rate, meaning it outperformed two-thirds of human participants on average. It also earned nine medals (five gold), a 125% improvement in medal count over previous state-of-the-art agents like AIDE.
But the real test came when researchers deployed CoMind in four ongoing Kaggle competitions, including a CVPR-affiliated challenge on marine biodiversity. The results? CoMind ranked:
- #4 out of 48 in forams-classification-2025
- #15 out of 47 in fathomnet-2025
- #120 out of 2,338 in playground-series-s5e5
- #128 out of 333 in el-hackathon-2025
Why This Matters
CoMind isn’t just another AI tool—it’s a glimpse into the future of collaborative AI research. By integrating community knowledge, it avoids the pitfalls of isolated agents, which often get stuck in repetitive strategies. It also generates longer, more complex code than its predecessors, suggesting deeper reasoning and better integration of novel ideas.
But there are challenges. CoMind’s reliance on community resources means it might struggle in domains with limited public data. And while it excels at iterative improvement, its initial solutions can take longer to develop compared to simpler agents like AIDE.
The Future of AI in Research
The success of CoMind and MLE-Live opens the door for AI agents to contribute meaningfully to scientific discovery. Future work could expand the agent’s capabilities to include commenting, asking questions, or even sharing datasets—making it an even more active participant in research communities.
For now, though, one thing is clear: AI isn’t just automating machine learning—it’s learning to collaborate like humans do. And if CoMind’s performance is any indication, it’s getting scarily good at it.