
Two Experts Are All You Need: How Reinforcing Cognitive Effort in MoE Models Boosts Reasoning Without Extra Training

The Hidden Power of Cognitive Experts in MoE Models

Large Reasoning Models (LRMs) like OpenAI’s o1 and DeepSeek-R1 have pushed the boundaries of AI reasoning, but they still struggle with inefficiencies—overthinking, underthinking, and inconsistent reasoning. A new paper from researchers at Tencent and Zhejiang University introduces Reinforcing Cognitive Experts (RICE), a lightweight method to enhance reasoning in Mixture-of-Experts (MoE) models without additional training.

The Problem: Cognitive Inefficiency in MoE Models

MoE architectures, used in models like DeepSeek-R1 and Qwen3-235B, activate only a subset of experts per input, making them computationally efficient. But even these models can get stuck in unproductive reasoning loops or fail to engage deeper thought when needed. Traditional fixes—like prompt engineering or decoding constraints—help but don’t address the root issue: some experts are simply better at reasoning than others.
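To make the routing mechanism concrete, here is a minimal sketch of the top-k gating that MoE layers typically use (a generic PyTorch illustration, not the exact DeepSeek-R1 or Qwen3 router):

```python
import torch
import torch.nn.functional as F

def top_k_routing(router_logits: torch.Tensor, k: int = 2):
    """Pick the k highest-scoring experts per token and renormalize
    their gate weights, so only those experts run on that token.

    router_logits: (num_tokens, num_experts) raw scores from the router.
    """
    probs = F.softmax(router_logits, dim=-1)               # routing distribution
    weights, indices = probs.topk(k, dim=-1)               # keep the k best experts
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the k
    return indices, weights
```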

The Solution: Finding and Boosting Cognitive Experts

The researchers hypothesized that certain experts in MoE models specialize in meta-level reasoning—handling tokens like <think> that signal deep deliberation. Using normalized Pointwise Mutual Information (nPMI), they identified these cognitive experts by measuring how often they activate alongside reasoning-related tokens.
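A rough sketch of how such a score could be computed from routing statistics follows (the counting scheme here is an assumption; the paper's exact estimator may differ):

```python
import math

def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Normalized PMI between an expert firing (x) and reasoning tokens (y).

    count_xy: routing events where the expert fires on a reasoning token
    count_x:  routing events where the expert fires at all
    count_y:  routing events involving reasoning tokens
    total:    all routing events observed
    Returns a score in [-1, 1]; values near 1 mean the expert activates
    almost exclusively alongside reasoning-related tokens.
    """
    if count_xy == 0:
        return -1.0  # never co-occur
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0   # degenerate case: everything co-occurs
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))
```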

Surprisingly, just two experts were often responsible for most of the reasoning improvement. By amplifying their influence during inference (without retraining; a sketch of this follows the list), the team saw:

  • Accuracy boosts: Up to +10% on math and science benchmarks like AIME and GPQA Diamond.
  • Efficiency gains: Fewer tokens and thoughts needed to solve problems.
  • Generalization: Experts identified in one domain (e.g., math) improved performance in others (e.g., physics).
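The amplification itself can be as simple as nudging the router before expert selection. Below is a minimal sketch of that idea (a hypothetical additive boost to the router logits; the paper's exact reinforcement rule and strength are not spelled out here):

```python
import torch

def reinforce_router_logits(router_logits: torch.Tensor,
                            cognitive_experts: list[int],
                            bias: float = 2.0) -> torch.Tensor:
    """Shift the router scores of the identified cognitive experts upward
    so they are selected more often during decoding.

    router_logits:     (num_tokens, num_experts) raw router scores
    cognitive_experts: expert indices found via nPMI (e.g. the two
                       "cognitive experts"); placeholder values here
    bias:              additive boost applied before softmax/top-k; an
                       illustrative setting, not the paper's tuned value
    """
    boosted = router_logits.clone()
    boosted[:, cognitive_experts] += bias  # raise their selection probability
    return boosted
```

Because this only shifts a handful of routing scores at inference time, it adds no training cost and composes directly with the top-k gating shown earlier.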

Why This Matters

  1. No Training Required – Unlike fine-tuning or preference optimization, RICE works at inference time, making it easy to deploy.
  2. Interpretable Steering – Unlike black-box prompting tricks, RICE modifies known experts, offering a clearer path to debugging.
  3. Scalable – Works on billion-parameter models without extra compute.

The Future of Expert Steering

This research suggests that MoE models already have specialized reasoning pathways—we just need to find and reinforce them. Future work could explore:

  • Cross-model cognitive experts: Do similar experts exist in different architectures?
  • Dynamic reinforcement: Adjusting expert weights based on problem difficulty.
  • Safety implications: Could adversarial attacks exploit these experts?

For now, RICE offers a simple but powerful way to make models think smarter, not harder—and it’s all hidden in the weights we already have.