
ProxyThinker: How Small AI Models Can Supercharge Big Ones Without Extra Training

In the rapidly evolving world of AI, large vision-language models (LVLMs) are becoming increasingly powerful—but also increasingly expensive to train. A new paper titled ProxyThinker introduces a clever workaround: using small, specialized AI models to guide larger ones at inference time, bypassing the need for costly reinforcement fine-tuning (RFT). Here’s how it works—and why it matters for businesses leveraging AI.

The Problem: Scaling AI Is Expensive

Training LVLMs with reinforcement learning is computationally intensive. Techniques like reinforcement fine-tuning (RFT) can improve a model’s reasoning abilities, but they require maintaining multiple model copies and alternating between rollout and optimization phases. This makes scaling beyond 7 billion parameters prohibitively expensive for many organizations.

The Solution: Borrowing Brains from Smaller Models

ProxyThinker, proposed by researchers from Rice University, UIUC, and the University of Virginia, offers a way to transfer reasoning skills from small, RFT-trained models to larger base models—without any additional training. The key insight? The difference in output distributions between a small RFT-trained model and its untrained counterpart can be used to steer a larger model’s behavior at inference time.

Here’s the technical breakdown:

  1. Three Models in Play:
  • A large base model (e.g., Qwen2.5-VL-32B).
  • A small amateur model (untrained).
  • A small expert model (RFT-trained).
  2. Logit Adjustment: During inference, ProxyThinker adjusts the base model’s output by adding a scaled difference between the expert and amateur models’ logits. This nudges the larger model toward the expert’s reasoning style.
  3. Efficient Parallelism: The system optimizes GPU usage by running models asynchronously, achieving up to 38× faster inference compared to naive implementations.
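The logit-adjustment step can be sketched in a few lines. This is a minimal toy illustration of the idea, not the paper's implementation: the function names, the toy four-token vocabulary, and the `alpha` scaling knob are assumptions for demonstration—the paper may use a different coefficient or schedule.

```python
import numpy as np

def proxythinker_logits(base_logits, expert_logits, amateur_logits, alpha=1.0):
    """Shift the large base model's next-token logits by the scaled
    difference between a small RFT-trained expert and its untrained
    amateur counterpart (the core ProxyThinker guidance rule)."""
    return base_logits + alpha * (expert_logits - amateur_logits)

def greedy_token(logits):
    # Greedy decoding: pick the token with the highest logit.
    return int(np.argmax(logits))

# Toy vocabulary of 4 tokens. The expert-amateur gap boosts token 2,
# representing a reasoning behavior the RFT training instilled.
base    = np.array([2.0, 1.5, 1.4, 0.5])
expert  = np.array([1.0, 1.0, 3.0, 0.5])
amateur = np.array([1.0, 1.0, 1.0, 0.5])

adjusted = proxythinker_logits(base, expert, amateur, alpha=1.0)
print(greedy_token(base))      # base model alone picks token 0
print(greedy_token(adjusted))  # guided decoding steers toward token 2
```

Because the amateur's logits are subtracted out, only the *change* that RFT induced in the small model is transferred, rather than the small model's (weaker) overall predictions.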

Why This Matters for Business

  1. Cost Savings: Companies can enhance their existing large models without expensive RFT training.
  2. Scalability: ProxyThinker works with models up to 72B parameters, making it viable for enterprise-scale deployments.
  3. Performance Gains: In tests, ProxyThinker improved accuracy on benchmarks like MathVision (from 38.4% to 40.8%) and MathVerse (from 53.8% to 57.2%), rivaling full-scale RFT models.

Limitations and Future Work

  • Dependency on Small Experts: The method requires access to a high-quality RFT-trained small model.
  • Knowledge-Intensive Tasks: Gains are less pronounced on benchmarks requiring deep factual knowledge (e.g., MMMU).

The Bottom Line

ProxyThinker is a promising step toward more efficient AI scaling. By leveraging small models to guide larger ones, businesses can unlock advanced reasoning capabilities without the hefty training costs. As AI continues to permeate industries, techniques like this could be key to making cutting-edge models more accessible.

For more details, check out the full paper on arXiv.