EcoAgent: A New Edge-Cloud Framework for Faster, Cheaper Mobile Automation
Mobile automation is getting smarter, but it’s still held back by a fundamental trade-off: cloud-based AI agents are powerful but slow and expensive, while edge-based agents are fast and cheap but struggle with complex tasks. A new paper from researchers at Zhejiang University, Hong Kong Polytechnic University, and Shanghai Jiao Tong University proposes a solution: EcoAgent, an edge-cloud collaborative framework that aims to combine the best of both worlds.
The Problem: Cloud vs. Edge
Cloud-based mobile agents, powered by multimodal large language models (MLLMs) like GPT-4o, excel at reasoning and long-term planning. But they suffer from high latency (since every decision requires a round-trip to the cloud) and high operational costs (due to token consumption). Meanwhile, edge-based agents, which use fine-tuned multimodal small language models (MSLMs), are fast and cost-effective but lack the cognitive horsepower to handle abstract reasoning or multi-step tasks.
The Solution: EcoAgent’s Three-Agent System
EcoAgent bridges this gap with a closed-loop collaboration between three specialized agents:
- Planning Agent (Cloud): Handles high-level reasoning, task decomposition, and adaptive replanning using an MLLM.
- Execution Agent (Edge): A lightweight MSLM that performs precise UI interactions (taps, swipes, text input).
- Observation Agent (Edge): Another MSLM that verifies whether actions succeeded by comparing screen states to expectations.
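To make the closed loop concrete, here is a minimal Python sketch of how the three agents could hand off work. It is not the paper's implementation: the `Step` type, the `run_task` function, and the toy `toy_plan`, `toy_execute`, and `toy_observe` stubs are all hypothetical stand-ins for the real MLLM and MSLM calls, and screens are plain strings rather than screenshots.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    action: str       # hypothetical action encoding, e.g. "tap:Settings"
    expectation: str  # text the next screen is expected to contain


def run_task(
    plan: Callable[[str, List[str]], List[Step]],  # cloud Planning Agent (stub)
    execute: Callable[[str], str],                 # edge Execution Agent (stub)
    observe: Callable[[str, str], bool],           # edge Observation Agent (stub)
    task: str,
    max_replans: int = 2,
) -> Dict[str, object]:
    """Plan in the cloud, act and verify on the edge, replan only on failure."""
    history: List[str] = []  # screens seen so far, passed back when replanning
    cloud_calls = 0
    for _ in range(max_replans + 1):
        steps = plan(task, history)  # the only cloud round-trip in the loop
        cloud_calls += 1
        failed = False
        for step in steps:
            screen = execute(step.action)
            history.append(screen)
            if not observe(screen, step.expectation):
                failed = True  # stop this plan and ask the cloud for a new one
                break
        if not failed:
            return {"success": True, "cloud_calls": cloud_calls,
                    "edge_steps": len(history)}
    return {"success": False, "cloud_calls": cloud_calls,
            "edge_steps": len(history)}


# --- Toy stubs (assumptions for illustration only) ---
TOY_SCREENS = {
    "tap:Settings": "Settings home",
    "tap:Network": "Network & internet",
    "tap:Wi-Fi": "Wi-Fi on",
}


def toy_plan(task: str, history: List[str]) -> List[Step]:
    if not history:
        # First plan contains a bad action ("tap:Wifi" does not exist).
        return [Step("tap:Settings", "Settings"), Step("tap:Wifi", "Wi-Fi")]
    # Replanning after a failure picks a different route.
    return [Step("tap:Network", "Network"), Step("tap:Wi-Fi", "Wi-Fi")]


def toy_execute(action: str) -> str:
    return TOY_SCREENS.get(action, "Nothing happened")


def toy_observe(screen: str, expectation: str) -> bool:
    return expectation in screen


result = run_task(toy_plan, toy_execute, toy_observe, "Turn on Wi-Fi")
print(result)
```

In this toy run the cloud is contacted twice (the initial plan and one replan), while all four UI actions and their checks stay on the device. That is the division of labour the three-agent design is after.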
Key to this system are three modules:
- Pre-Understanding: Compresses screen images into concise text (reducing token usage by ~90%).
- Memory: Stores screen history to provide context for replanning.
- Reflection: Helps the Planning Agent adjust strategies when tasks fail.
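The three modules can be pictured with a small Python sketch. Everything here is an illustrative assumption rather than the paper's code: `pre_understand` works on a hypothetical list of UI element dicts instead of a real screenshot, `Memory` is a simple bounded history, and `build_reflection_prompt` just assembles text that a Planning Agent could be given when replanning.

```python
from collections import deque
from typing import Deque, Dict, List


def pre_understand(ui_elements: List[Dict[str, str]]) -> str:
    """Stand-in for Pre-Understanding: describe a screen as one short line.

    The real module derives text from a screenshot; this stub only joins
    the labels of already-parsed (hypothetical) UI elements.
    """
    labels = [f"{e['type']}:{e['label']}" for e in ui_elements if e.get("label")]
    return "; ".join(labels)


class Memory:
    """Stand-in for Memory: keeps the most recent screen summaries."""

    def __init__(self, max_screens: int = 5) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._screens: Deque[str] = deque(maxlen=max_screens)

    def add(self, summary: str) -> None:
        self._screens.append(summary)

    def recent(self) -> List[str]:
        return list(self._screens)


def build_reflection_prompt(task: str, failed_step: str, memory: Memory) -> str:
    """Stand-in for Reflection: package the failure and context as text."""
    lines = [f"Task: {task}", f"Failed step: {failed_step}", "Recent screens:"]
    lines += [f"{i + 1}. {s}" for i, s in enumerate(memory.recent())]
    lines.append("Propose a revised plan.")
    return "\n".join(lines)


memory = Memory(max_screens=2)
memory.add(pre_understand([{"type": "button", "label": "Settings"}]))
memory.add(pre_understand([{"type": "switch", "label": "Wi-Fi"},
                           {"type": "image", "label": ""}]))
memory.add(pre_understand([{"type": "text", "label": "No networks found"}]))

prompt = build_reflection_prompt("Join home Wi-Fi", "tap:HomeNetwork", memory)
print(prompt)
```

Only short text summaries reach the cloud in this sketch, which is the idea behind the token savings; the actual compression ratio depends on the real module, not on this stub.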
Why It Works
By offloading repetitive, low-level actions to edge agents and reserving the cloud for high-level planning, EcoAgent aims for near-cloud performance at close to edge costs. In tests on AndroidWorld, it matched the success rate of cloud-only agents while cutting MLLM token consumption from ~15,000 tokens per task to just ~3,200, a reduction of roughly 79% on those figures.
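As a quick sanity check, the relative saving implied by the per-task token counts quoted above (~15,000 and ~3,200, both approximate) can be computed directly. The helper name `reduction_pct` is just an illustrative choice.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


tokens_cloud_only = 15_000  # approximate per-task figure quoted in the text
tokens_ecoagent = 3_200     # approximate per-task figure quoted in the text

saving = reduction_pct(tokens_cloud_only, tokens_ecoagent)
print(f"{saving:.1f}% fewer MLLM tokens per task")  # 78.7% fewer MLLM tokens per task
```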
The Bigger Picture
EcoAgent is part of a growing trend toward hybrid AI systems that balance cloud intelligence with edge efficiency. As smartphones and edge devices get more powerful, frameworks like this could make AI-powered automation cheaper, faster, and more reliable for everyday use—whether it’s automating app workflows, handling customer support, or assisting users with disabilities.
Read the Full Paper
For more details, check out the preprint on arXiv.