MarsXL/UI-Voyager
UI-Voyager is a 4-billion-parameter self-evolving mobile GUI agent developed by Zichuan Lin et al., fine-tuned from Qwen3-VL-4B-Instruct with a 32,768-token context length. The model specializes in autonomously operating mobile device interfaces: recognizing UI elements and completing tasks. It achieves an 81.0% success rate on the AndroidWorld benchmark, surpassing human-level performance, using a two-stage training paradigm that learns from failed experiences.
UI-Voyager: A Self-Evolving Mobile GUI Agent
UI-Voyager is a 4-billion-parameter mobile GUI agent fine-tuned from Qwen3-VL-4B-Instruct. Developed by Zichuan Lin et al., it is designed to interact autonomously with mobile device interfaces, understanding UI elements and executing tasks. Its core innovation is a two-stage self-evolving training paradigm that lets the agent keep improving by learning from failed experiences.
Key Capabilities
- State-of-the-Art Performance: Achieves an 81.0% success rate on the AndroidWorld benchmark, outperforming many recent baselines and exceeding human-level performance.
- Self-Evolving Learning: Uses a two-stage training paradigm in which the agent learns from its failed experiences and improves over time.
- Mobile GUI Automation: Specializes in operating mobile UIs, combining visual perception, OCR, and multimodal reasoning to interpret and interact with screen elements.
- Strong Foundation: Builds on the vision-language capabilities of its base model, Qwen3-VL-4B-Instruct, for visual understanding.
When to Use UI-Voyager
- Automating Mobile Tasks: Ideal for scenarios requiring autonomous interaction with mobile applications and interfaces.
- Mobile UI Testing: Can be used to simulate user interactions and test application functionality on Android devices.
- Research in GUI Agents: Provides a strong baseline and a self-evolving training paradigm as a starting point for further research into self-improving agents for graphical user interfaces.
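Example: a minimal agent loop (illustrative sketch)
The use cases above all follow the same observe-then-act pattern: capture the screen, ask the model for the next action, perform it, and repeat until the task is done or a step budget runs out. The sketch below shows only that control flow. Everything in it is an assumption made for illustration: the names `Action`, `FakeDevice`, `run_episode`, and `scripted_policy` are hypothetical, the action vocabulary ("tap", "type", "done") is not UI-Voyager's actual output format, and the policy is a hard-coded stub standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action vocabulary, NOT UI-Voyager's real output format.
    kind: str          # "tap", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDevice:
    """Stand-in for a real Android device or emulator (assumption)."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real device would return image bytes of the current screen.
        return b"fake-screenshot"

    def perform(self, action: Action) -> None:
        # A real device would inject the tap or text input here.
        self.log.append(action)


Policy = Callable[[str, bytes, List[Action]], Action]


def run_episode(goal: str, device: FakeDevice, policy: Policy,
                max_steps: int = 10) -> bool:
    """Run one observe-act episode; return True if the policy reports done."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, device.screenshot(), history)
        if action.kind == "done":
            return True
        device.perform(action)
        history.append(action)
    return False  # step budget exhausted: the episode counts as a failure


def scripted_policy(goal: str, screenshot: bytes,
                    history: List[Action]) -> Action:
    """Hard-coded stub; a real agent would query the model here."""
    script = [Action("tap", 120, 640), Action("type", text="milk"),
              Action("done")]
    return script[len(history)]


if __name__ == "__main__":
    device = FakeDevice()
    ok = run_episode("Add milk to the shopping list", device, scripted_policy)
    print(ok, [a.kind for a in device.log])
```

In a real setup, `scripted_policy` would be replaced by a call to the model and `FakeDevice` by a device or emulator driver. Episodes that return False are the kind of failed experience the description above says the training paradigm learns from; how those failures are actually used in training is not specified here.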