MarsXL/UI-Voyager

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 25, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

UI-Voyager is a 4 billion parameter self-evolving mobile GUI agent developed by Zichuan Lin et al., fine-tuned from Qwen3-VL-4B-Instruct with a 32768 token context length. This model specializes in autonomously operating mobile device interfaces, recognizing UI elements, and completing tasks. It achieves an 81.0% success rate on the AndroidWorld benchmark, surpassing human-level performance through a two-stage training paradigm that learns from failed experiences.

Loading preview...

UI-Voyager: A Self-Evolving Mobile GUI Agent

UI-Voyager is a 4 billion parameter mobile GUI agent, fine-tuned from the powerful Qwen3-VL-4B-Instruct model. Developed by Zichuan Lin et al., this model is designed to autonomously interact with mobile device interfaces, understanding UI elements and executing tasks. Its core innovation lies in a two-stage self-evolving training paradigm, allowing it to continuously improve by learning from failed experiences.

Key Capabilities

  • State-of-the-Art Performance: Achieves an 81.0% success rate on the challenging AndroidWorld benchmark, outperforming many recent baselines and exceeding human-level performance.
  • Self-Evolving Learning: Utilizes a unique training approach that enables the agent to learn and adapt from its past failures, leading to continuous improvement.
  • Mobile GUI Automation: Specialized in operating mobile UIs, including visual perception, OCR, and multimodal reasoning to interpret and interact with screen elements.
  • Strong Foundation: Leverages the robust vision-language capabilities of its base model, Qwen3-VL-4B-Instruct, for advanced visual understanding.

When to Use UI-Voyager

  • Automating Mobile Tasks: Ideal for scenarios requiring autonomous interaction with mobile applications and interfaces.
  • Mobile UI Testing: Can be used to simulate user interactions and test application functionality on Android devices.
  • Research in GUI Agents: Provides a strong baseline and innovative architecture for further research into self-improving AI agents for graphical user interfaces.