mPLUG/GUI-Owl-1.5-8B-Instruct
GUI-Owl 1.5-8B-Instruct, developed by mPLUG, is an 8 billion parameter instruction-tuned model built on Qwen3-VL, designed for native multi-platform GUI automation across desktops, mobile devices, and browsers. It excels in GUI agent tasks, supporting external tool invocation, multi-platform environment reinforcement learning, and long-horizon memory capabilities. This model is optimized for fast inference and edge deployment in complex GUI interaction scenarios.
Loading preview...
GUI-Owl 1.5-8B-Instruct Overview
GUI-Owl 1.5 is a family of native GUI agent models, with the 8B-Instruct variant being an instruction-tuned model built upon Qwen3-VL. It is specifically designed for multi-platform GUI automation, covering desktops, mobile devices, and browsers. The model leverages a scalable hybrid data flywheel, unified agent capability enhancements, and multi-platform environment RL (MRPO) to achieve its performance.
Key Capabilities & Features
- Multi-platform GUI Automation: Supports interaction across various GUI environments including desktops, mobile devices, and web browsers.
- Tool & MCP Calling: Natively integrates external tool invocation and coordination with MCP servers, demonstrating strong performance on benchmarks like OSWorld-MCP and Mobile-World.
- Long-horizon Memory: Features built-in memory capabilities for complex tasks without requiring external workflow orchestration, leading on MemGUI-Bench.
- Multi-agent Ready: Can function as a standalone end-to-end agent or as specialized roles (planner, executor, verifier, notetaker) within the Mobile-Agent-v3.5 framework.
- Performance: Achieves competitive results on end-to-end online benchmarks such as OSWorld-Verified (52.3), AndroidWorld (69.0), and WebArena (45.7).
Ideal Use Cases
- Developing automated GUI agents for diverse platforms.
- Applications requiring external tool invocation and complex task execution within GUI environments.
- Scenarios demanding long-term memory for sequential GUI interactions.
- Integration into multi-agent systems where specialized roles are needed for GUI tasks.