mPLUG/GUI-Owl-1.5-4B-Instruct
GUI-Owl 1.5 is a family of native GUI agent models developed by X-PLUG and built on Qwen3-VL for multi-platform GUI automation. mPLUG/GUI-Owl-1.5-4B-Instruct is a 4-billion-parameter instruction-tuned variant, optimized for fast inference and edge deployment. It automates GUIs across desktops, mobile devices, and browsers, and demonstrates state-of-the-art performance on GUI benchmarks such as OSWorld-Verified and AndroidWorld. The model natively supports external tool invocation and MCP server coordination, and it has built-in long-horizon memory.
GUI-Owl 1.5: Multi-Platform GUI Agent Model
mPLUG/GUI-Owl-1.5-4B-Instruct is part of the next-generation GUI-Owl 1.5 model family, developed by X-PLUG and based on Qwen3-VL. This 4-billion-parameter instruction-tuned model is designed for native GUI automation across desktops, mobile devices, and web browsers. It relies on a scalable hybrid data flywheel, unified agent capability enhancement, and multi-platform environment RL (MRPO) for robust performance.
Key Capabilities
- Multi-Platform GUI Automation: Supports automation across desktop, mobile, and browser environments.
- Tool & MCP Calling: Natively integrates external tool invocation and MCP server coordination for enhanced functionality.
- Long-Horizon Memory: Features built-in memory capabilities, outperforming other native agent models on MemGUI-Bench.
- High Performance: Achieves state-of-the-art results on benchmarks such as OSWorld-Verified (48.2%), AndroidWorld (69.8%), and WindowsAA (29.4%).
- Flexible Deployment: The family offers Instruct variants for fast inference and edge deployment, and larger Thinking variants for complex tasks that require planning.
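To make the tool-calling and memory capabilities above more concrete, here is a minimal sketch of the harness side of an agent loop: it parses one action emitted by the model, dispatches it to a handler, and keeps a short rolling history that could be fed back into the next prompt. Everything here is an assumption for illustration: the JSON action schema, the `AgentLoop` class, and the `click` and `type_text` handlers are hypothetical and are not the model's documented output format or API. The rolling buffer is a harness-side stand-in and does not represent the model's built-in memory.

```python
import json
from collections import deque
from typing import Callable, Dict


def parse_action(model_output: str) -> dict:
    """Parse one action from model output.

    Assumed (hypothetical) schema: {"action": "<name>", "args": {...}}.
    The real GUI-Owl 1.5 output format may differ.
    """
    action = json.loads(model_output)
    if "action" not in action:
        raise ValueError("missing 'action' field")
    action.setdefault("args", {})
    return action


class AgentLoop:
    """Dispatches parsed actions to handlers and records recent steps."""

    def __init__(self, handlers: Dict[str, Callable[..., str]], memory_size: int = 8):
        self.handlers = handlers
        # Bounded history: the oldest step is dropped once the buffer is full.
        self.memory = deque(maxlen=memory_size)

    def step(self, model_output: str) -> str:
        action = parse_action(model_output)
        name, args = action["action"], action["args"]
        handler = self.handlers.get(name)
        if handler is None:
            result = f"unknown action: {name}"
        else:
            result = handler(**args)
        self.memory.append({"action": name, "args": args, "result": result})
        return result

    def history_prompt(self) -> str:
        """Render recent steps as text to prepend to the next model prompt."""
        return "\n".join(
            f"{i + 1}. {m['action']}({m['args']}) -> {m['result']}"
            for i, m in enumerate(self.memory)
        )


# Hypothetical handlers; a real harness would drive a device or browser here.
def click(x: int, y: int) -> str:
    return f"clicked ({x}, {y})"


def type_text(text: str) -> str:
    return f"typed {text!r}"


if __name__ == "__main__":
    loop = AgentLoop({"click": click, "type": type_text}, memory_size=2)
    loop.step('{"action": "click", "args": {"x": 10, "y": 20}}')
    loop.step('{"action": "type", "args": {"text": "hello"}}')
    print(loop.history_prompt())
```

In a real deployment the handlers would wrap a device controller, a browser driver, or an MCP client, and the string passed to `step` would come from the model's generated response rather than a literal.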
Good For
- Developing automated GUI agents for desktop, mobile, and web applications.
- Tasks requiring interaction with graphical user interfaces and external tools.
- Applications needing long-term memory for sequential GUI operations.
- Edge deployment scenarios where fast inference is critical.