mPLUG/GUI-Owl-1.5-2B-Instruct
GUI-Owl 1.5-2B-Instruct, developed by mPLUG, is a 2 billion parameter instruction-tuned model built on Qwen3-VL, designed for multi-platform GUI automation. It excels as a native GUI agent across desktops, mobile devices, and browsers, supporting tool invocation and long-horizon memory. This model is optimized for fast inference and edge deployment in GUI agent applications, achieving strong performance on benchmarks like OSWorld-Verified and AndroidWorld.
Loading preview...
GUI-Owl 1.5-2B-Instruct: Multi-Platform GUI Agent
GUI-Owl 1.5-2B-Instruct is a 2 billion parameter model from the GUI-Owl 1.5 family, built upon Qwen3-VL, specifically designed for native GUI automation across diverse platforms including desktops, mobile devices, and browsers. It leverages a hybrid data flywheel, unified agent capability enhancements, and multi-platform environment RL (MRPO) to deliver robust performance.
Key Capabilities
- Multi-Platform GUI Automation: Supports automation across various operating systems and environments.
- Tool & MCP Calling: Natively integrates external tool invocation and Multi-platform Coordination Protocol (MCP) server coordination.
- Long-Horizon Memory: Features built-in memory capabilities, eliminating the need for external workflow orchestration for complex tasks.
- Multi-Agent Ready: Can function as a standalone end-to-end agent or as specialized roles (planner, executor, verifier, notetaker) within the Mobile-Agent-v3.5 framework.
- Optimized for Inference: As an 'Instruct' variant, it is designed for fast inference and suitability for edge deployments.
Performance Highlights
This model demonstrates strong performance on various end-to-end online benchmarks, including:
- OSWorld-Verified: Achieves 43.5
- AndroidWorld: Achieves 67.9
- OSWorld-MCP: Achieves 33.0
- Mobile-World: Achieves 31.3
- WindowsAA: Achieves 25.8
Good For
- Developing native GUI automation solutions for desktop, mobile, and web applications.
- Applications requiring efficient, instruction-tuned agents for GUI interaction.
- Edge deployment scenarios where fast inference is critical for GUI automation tasks.