mPLUG/GUI-Owl-1.5-4B-Instruct

Vision | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 14, 2026 | License: MIT | Architecture: Transformer | Open Weights

GUI-Owl 1.5 is a family of native GUI agent models developed by X-PLUG, built on Qwen3-VL, and designed for multi-platform GUI automation. mPLUG/GUI-Owl-1.5-4B-Instruct is the 4-billion-parameter instruction-tuned variant, optimized for fast inference and edge deployment. It automates tasks across desktops, mobile devices, and browsers, with state-of-the-art results on GUI benchmarks such as OSWorld-Verified and AndroidWorld. The model natively supports external tool invocation and MCP server coordination, and it has built-in long-horizon memory.


GUI-Owl 1.5: Multi-Platform GUI Agent Model

mPLUG/GUI-Owl-1.5-4B-Instruct is part of the next-generation GUI-Owl 1.5 model family, developed by X-PLUG and based on Qwen3-VL. This 4-billion-parameter instruction-tuned model is designed for native GUI automation across desktops, mobile devices, and web browsers. Its robustness comes from a scalable hybrid data flywheel, unified agent capability enhancement, and multi-platform environment reinforcement learning (MRPO).

Key Capabilities

  • Multi-Platform GUI Automation: Automates tasks in desktop, mobile, and browser environments.
  • Tool & MCP Calling: Natively supports invoking external tools and coordinating MCP (Model Context Protocol) servers.
  • Long-Horizon Memory: Features built-in memory capabilities, outperforming other native agent models on MemGUI-Bench.
  • High Performance: Achieves state-of-the-art results on benchmarks such as OSWorld-Verified (48.2%), AndroidWorld (69.8%), and WindowsAA (29.4%).
  • Flexible Deployment: The family includes Instruct variants for fast inference and edge deployment, and larger Thinking variants for complex tasks that require planning.
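The Tool & MCP Calling capability can be pictured with a minimal Python sketch that turns a tool call emitted by the model into an MCP `tools/call` request. This card does not document GUI-Owl 1.5's output format, so the sketch assumes Qwen-style `<tool_call>{...}</tool_call>` blocks containing `name` and `arguments`. That format, the helper names `extract_tool_calls` and `to_mcp_request`, and the `search_files` tool are all hypothetical. Only the request envelope (`jsonrpc`, `id`, `method: "tools/call"`, `params.name`, `params.arguments`) follows the MCP specification.

```python
import json
import re
from itertools import count

# ASSUMPTION: tool calls arrive wrapped in Qwen-style <tool_call>...</tool_call>
# tags holding a JSON object with "name" and "arguments". GUI-Owl 1.5's real
# output format is not documented on this card.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

# JSON-RPC request ids only need to be unique within one MCP session.
_request_ids = count(1)


def extract_tool_calls(model_output: str) -> list[dict]:
    """Return every JSON tool call found in the model's text output (hypothetical helper)."""
    return [json.loads(match) for match in TOOL_CALL_RE.findall(model_output)]


def to_mcp_request(call: dict) -> dict:
    """Wrap one tool call as an MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": call["name"], "arguments": call.get("arguments", {})},
    }


if __name__ == "__main__":
    # Hypothetical model output; the "search_files" tool is made up for illustration.
    output = (
        "I will look for the report first.\n"
        '<tool_call>{"name": "search_files", "arguments": {"query": "Q3 report"}}</tool_call>'
    )
    for call in extract_tool_calls(output):
        print(json.dumps(to_mcp_request(call)))
```

Sending the request to an MCP server (over stdio or HTTP) and feeding the tool result back to the model are left out; in practice an MCP client library would handle that transport.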

Good For

  • Developing automated GUI agents for desktop, mobile, and web applications.
  • Tasks requiring interaction with graphical user interfaces and external tools.
  • Applications needing long-term memory for sequential GUI operations.
  • Edge deployment scenarios where fast inference is critical.