mPLUG/GUI-Owl-1.5-8B-Instruct

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Feb 14, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

GUI-Owl 1.5-8B-Instruct, developed by mPLUG, is an 8 billion parameter instruction-tuned model built on Qwen3-VL, designed for native multi-platform GUI automation across desktops, mobile devices, and browsers. It excels in GUI agent tasks, supporting external tool invocation, multi-platform environment reinforcement learning, and long-horizon memory capabilities. This model is optimized for fast inference and edge deployment in complex GUI interaction scenarios.

Loading preview...

GUI-Owl 1.5-8B-Instruct Overview

GUI-Owl 1.5 is a family of native GUI agent models, with the 8B-Instruct variant being an instruction-tuned model built upon Qwen3-VL. It is specifically designed for multi-platform GUI automation, covering desktops, mobile devices, and browsers. The model leverages a scalable hybrid data flywheel, unified agent capability enhancements, and multi-platform environment RL (MRPO) to achieve its performance.

Key Capabilities & Features

  • Multi-platform GUI Automation: Supports interaction across various GUI environments including desktops, mobile devices, and web browsers.
  • Tool & MCP Calling: Natively integrates external tool invocation and coordination with MCP servers, demonstrating strong performance on benchmarks like OSWorld-MCP and Mobile-World.
  • Long-horizon Memory: Features built-in memory capabilities for complex tasks without requiring external workflow orchestration, leading on MemGUI-Bench.
  • Multi-agent Ready: Can function as a standalone end-to-end agent or as specialized roles (planner, executor, verifier, notetaker) within the Mobile-Agent-v3.5 framework.
  • Performance: Achieves competitive results on end-to-end online benchmarks such as OSWorld-Verified (52.3), AndroidWorld (69.0), and WebArena (45.7).

Ideal Use Cases

  • Developing automated GUI agents for diverse platforms.
  • Applications requiring external tool invocation and complex task execution within GUI environments.
  • Scenarios demanding long-term memory for sequential GUI interactions.
  • Integration into multi-agent systems where specialized roles are needed for GUI tasks.