GUI-Owl 1.5-2B-Instruct: Multi-Platform GUI Agent

GUI-Owl 1.5-2B-Instruct is a 2 billion parameter model from the GUI-Owl 1.5 family, built upon Qwen3-VL, specifically designed for native GUI automation across diverse platforms including desktops, mobile devices, and browsers. It leverages a hybrid data flywheel, unified agent capability enhancements, and multi-platform environment RL (MRPO) to deliver robust performance.

Key Capabilities

Multi-Platform GUI Automation: Supports automation across various operating systems and environments.
Tool & MCP Calling: Natively integrates external tool invocation and Multi-platform Coordination Protocol (MCP) server coordination.
Long-Horizon Memory: Features built-in memory capabilities, eliminating the need for external workflow orchestration for complex tasks.
Multi-Agent Ready: Can function as a standalone end-to-end agent or as specialized roles (planner, executor, verifier, notetaker) within the Mobile-Agent-v3.5 framework.
Optimized for Inference: As an 'Instruct' variant, it is designed for fast inference and suitability for edge deployments.

Performance Highlights

This model demonstrates strong performance on various end-to-end online benchmarks, including:

OSWorld-Verified: Achieves 43.5
AndroidWorld: Achieves 67.9
OSWorld-MCP: Achieves 33.0
Mobile-World: Achieves 31.3
WindowsAA: Achieves 25.8

Good For

Developing native GUI automation solutions for desktop, mobile, and web applications.
Applications requiring efficient, instruction-tuned agents for GUI interaction.
Edge deployment scenarios where fast inference is critical for GUI automation tasks.

Overview

GUI-Owl 1.5-2B-Instruct: Multi-Platform GUI Agent

Key Capabilities

Performance Highlights

Good For

Full Model Card (README)