mPLUG/GUI-Owl-1.5-8B-Think
GUI-Owl-1.5-8B-Think is an 8-billion-parameter model from the GUI-Owl 1.5 family. It is built on Qwen3-VL, designed for multi-platform GUI automation, and has a context length of 32,768 tokens. Developed by mPLUG, this "Thinking" variant excels at complex tasks that require planning and reflection, and it achieves state-of-the-art results on GUI agent benchmarks. It natively supports external tool invocation, coordination with MCP (Model Context Protocol) servers, and built-in long-horizon memory, which together give it robust agentic capabilities.
GUI-Owl-1.5-8B-Think: A Multi-Platform GUI Agent
mPLUG's GUI-Owl-1.5-8B-Think is an 8-billion-parameter model from the GUI-Owl 1.5 family. It is built on Qwen3-VL and designed for advanced multi-platform GUI automation. This "Thinking" variant is optimized for complex tasks that require planning and reflection. That sets it apart from the "Instruct" variants, which are geared toward faster inference. It supports GUI automation across desktop, mobile, and browser environments, drawing on a scalable hybrid data flywheel and multi-platform environment reinforcement learning (MRPO).
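Because the Thinking variant reasons before it acts, client code usually has to separate the reasoning trace from the final answer or action. The following is a minimal sketch. It assumes the model follows the Qwen3 convention of ending its reasoning with a `</think>` tag. The opening `<think>` may be injected by the chat template and so may be missing from the decoded text. Check the model's actual chat template before relying on this. The `split_thinking` helper, the sample completion, and the `click(...)` action syntax are hypothetical.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a Thinking-model completion into (reasoning, answer).

    Assumption: Qwen3-style tags, where the reasoning ends at the last
    '</think>'. The opening '<think>' is optional because some chat
    templates place it in the prompt rather than in the generated text.
    """
    marker = "</think>"
    idx = text.rfind(marker)
    if idx == -1:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = text[:idx].replace("<think>", "", 1).strip()
    answer = text[idx + len(marker):].strip()
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical completion; GUI-Owl's real action format may differ.
    sample = "<think>The Settings icon is at the top right.</think>\nclick(1180, 64)"
    reasoning, answer = split_thinking(sample)
    print(reasoning)  # The Settings icon is at the top right.
    print(answer)     # click(1180, 64)
```

The helper splits on the last closing tag rather than matching paired tags. This way it still works when the prompt, not the completion, contains the opening `<think>`.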
Key Capabilities & Features
- State-of-the-art Performance: Achieves leading results across various multi-platform GUI benchmarks, including OSWorld-Verified, AndroidWorld, Mobile-World, WindowsAA, WebArena, VisualWebArena, and WebVoyager.
- Tool & MCP Calling: Provides native support for invoking external tools and coordinating with MCP servers, demonstrating strong performance on OSWorld-MCP and Mobile-World.
- Long-Horizon Memory: Includes built-in memory, which removes the need for external workflow orchestration. It leads native agent models on MemGUI-Bench.
- Multi-Agent Ready: Can act as a standalone end-to-end agent or take on specialized roles (planner, executor, verifier, notetaker) within the Mobile-Agent-v3.5 framework.
- Long Context Window: Supports a context length of 32,768 tokens, which leaves room for long interaction histories.
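Native tool calling generally means that the model emits a structured call, and the host application parses and executes it. The following is a minimal sketch. It assumes the Qwen-family convention of wrapping each call in `<tool_call>...</tool_call>` tags around a JSON object with `name` and `arguments` fields. GUI-Owl-1.5's actual call format, and the way it is wired to MCP servers, should be taken from the official model documentation. The `parse_tool_calls` and `dispatch` helpers, the `get_weather` tool, and the registry are all hypothetical.

```python
import json
import re
from typing import Any, Callable

# Captures the JSON body between <tool_call> tags (assumed Qwen-style format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Return every well-formed tool call found in a model completion."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(body)
        except json.JSONDecodeError:
            continue  # Skip malformed calls instead of crashing the agent loop.
        if isinstance(call, dict) and "name" in call:
            args = call.get("arguments", {})
            if isinstance(args, dict):
                calls.append({"name": call["name"], "arguments": args})
    return calls


def dispatch(calls: list[dict[str, Any]],
             registry: dict[str, Callable[..., Any]]) -> list[Any]:
    """Run each parsed call against a registry of local Python functions."""
    results = []
    for call in calls:
        fn = registry.get(call["name"])
        if fn is None:
            results.append({"error": f"unknown tool: {call['name']}"})
        else:
            results.append(fn(**call["arguments"]))
    return results


if __name__ == "__main__":
    def get_weather(city: str) -> str:  # hypothetical tool
        return f"Sunny in {city}"

    completion = ('<tool_call>\n{"name": "get_weather", '
                  '"arguments": {"city": "Paris"}}\n</tool_call>')
    print(dispatch(parse_tool_calls(completion), {"get_weather": get_weather}))
    # ['Sunny in Paris']
```

Unknown tools come back as error objects instead of raising. This lets the result be fed back to the model as an observation.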
When to Use This Model
GUI-Owl-1.5-8B-Think is ideal for developers and researchers focused on building sophisticated GUI automation solutions, especially those requiring:
- Complex Task Execution: As a "Thinking" variant, it suits scenarios that demand advanced planning and reflection.
- Multi-Platform Automation: For applications spanning desktop, mobile, and web environments.
- Agentic Workflows: When integrating with multi-agent systems or requiring robust, memory-aware agents for long-horizon tasks.
- High Performance: For use cases where benchmark-leading performance in GUI automation is critical.
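One way to picture a multi-agent workflow is a single model queried under different role prompts. The following is a minimal sketch of such a loop using the planner, executor, verifier, and notetaker roles named above. It is not the Mobile-Agent-v3.5 implementation. `call_model` is a hypothetical stand-in for a request to GUI-Owl-1.5-8B-Think. The role prompts, the one-subgoal-per-line plan format, the "yes"/"no" verifier reply, and the per-subgoal step limit are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical hook: call_model(role, prompt) -> reply text. Replace it with a
# real request to the model (for example, through an inference server).
ModelFn = Callable[[str, str], str]


@dataclass
class EpisodeState:
    task: str
    notes: list[str] = field(default_factory=list)    # notetaker output
    history: list[str] = field(default_factory=list)  # executed actions


def run_episode(task: str, call_model: ModelFn, max_steps: int = 10) -> EpisodeState:
    state = EpisodeState(task)
    # Planner: break the task into subgoals, one per line (assumed format).
    plan = [s.strip() for s in call_model("planner", task).splitlines() if s.strip()]
    for subgoal in plan:
        context = ""
        for _ in range(max_steps):
            context = f"task={task}; subgoal={subgoal}; notes={state.notes}"
            action = call_model("executor", context)
            state.history.append(action)
            # Verifier: decide whether the subgoal is done (assumed yes/no reply).
            verdict = call_model("verifier", f"{context}; last_action={action}")
            if verdict.strip().lower().startswith("yes"):
                break
        # Notetaker: record facts that later subgoals may need.
        note = call_model("notetaker", f"{context}; recent={state.history[-3:]}")
        if note.strip():
            state.notes.append(note.strip())
    return state


if __name__ == "__main__":
    def fake_model(role: str, prompt: str) -> str:
        # Scripted replies so the loop runs without a real model.
        replies = {
            "planner": "open Settings\nturn on Wi-Fi",
            "executor": "tap(x, y)",
            "verifier": "yes",
            "notetaker": "",
        }
        return replies[role]

    state = run_episode("Enable Wi-Fi", fake_model)
    print(state.history)  # ['tap(x, y)', 'tap(x, y)']
```

The step limit bounds each subgoal, so a verifier that never confirms success cannot stall the whole episode.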