Tongyi-MAI/MAI-UI-8B
Tongyi-MAI/MAI-UI-8B is an 8-billion-parameter foundation GUI agent developed by Tongyi-MAI for real-world human-computer interaction. It targets GUI grounding and mobile navigation, and reports new state-of-the-art results on ScreenSpot-Pro, MMBench GUI L2, OSWorld-G, UI-Vision, AndroidWorld, and MobileWorld. It has a 32768-token context length and addresses challenges in GUI agent deployment through a self-evolving data pipeline, native device-cloud collaboration, and an online RL framework.
MAI-UI: Real-World Centric Foundation GUI Agents
MAI-UI is a family of foundation GUI agents; this 8-billion-parameter variant (MAI-UI-8B) targets real-world human-computer interaction. Developed by Tongyi-MAI, it tackles key challenges in realistic GUI agent deployment, including the lack of native agent-user interaction, the limits of UI-only operation, the need for a practical deployment architecture, and brittleness in dynamic environments.
Key Capabilities & Innovations
- Unified Methodology: Employs a self-evolving data pipeline for navigation data, incorporating user interaction and MCP tool calls.
- Device-Cloud Collaboration: Features a native system that dynamically routes execution based on task state and data sensitivity, improving on-device performance by 33% and reducing cloud API calls by over 40%.
- Online Reinforcement Learning: Utilizes an advanced online RL framework optimized for scaling parallel environments and context length, demonstrating significant gains from increased environment steps and parallelization.
- High Context Length: Supports a 32768-token context for complex, multi-step GUI tasks.
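The device-cloud routing idea in the list above can be illustrated with a minimal sketch. This is not MAI-UI's actual routing logic, which the model card does not describe in detail; the `TaskState` fields, the `choose_route` function, and the failure threshold are all hypothetical names chosen for illustration. The sketch only shows the general shape of a policy that keeps sensitive screens on the device and escalates to the cloud when the on-device agent appears stuck.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DEVICE = "device"
    CLOUD = "cloud"


@dataclass
class TaskState:
    # Hypothetical fields; the model card does not define a task-state schema.
    contains_sensitive_data: bool  # e.g. a password or payment screen is visible
    consecutive_failures: int      # recent steps where the on-device agent made no progress


def choose_route(state: TaskState, max_device_failures: int = 2) -> Route:
    """Pick where the next step runs (illustrative policy, not the MAI-UI one)."""
    # Sensitive data never leaves the device, regardless of progress.
    if state.contains_sensitive_data:
        return Route.DEVICE
    # Escalate to the cloud model only when the on-device agent seems stuck.
    if state.consecutive_failures >= max_device_failures:
        return Route.CLOUD
    # Default: stay on device, which also avoids a cloud API call.
    return Route.DEVICE


if __name__ == "__main__":
    print(choose_route(TaskState(contains_sensitive_data=False, consecutive_failures=3)))
```

Defaulting to the device and treating the cloud as a fallback is one way a policy of this kind could reduce cloud API calls; the actual thresholds and signals would depend on the deployment.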
Performance Highlights
MAI-UI-8B establishes new state-of-the-art results across various GUI benchmarks:
- Grounding: Achieves 73.5% on ScreenSpot-Pro, 91.3% on MMBench GUI L2, 70.9% on OSWorld-G, and 49.2% on UI-Vision, outperforming models like Gemini-3-Pro and Seed1.8 on ScreenSpot-Pro.
- Mobile Navigation: Sets a new SOTA of 76.7% on AndroidWorld, surpassing UI-Tars-2, Gemini-2.5-Pro, and Seed1.8. It also obtains a 41.7% success rate on MobileWorld, competitive with agentic frameworks based on Gemini-3-Pro.
Use Cases
MAI-UI-8B suits applications that need GUI automation, mobile interaction, and agentic control in complex digital environments. Its focus on real-world deployment makes it a fit for automated task execution across platforms and for building new human-computer interfaces.