Tongyi-MAI/MAI-UI-2B
MAI-UI-2B is a 2-billion-parameter foundation GUI agent developed by Tongyi-MAI for real-world human-computer interaction. It targets GUI grounding and mobile navigation, and reports state-of-the-art results on benchmarks such as ScreenSpot-Pro and AndroidWorld. To address the practical challenges of deploying GUI agents, it combines a self-evolving data pipeline, native device-cloud collaboration, and an online RL framework.
MAI-UI-2B: A Foundation GUI Agent
MAI-UI-2B is the 2-billion-parameter member of the MAI-UI family of foundation GUI agents developed by Tongyi-MAI. The family is built to address four challenges in realistic GUI agent deployment: the lack of native agent–user interaction, the limitations of UI-only operation, the absence of a practical deployment architecture, and brittleness in dynamic environments.
Key Capabilities & Innovations
- State-of-the-Art GUI Grounding: Achieves 73.5% on ScreenSpot-Pro, 91.3% on MMBench GUI L2, 70.9% on OSWorld-G, and 49.2% on UI-Vision, surpassing models such as Gemini-3-Pro and Seed1.8 on ScreenSpot-Pro.
- Superior Mobile Navigation: Sets a new state of the art with 76.7% on AndroidWorld and 41.7% on MobileWorld, outperforming UI-Tars-2 and Gemini-2.5-Pro while remaining competitive with agentic frameworks based on Gemini-3-Pro.
- Self-Evolving Data Pipeline: Expands navigation data beyond UI actions to include user interaction and MCP tool calls, so the agent can adapt to tasks that UI operations alone cannot complete.
- Native Device–Cloud Collaboration: Dynamically routes execution based on task state and data sensitivity, improving on-device performance by 33% and reducing cloud API calls by over 40%.
- Online Reinforcement Learning Framework: Includes optimizations for scaling the number of parallel environments and the context length, with significant gains reported from both more environment steps and greater parallelization.
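The device-cloud routing described above can be illustrated with a minimal sketch. This is not the MAI-UI implementation: the `TaskState` fields, the `choose_executor` function, and the failure threshold are all hypothetical names and values chosen only to show how a router might weigh task state against data sensitivity.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical summary of the current task, used only for this sketch."""
    contains_sensitive_data: bool  # e.g. a password or payment screen is visible
    consecutive_failures: int      # on-device steps that failed in a row


def choose_executor(state: TaskState, max_local_failures: int = 2) -> str:
    """Return "device" or "cloud" for the next step (assumed policy)."""
    # Sensitive screens stay on the device, however hard the step is.
    if state.contains_sensitive_data:
        return "device"
    # Escalate to the cloud model only once the local model appears stuck.
    if state.consecutive_failures >= max_local_failures:
        return "cloud"
    # Default to the cheaper on-device model.
    return "device"
```

Under this assumed policy, a step on a sensitive screen is always handled locally, and a cloud call is made only after repeated local failures on non-sensitive screens, which is one way a router could reduce cloud API usage.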
Ideal Use Cases
MAI-UI-2B is suited to applications that need reliable interaction with graphical user interfaces, especially on mobile devices. Typical uses include:
- Automated mobile application testing and interaction.
- Developing GUI-driven assistants and agents.
- Tasks involving complex UI navigation and element grounding.
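For element grounding tasks, a common way to score a model is to check whether its predicted click point falls inside the target element's bounding box. The sketch below assumes that convention and a `(left, top, right, bottom)` box format; the function names are illustrative and are not part of any MAI-UI tooling or benchmark harness.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed format


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predicted click points that land inside their target boxes."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(targets)
```

The same check can be reused when validating an agent's clicks in automated app testing, provided predictions and boxes share one coordinate system (for example, both in screen pixels).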