Tongyi-MAI/MAI-UI-2B

Vision · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Dec 25, 2025 · License: apache-2.0 · Architecture: Transformer · 0.1K · Open Weights · Cold

MAI-UI-2B is a 2-billion-parameter foundation GUI agent developed by Tongyi-MAI, designed for real-world human-computer interaction. The model targets GUI grounding and mobile navigation tasks, reporting state-of-the-art results on benchmarks such as ScreenSpot-Pro and AndroidWorld. It features a self-evolving data pipeline, native device-cloud collaboration, and an online RL framework to address challenges in GUI agent deployment.


MAI-UI-2B: A Foundation GUI Agent

MAI-UI-2B is the 2-billion-parameter member of the MAI-UI family of foundation GUI agents developed by Tongyi-MAI. It addresses four challenges in realistic GUI agent deployment: native agent–user interaction, the limitations of UI-only operation, practical deployment architecture, and brittleness in dynamic environments.

Key Capabilities & Innovations

  • State-of-the-Art GUI Grounding: Achieves 73.5% on ScreenSpot-Pro, 91.3% on MMBench GUI L2, 70.9% on OSWorld-G, and 49.2% on UI-Vision, surpassing models such as Gemini-3-Pro and Seed1.8 on ScreenSpot-Pro.
  • Superior Mobile Navigation: Sets a new state of the art with 76.7% on AndroidWorld and 41.7% on MobileWorld, outperforming UI-Tars-2 and Gemini-2.5-Pro and remaining competitive with Gemini-3-Pro-based agentic frameworks.
  • Self-Evolving Data Pipeline: Expands navigation data to include user interaction and MCP tool calls, enhancing agent adaptability.
  • Native Device–Cloud Collaboration: Dynamically routes execution based on task state and data sensitivity, improving on-device performance by 33% and reducing cloud API calls by over 40%.
  • Online Reinforcement Learning Framework: Features advanced optimizations for scaling parallel environments and context length, showing significant gains from increased environment steps and parallelization.
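
The device–cloud routing idea in the list above can be illustrated with a small sketch. This is not MAI-UI's actual API or routing policy: the `TaskState` fields, the `route_step` function, and the failure threshold are all hypothetical, chosen only to show what routing on task state and data sensitivity might look like.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical summary of the agent's current situation."""
    contains_sensitive_data: bool  # e.g. a password field is on screen
    consecutive_failures: int      # on-device steps that failed in a row


def route_step(state: TaskState, max_local_failures: int = 2) -> str:
    """Return "device" or "cloud" for the next agent step (illustrative policy)."""
    # Sensitive screens stay on the device, regardless of progress.
    if state.contains_sensitive_data:
        return "device"
    # Escalate to the cloud model only when the local model appears stuck.
    if state.consecutive_failures >= max_local_failures:
        return "cloud"
    return "device"
```

Under this assumed policy, data sensitivity takes priority over task state, so a stalled task on a sensitive screen would still run on the device.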

Ideal Use Cases

MAI-UI-2B is suited to applications that need to interact with graphical user interfaces, especially in mobile environments, such as:

  • Automated mobile application testing and interaction.
  • Developing advanced GUI-driven assistants and agents.
  • Tasks involving complex UI navigation and element grounding.
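
For element grounding, one common scoring convention is to count a prediction as correct when the predicted click point falls inside the target element's bounding box. The sketch below assumes that convention and pixel coordinates in `(left, top, right, bottom)` order. The function names are hypothetical and are not taken from MAI-UI or from any benchmark's official tooling.

```python
def point_in_box(point, box):
    """True if the (x, y) point lies inside the (left, top, right, bottom) box."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions, boxes):
    """Fraction of predicted click points that land inside their target boxes."""
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, boxes))
    return hits / len(predictions)
```

A check like this can also be used in automated app testing to confirm that a predicted tap targets the intended widget before the action is executed.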