meituan/EvoCUA-32B-20260105
EvoCUA-32B-20260105 by Meituan is a 33.4 billion parameter multimodal model designed as an Evolving Computer Use Agent. It performs end-to-end multi-turn automation, operating applications such as Chrome, Excel, and VSCode from screenshots and natural language instructions. The model achieves a 56.7% task completion rate on the OSWorld benchmark, ranking #1 among open-source models, and shows zero-shot cross-OS generalization with a 56.48% score on WindowsAgentArena. It also has the lowest unintended-behavior rate (35.0%) among tested Computer Use Agents.
EvoCUA-32B-20260105: A Leading Computer Use Agent
EvoCUA-32B-20260105, developed by Meituan, is a 33.4 billion parameter multimodal model built as an Evolving Computer Use Agent (CUA). It is the #1 open-source model on the OSWorld benchmark with a 56.7% task completion rate, outperforming models such as OpenCUA-72B and Qwen3-VL (thinking) while using fewer parameters and fewer steps. A smaller 8B version is also available, with performance competitive with 72B-level models.
Key Capabilities & Differentiators
- End-to-End Multi-Turn Automation: Capable of operating various applications such as Chrome, Excel, PowerPoint, and VSCode using screenshots and natural language instructions.
- Superior Cross-OS Generalization: Achieves 56.48% on WindowsAgentArena, showcasing robust zero-shot generalization from its Linux-based training environment to Windows.
- Enhanced Safety: An independent study found that EvoCUA-32B had the lowest unintended-behavior rate (35.0%) among the CUAs tested, which suggests comparatively better robustness against tricky instructions.
- Novel Training Method: Uses a data synthesis and training approach that consistently improves computer use capabilities across multiple open-source VLMs without degrading general performance.
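The end-to-end multi-turn automation described above can be pictured as a loop: take a screenshot, ask the model for the next action, execute it, and repeat until the model signals completion. The following is a minimal sketch of that loop and not EvoCUA's actual interface. Every name in it (`Action`, `Episode`, `run_episode`, `make_scripted_policy`, the `"done"` action kind, and the `max_steps` default) is a hypothetical chosen for illustration, and the model call is replaced by a scripted stand-in so the example runs without any model or desktop environment.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One hypothetical GUI action; the real action format is not specified here."""
    kind: str            # e.g. "click", "type", or "done" (assumed vocabulary)
    argument: str = ""   # free-form payload such as a target or text to type


@dataclass
class Episode:
    """Record of one multi-turn run: the instruction and the actions proposed."""
    instruction: str
    history: List[Action] = field(default_factory=list)


def run_episode(
    instruction: str,
    capture_screenshot: Callable[[], bytes],
    predict_action: Callable[[str, bytes, List[Action]], Action],
    execute_action: Callable[[Action], None],
    max_steps: int = 15,  # assumed step budget, not a documented setting
) -> Episode:
    """Observe, predict, act; stop on a "done" action or when the budget runs out."""
    episode = Episode(instruction)
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = predict_action(instruction, screenshot, episode.history)
        episode.history.append(action)
        if action.kind == "done":
            break
        execute_action(action)
    return episode


def make_scripted_policy(script: List[Action]):
    """Stand-in for the model: replays a fixed list of actions, then says "done"."""
    def policy(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
        step = len(history)
        return script[step] if step < len(script) else Action("done")
    return policy
```

In a real deployment, `predict_action` would wrap a call to the served model and `capture_screenshot` and `execute_action` would talk to the desktop environment; passing them in as callables keeps the loop itself testable with stubs such as `make_scripted_policy`.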
Ideal Use Cases
- Automating Complex Desktop Workflows: For tasks requiring interaction with multiple applications and operating systems.
- Developing Robust AI Agents: As a foundation for building agents that need to perform reliably and safely in diverse computing environments.
- Research in Computer Use Agents: Provides a strong open-source baseline for further development and evaluation in the CUA domain.