ScaleCUA-32B: Cross-Platform Computer Use Agent
ScaleCUA-32B is a 32 billion parameter Vision-Language Model developed by OpenGVLab, designed to function as a versatile computer use agent. It addresses the need for open-source models that can automate interactions across diverse graphical user interfaces.
Key Capabilities
- Cross-Platform Operation: Trained on a novel, large-scale dataset spanning 6 operating systems and 3 task domains, enabling seamless interaction across heterogeneous platforms.
- GUI Understanding & Grounding: Demonstrates strong performance in interpreting visual interfaces and grounding actions.
- Task Automation: Capable of completing complex, multi-step tasks through two primary modes:
  - Direct Action Mode: For immediate, executable actions based on visual input, such as clicking a specific UI element.
  - Reasoned Action Mode: For complex tasks, where the model first reasons through the problem, states its intended operation, and then generates the corresponding action code.
- State-of-the-Art Performance: Outperforms baselines by +26.6 points on WebArena-Lite-v2 and +10.7 points on ScreenSpot-Pro, and scores 94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, and 47.4% on WebArena-Lite-v2.
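In Reasoned Action Mode, the model ultimately emits action code that a controller executes against the GUI. As a rough illustration of how such output might be consumed downstream, here is a minimal sketch of parsing a single pyautogui-style action call into a structured command; the `click(x=..., y=...)` format and the `parse_action` helper are illustrative assumptions, not ScaleCUA's documented action schema.

```python
import re

# Matches a single call like "click(x=0.32, y=0.51)".
# NOTE: this format is an assumed example, not the model's
# documented action space; commas inside string arguments
# are not handled by this simple sketch.
ACTION_RE = re.compile(r"(?P<name>\w+)\((?P<args>[^)]*)\)")

def parse_action(action_code: str) -> dict:
    """Turn one action-call string into {'name': ..., 'args': {...}}."""
    m = ACTION_RE.fullmatch(action_code.strip())
    if m is None:
        raise ValueError(f"unrecognized action: {action_code!r}")
    args = {}
    for part in filter(None, (p.strip() for p in m.group("args").split(","))):
        key, _, value = part.partition("=")
        try:
            args[key.strip()] = float(value)  # numeric coordinates/offsets
        except ValueError:
            args[key.strip()] = value.strip().strip("'\"")  # string payloads
    return {"name": m.group("name"), "args": args}

print(parse_action("click(x=0.32, y=0.51)"))
# {'name': 'click', 'args': {'x': 0.32, 'y': 0.51}}
```

A real deployment would instead validate the generated code against the agent's actual action space before dispatching it to an executor such as a mouse/keyboard controller.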
Good For
- Developing autonomous agents for desktop, mobile, and web environments.
- Automating repetitive or complex GUI-based tasks.
- Research in computer vision, natural language processing, and agentic AI, particularly for cross-platform interaction.