Name: Mininglamp-2718/Mano-CUA-4B-Thinking-1.1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Mininglamp-2718

Overview

Mano-CUA-4B-Thinking-1.1 is a 4-billion-parameter GUI-VLA (Visual Language Agent) model developed by Mininglamp-2718 and engineered for efficient operation on edge devices such as Apple Silicon Macs. This is the full-precision (fp16) release in the Mano open-source model series; an MLX 8-bit quantized version is also available for optimized local inference.

Key Capabilities

Complex GUI Automation: Executes intricate interface operations involving numerous interactive elements.
Cross-System Data Integration: Extracts and combines data from various sources purely through visual interaction, bypassing API dependencies.
Long-Task Planning: Supports enterprise-level business process automation, handling workflows with dozens to hundreds of steps.
Intelligent Report Generation: Automatically creates structured documents such as data analysis reports and work summaries.

Technical Approach

The model utilizes the Mano-Action bidirectional self-reinforcement learning method and a three-stage progressive training approach (SFT → Offline RL → Online RL). It incorporates a "think-act-verify" loop reasoning mechanism for high-precision GUI understanding and operation. Edge device optimization is achieved through mixed-precision quantization, visual token pruning, and adapted inference techniques.
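A "think-act-verify" loop can be pictured as a control loop wrapped around the model. The sketch below is a minimal, hypothetical illustration, not Mano-CUA's actual implementation: the callables `observe`, `think`, `act`, and `verify`, the `Step` record, and the choice to stop on `finish`, `stop`, or `call_user` are all assumptions. A toy counter environment stands in for a real screen so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# ASSUMPTION: these action names end the loop in this sketch.
TERMINAL_ACTIONS = {"finish", "stop", "call_user"}


@dataclass
class Step:
    thought: str
    action: str
    verified: bool


def think_act_verify(
    observe: Callable[[], str],
    think: Callable[[str, List[Step]], Tuple[str, str]],
    act: Callable[[str], None],
    verify: Callable[[str, str, str], bool],
    max_steps: int = 100,
) -> List[Step]:
    """Hypothetical loop: reason about the screen, act, then check the effect."""
    history: List[Step] = []
    for _ in range(max_steps):
        before = observe()
        thought, action = think(before, history)  # "think"
        if action in TERMINAL_ACTIONS:
            history.append(Step(thought, action, True))
            break
        act(action)  # "act"
        after = observe()
        # "verify": a failed check is recorded so the next think step can react to it.
        history.append(Step(thought, action, verify(action, before, after)))
    return history


def run_demo() -> List[Step]:
    """Toy environment: each 'click' increments a counter; the goal is 3."""
    state = {"n": 0}

    def observe() -> str:
        return str(state["n"])

    def think(screen: str, history: List[Step]) -> Tuple[str, str]:
        return ("goal reached", "finish") if int(screen) >= 3 else ("increment", "click")

    def act(action: str) -> None:
        state["n"] += 1

    def verify(action: str, before: str, after: str) -> bool:
        return int(after) == int(before) + 1

    return think_act_verify(observe, think, act, verify)
```

Calling `run_demo()` returns four steps: three verified `click` actions followed by `finish`. Keeping the verify result in the history, rather than raising on failure, lets a planner retry or re-plan, which fits the long multi-step workflows described above.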

Action Space

Mano-CUA provides a comprehensive action space for GUI interaction, including open_app, open_url, click, type, hotkey, scroll, drag, wait, finish, stop, and call_user (for requesting human assistance). The model outputs structured XML describing each action, with coordinates normalized to a [0, 1000] range.
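Because coordinates arrive in a [0, 1000] range, a client must scale them to the actual screen resolution before executing an action. The following is a minimal sketch under stated assumptions: the action names come from the list above, but the XML shape (`<action name="click" x="500" y="250"/>`) and the helpers `to_pixels` and `parse_action` are hypothetical, since the exact output schema is not given here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional, Tuple

# Action names listed in the Mano-CUA action space description.
KNOWN_ACTIONS = {
    "open_app", "open_url", "click", "type", "hotkey", "scroll",
    "drag", "wait", "finish", "stop", "call_user",
}
NORM_MAX = 1000  # coordinates are normalized to [0, 1000]


@dataclass
class Action:
    name: str
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> Tuple[int, int]:
    """Map a point from the [0, 1000] range to pixel coordinates."""
    if not (0 <= x_norm <= NORM_MAX and 0 <= y_norm <= NORM_MAX):
        raise ValueError("coordinates must lie in [0, 1000]")
    # Clamp so that 1000 maps to the last valid pixel rather than one past the edge.
    px = min(round(x_norm / NORM_MAX * width), width - 1)
    py = min(round(y_norm / NORM_MAX * height), height - 1)
    return px, py


def parse_action(xml_text: str, width: int, height: int) -> Action:
    """Parse one action element. HYPOTHETICAL schema, not the documented one."""
    root = ET.fromstring(xml_text)
    name = root.get("name", "")
    if name not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    text = (root.text or "").strip() or None  # e.g. the string for a "type" action
    x_attr, y_attr = root.get("x"), root.get("y")
    if x_attr is not None and y_attr is not None:
        px, py = to_pixels(float(x_attr), float(y_attr), width, height)
        return Action(name, px, py, text)
    return Action(name, text=text)
```

For example, `parse_action('<action name="click" x="500" y="250"/>', 1920, 1080)` yields a click at pixel (960, 270). Whether the model treats 1000 as an inclusive edge is an assumption; clamping simply keeps such a point on screen.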
