Name: inclusionAI/VISTA-9B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: inclusionAI

VISTA-9B: GUI-Grounding Vision-Language Model

VISTA-9B is a 9-billion-parameter vision-language model developed by inclusionAI. It is based on the Qwen3.5 architecture and engineered specifically for GUI grounding: given a screenshot and a natural-language instruction, it predicts a click coordinate on the graphical user interface, expressed in a normalized 0-1000 image frame.
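Because the predicted coordinate lives in a normalized 0-1000 frame, a caller would typically rescale it to the screenshot's pixel size before clicking. The following is a minimal sketch of that conversion, not code from the model card. The function name `to_pixels` is hypothetical, and the sketch assumes that 0 and 1000 map to the image edges on both axes; how the coordinate is parsed out of the model's text output is not specified in the source and is left out.

```python
def to_pixels(x_norm, y_norm, width, height, scale=1000):
    """Rescale a point from a normalized 0..scale frame to pixel coordinates.

    Assumption (not confirmed by the source): 0 maps to the left/top edge
    and `scale` maps to the right/bottom edge of the screenshot.
    """
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError("normalized coordinate outside the 0..scale frame")
    # Clamp so that a value of exactly `scale` still lands on a valid pixel index.
    px = min(int(x_norm * width / scale), width - 1)
    py = min(int(y_norm * height / scale), height - 1)
    return px, py


# Example: the center-left region of a 1920x1080 screenshot.
click_x, click_y = to_pixels(500, 250, 1920, 1080)
```

Clamping to `width - 1` and `height - 1` is a design choice made here so that a prediction on the far edge remains a valid pixel index.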

Key Capabilities & Training Innovations

GUI Grounding: Maps visual input and text instructions to exact click locations.
View-Consistent GRPO Training: Employs a novel training method that builds comparison groups from target-preserving views of the same GUI instance, with exact coordinate remapping across cropped views. This improves localization robustness under varying visual presentations.
Self-Verified Cross-View Anchoring: Incorporates a training objective that adds oracle-format center-point anchors only when model-generated rollouts achieve maximum reward, stabilizing short coordinate generation.
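The coordinate remapping mentioned under View-Consistent GRPO Training can be illustrated with a small geometric sketch. This is an illustration under stated assumptions, not inclusionAI's training code: the function name `remap_to_crop`, the pixel-based crop box `(left, top, right, bottom)`, and the choice to return `None` for crops that lose the target are all hypothetical. It takes a target point in the full screenshot's 0-1000 frame and re-expresses it in the 0-1000 frame of a cropped view.

```python
def remap_to_crop(point, image_size, crop_box, scale=1000):
    """Re-express a normalized target point in the frame of a cropped view.

    point      -- (x, y) in the full image's 0..scale frame
    image_size -- (width, height) of the full image in pixels
    crop_box   -- (left, top, right, bottom) in full-image pixels (hypothetical layout)
    Returns the point in the crop's 0..scale frame, or None when the crop
    does not contain the target (i.e. the view is not target-preserving).
    """
    x_norm, y_norm = point
    width, height = image_size
    left, top, right, bottom = crop_box

    # Normalized full-frame coordinate -> full-image pixels.
    x_px = x_norm / scale * width
    y_px = y_norm / scale * height

    # A target-preserving view must still contain the target point.
    if not (left <= x_px < right and top <= y_px < bottom):
        return None

    # Shift into the crop's origin, then renormalize by the crop's size.
    return (
        (x_px - left) / (right - left) * scale,
        (y_px - top) / (bottom - top) * scale,
    )


# Example: a centered crop of a 2000x1000 screenshot keeps a centered target centered.
remapped = remap_to_crop((500, 500), (2000, 1000), (500, 250, 1500, 750))
```

With a mapping like this, the same ground-truth target can be scored consistently in every view of a comparison group, which is the property the training description relies on.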

Performance Highlights

VISTA-9B is reported to outperform its Qwen3.5 and GRPO counterparts on GUI grounding benchmarks. It achieves 69.2% on SSPro, 95.8% on SSV2, 68.1% on OSWorld-G, and 75.5% on OSWorld-G-R, improving on previous 9B models.

Recommended Use Cases

Automated UI Interaction: Ideal for tasks requiring precise interaction with graphical user interfaces based on natural language commands.
UI Testing and Automation: Can be used to automate testing workflows by programmatically clicking specific elements on a screen.
Robotic Process Automation (RPA): Applicable in scenarios where a software robot needs to understand and interact with application interfaces.