Name: vocaela/KV-Ground-8B-BaseGuiOwl1.5-0315 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: vocaela

Model Overview

KV-Ground-8B-BaseGuiOwl1.5-0315 is an 8 billion parameter Vision-Language Model (VLM) developed by Kingsware and Vocaela AI. It is fine-tuned from the GUI-Owl-1.5-8B-Instruct model, with a primary focus on optimizing performance for high-resolution Graphical User Interface (GUI) grounding tasks. The model takes an image and natural language instruction as input and generates text output.

Key Capabilities & Differentiators

High-Resolution GUI Grounding: Specifically trained and optimized for high-resolution GUI images, addressing common performance degradation issues in this domain.
Superior Benchmarking: Achieves 73.2 on ScreenSpot-Pro without reasoning CoT, ranking as the best pure model capability across all models in this benchmark. When combined with a zoom-in strategy, it reaches 80.5, making it the top-ranked system.
Consistent Performance: Maintains excellent performance on regular-resolution tasks, scoring 94.6 on ScreenSpot-V2, and shows notable gains on OSWorld-G and OSWorld-G-refined.
Advanced Training Methodology: Utilizes a unique recipe involving MLLM-as-judge for data cleaning, synthesis of high-quality high-resolution GUI grounding data, and continued post-training via SFT followed by GRPO.

Ideal Use Cases

This model is particularly well-suited for applications requiring precise interaction and understanding of high-resolution graphical interfaces, such as:

Automated UI testing and interaction
Robotic Process Automation (RPA) for GUI-driven tasks
Accessibility tools for navigating complex interfaces
Developing intelligent agents that interact with digital environments

Overview

Model Overview

Key Capabilities & Differentiators

Ideal Use Cases

Full Model Card (README)