Name: vocaela/KV-Ground-4B-BaseGuiOwl1.5-0228 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: vocaela

Overview

The KV-Ground-4B-BaseGuiOwl1.5-0228 is a 4 billion parameter Vision-Language Model (VLM) developed by Kingsware and Vocaela AI. It is specifically optimized for GUI grounding tasks, particularly with high-resolution images. This model is a fine-tuned version of the GUI-Owl-1.5-4B-Instruct, inheriting its architecture and configurations.

Key Capabilities & Differentiators

High-Resolution GUI Grounding: Achieves 67.0 on ScreenSpot-Pro, positioning it as the best-performing 4B model for high-resolution GUI grounding without requiring Chain-of-Thought (CoT) reasoning.
Robust Performance on Regular Resolution: Maintains excellent performance on standard resolution tasks, scoring 94.1 on ScreenSpot-V2, indicating its versatility.
Optimized Training Methodology: Developed using a unique approach involving:
- Data Cleaning: Utilizes MLLM as a judge for multiple rounds of data cleaning to address ~30% errors in public GUI grounding datasets, which significantly improves performance on high-resolution images.
- Synthesized High-Resolution Data: Incorporates high-quality, synthesized high-resolution GUI grounding data.
- Continuous Post-Training: Employs Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from Human Feedback (GRPO) for continuous improvement.
Benchmark Performance: Demonstrates consistent improvements across various benchmarks compared to its base model, GUI-Owl-1.5-4B-Instruct, and competes effectively with other specialized GUI models under 8B parameters.

Use Cases

This model is ideal for applications requiring precise GUI element identification and interaction based on visual input and natural language commands, especially in scenarios involving detailed or high-resolution user interfaces. Its strengths make it suitable for automated UI testing, accessibility tools, and intelligent agents interacting with graphical user interfaces.

Overview

Overview

Key Capabilities & Differentiators

Use Cases

Full Model Card (README)