Name: duvoai/duvo-eye-1 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: duvoai

duvo-eye-1: GUI Grounding for Enterprise Computer Use

duvo-eye-1 is a vision-language model (VLM) developed by Duvo for single-step GUI element grounding. Given a screenshot and a natural-language description of a UI element, it outputs a click position as a JSON object with "x" and "y" coordinates. The model is a LoRA fine-tune of Hcompany/Holo-3.1-35B-A3B, a 35B-parameter mixture-of-experts (MoE) model with 3B active parameters, fine-tuned on synthetic enterprise back-office UIs.
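As an illustration of consuming that output, here is a minimal sketch of a parser for the model's reply. It assumes the reply is a JSON object with numeric "x" and "y" fields; the exact reply format and the coordinate convention (pixels versus normalized values) are not stated here, and the names ClickPoint and parse_click are hypothetical.

```python
import json
from typing import NamedTuple


class ClickPoint(NamedTuple):
    x: float
    y: float


def parse_click(raw: str) -> ClickPoint:
    """Parse a reply assumed to look like '{"x": 412, "y": 87}'.

    Raises ValueError for anything that is not a JSON object with
    numeric "x" and "y" fields (json.JSONDecodeError is a ValueError).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    try:
        x, y = data["x"], data["y"]
    except KeyError as exc:
        raise ValueError(f"missing coordinate key: {exc}") from exc
    for name, value in (("x", x), ("y", y)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
    return ClickPoint(float(x), float(y))
```

Validating on the caller's side is still worthwhile even when the model reliably emits valid JSON, since a transport or truncation error can corrupt a reply.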

Key Capabilities & Performance

Top-tier Grounding: Ranks #1 on the maintained UI-I2E-Bench leaderboard (84.2) and achieves 95.1 on ScreenSpot-v2, matching top models. It also exceeds the best published UI-Vision element-grounding number (64.4).
Efficiency: Achieves these results with only 3B active parameters, which lowers serving cost relative to larger models.
Output Reliability: Significantly improves over its base model by eliminating malformed outputs, ensuring consistent and valid JSON responses.
In-domain Expertise: Shows substantial gains in its target enterprise-UI domain (SynthUI test: 86.6 vs. 62.5 for the base).
Single-Shot Excellence: Its 72.9 on ScreenSpot-Pro is the second-highest single-forward-pass result on the public leaderboard, outperforming many larger single models.
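For readers unfamiliar with how grounding scores of this kind are typically computed, the sketch below assumes the common point-in-box criterion: a prediction counts as correct when the predicted click falls inside the target element's bounding box. The individual benchmarks' official scorers may differ in detail, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def hit(point: Point, box: Box) -> bool:
    """Return True if the click lies inside the box (edges inclusive)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(points: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Percentage of predicted clicks that land inside their target boxes."""
    if len(points) != len(boxes):
        raise ValueError("points and boxes must have the same length")
    if not points:
        return 0.0
    hits = sum(hit(p, b) for p, b in zip(points, boxes))
    return 100.0 * hits / len(points)
```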

Good for

Automating GUI Interactions: Ideal as the grounding component within a larger agent stack for computer use, translating "what" to interact with into "where" on the screen.
Enterprise Applications: Particularly strong for web, desktop, and professional-software UIs, especially those resembling enterprise back-office systems.
Multilingual UI Support: Instructions are given in English, but the model handles English, French, and German interfaces, a multilingual capability inherited from its base model.
Reproducible Benchmarking: All public-benchmark predictions are published, and results on three benchmarks are confirmed with the benchmark maintainers' own scorers, which makes the numbers transparent and verifiable.
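The role of a grounding component in an agent stack can be sketched as follows. This is an illustration only: Grounder and Clicker are hypothetical callables standing in for the model call and the input driver, and the bounds check assumes the model returns pixel coordinates in the screenshot's own frame, which is not confirmed here.

```python
from typing import Callable, Tuple

Point = Tuple[int, int]
Grounder = Callable[[bytes, str], Point]  # hypothetical: wraps the model call
Clicker = Callable[[int, int], None]      # hypothetical: wraps the input driver


def ground_and_click(
    screenshot: bytes,
    description: str,
    size: Tuple[int, int],
    grounder: Grounder,
    clicker: Clicker,
) -> Point:
    """Resolve a description to a point, sanity-check it, then click."""
    width, height = size
    x, y = grounder(screenshot, description)
    # Refuse to click outside the screenshot rather than act on a bad point.
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"point ({x}, {y}) is outside {width}x{height}")
    clicker(x, y)
    return x, y
```

Keeping the grounder behind a plain callable means the planner that decides "what" to click stays independent of the model that decides "where".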
