janhq/Jan-v2-VL-med

Vision · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 6, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Jan-v2-VL-med is an 8-billion parameter vision-language model developed by janhq, designed for long-horizon, multi-step tasks in real software environments like browsers and desktop applications. This model combines language reasoning with visual perception to execute complex instructions, maintain intermediate states, and recover from minor errors. It is specifically optimized for stable, many-step execution, making it ideal for agentic automation and UI control with screenshot grounding and tool calls.


Jan-v2-VL: Multimodal Agent for Long-Horizon Tasks

Jan-v2-VL is an 8-billion parameter vision-language model developed by janhq, specifically engineered for complex, multi-step tasks within real software environments such as browsers and desktop applications. It integrates language understanding with visual perception to enable robust, long-horizon execution, which is critical for real-world automation.

Key Capabilities

  • Vision-Language Integration: Combines linguistic reasoning with visual input to understand and interact with software interfaces.
  • Long-Horizon Execution: Designed for stable, many-step task completion, minimizing drift and recovering from minor errors.
  • Agentic Automation: Excels at stepwise operation in browsers and desktop apps, using screenshot grounding and tool calls (a request sketch follows this list).
  • Balanced Performance: The "med" variant offers a balance between latency and quality, suitable for a wide range of applications.
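
To make the screenshot-grounded workflow above concrete, here is a minimal sketch of a single request. It assumes the model is served behind an OpenAI-compatible chat completions endpoint (for example a local Jan or vLLM server); the base URL, port, and `jan-v2-vl-med` model id are illustrative placeholders, not values confirmed by this card.

```python
# Minimal sketch, assuming an OpenAI-compatible server; base_url and model id
# are placeholder assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

# Encode the current screenshot so the model can ground its next action in it.
with open("screenshot.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="jan-v2-vl-med",  # hypothetical model id for illustration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Open the settings menu and enable dark mode."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```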

Good For

  • Automating complex workflows in web browsers or desktop applications.
  • Developing AI agents that require visual perception and multi-step reasoning.
  • Tasks where stable, many-step execution with minimal drift is paramount.
  • UI control applications that benefit from screenshot grounding and tool integration (see the tool-call sketch after this list).
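
For the UI-control use case, tool calls are typically exposed to the model as function schemas. The sketch below is illustrative only: the card mentions tool calls but does not define a tool set, so the `click` tool and the agent-loop handling here are assumptions, reusing the client from the previous example.

```python
# Illustrative only: the tool name and schema are assumptions, not part of the
# model card. An agent loop would execute each returned call against the UI.
tools = [
    {
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at the given screen coordinates.",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {"type": "integer", "description": "Pixel x coordinate"},
                    "y": {"type": "integer", "description": "Pixel y coordinate"},
                },
                "required": ["x", "y"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="jan-v2-vl-med",  # hypothetical model id for illustration
    messages=[{"role": "user", "content": "Click the 'Save' button in the screenshot above."}],
    tools=tools,
)

# In a real agent loop, each tool call would be executed against the UI and the
# result (plus a fresh screenshot) fed back to the model as the next message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```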

Jan-v2-VL demonstrates no degradation on standard text-only and vision tasks compared to its base model (Qwen3-VL-8B-Thinking), while delivering stronger long-horizon execution on the Illusion of Diminishing Returns benchmark.