janhq/Jan-v2-VL-low
Jan-v2-VL-low is an 8-billion-parameter vision-language model developed by janhq for long-horizon, multi-step tasks in real software environments such as browsers and desktop applications. This efficiency-oriented variant offers lower latency while combining language reasoning with visual perception to follow complex instructions and maintain intermediate state. It targets agentic automation and UI control, showing strong long-horizon execution on the Illusion of Diminishing Returns benchmark without degrading performance on standard text-only and vision tasks relative to its base model.
Jan-v2-VL: Multimodal Agent for Long-Horizon Tasks
Jan-v2-VL is an 8-billion-parameter vision-language model developed by janhq, built to execute long-horizon, multi-step tasks within real software environments such as browsers and desktop applications. It integrates language reasoning with visual perception, which lets it follow complex instructions, manage intermediate state, and recover from minor execution errors.
Key Capabilities
- Long-Horizon Execution: Built for stable, many-step execution, which matters in real-world tasks where small per-step gains compound into much longer successful chains. The model is evaluated on the "Illusion of Diminishing Returns" benchmark, which measures execution length as an indicator of robust long-horizon ability.
- Multimodal Perception: Combines language understanding with visual input to interact effectively with graphical user interfaces.
- Performance: Shows no degradation on standard text-only and vision tasks relative to its base model (Qwen-3-VL-8B-Thinking) and scores slightly higher on several of them, while delivering stronger long-horizon execution.
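To illustrate why small per-step gains compound, here is a minimal Python sketch. It assumes every step succeeds independently with the same accuracy, which is a simplification for intuition only and not the benchmark's actual protocol. The function names `chain_success` and `horizon_length` are hypothetical.

```python
import math


def _check_probability(value: float, name: str) -> None:
    # Both quantities must lie strictly between 0 and 1 for the logs below to be meaningful.
    if not 0.0 < value < 1.0:
        raise ValueError(f"{name} must be strictly between 0 and 1, got {value}")


def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability of completing `steps` steps in a row.

    Assumption (illustrative only): steps are independent and share one accuracy.
    """
    _check_probability(step_accuracy, "step_accuracy")
    return step_accuracy ** steps


def horizon_length(step_accuracy: float, target_success: float = 0.5) -> float:
    """Chain length at which the success rate falls to `target_success`.

    Solves step_accuracy ** n == target_success for n.
    """
    _check_probability(step_accuracy, "step_accuracy")
    _check_probability(target_success, "target_success")
    return math.log(target_success) / math.log(step_accuracy)


for p in (0.90, 0.99, 0.999):
    print(f"per-step accuracy {p:.3f}: horizon at 50% success is about {horizon_length(p):.1f} steps")
```

Under this assumption, moving per-step accuracy from 0.90 to 0.99 lengthens the 50% horizon roughly tenfold, which is the sense in which small per-step improvements yield much longer chains.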
Good For
- Agentic Automation & UI Control: Ideal for stepwise operations in browsers and desktop applications, leveraging screenshot grounding and tool calls (e.g., BrowserMCP).
- Complex Task Automation: Suitable for tasks where a plan or knowledge can be provided upfront, and success depends on stable, multi-step execution with minimal drift.
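The stepwise pattern described above (observe a screenshot, choose a tool call, apply it, repeat) might be sketched as follows. This is a generic control loop under stated assumptions, not Jan-v2-VL's or BrowserMCP's actual API. `Action`, `run_agent`, and the stub callbacks are all hypothetical names, and the stub `decide` function stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str                                  # hypothetical tool name, e.g. "click" or "done"
    args: dict = field(default_factory=dict)   # hypothetical tool arguments


def run_agent(
    plan: list,
    observe: Callable[[], bytes],
    decide: Callable[[list, bytes, list], Action],
    act: Callable[[Action], str],
    max_steps: int = 20,
) -> list:
    """Run an observe -> decide -> act loop until the policy emits "done"."""
    history = []
    for _ in range(max_steps):
        screenshot = observe()                       # current UI state as image bytes
        action = decide(plan, screenshot, history)   # in practice, a model call
        if action.tool == "done":
            return history
        result = act(action)                         # in practice, a tool call
        history.append((action, result))             # intermediate state carried forward
    raise RuntimeError("step budget exhausted before the plan completed")


# Stubs so the sketch runs without a model or a browser.
def fake_observe() -> bytes:
    return b""  # placeholder for screenshot bytes


def fake_decide(plan: list, screenshot: bytes, history: list) -> Action:
    # Follow the upfront plan one step at a time, then stop.
    if len(history) < len(plan):
        return Action(tool=plan[len(history)])
    return Action(tool="done")


def fake_act(action: Action) -> str:
    return "ok"


demo_history = run_agent(["navigate", "click", "type"], fake_observe, fake_decide, fake_act)
print([action.tool for action, _ in demo_history])
```

The `max_steps` budget is a design choice in this sketch: it bounds runaway loops when the policy drifts from the plan and never signals completion.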
This specific variant, Jan-v2-VL-low, is optimized for efficiency and lower latency, making it a practical choice for applications requiring responsive agentic behavior.