janhq/Jan-v2-VL-high

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 6, 2025License:apache-2.0Architecture:Transformer0.1K Open Weights Cold

The Jan-v2-VL-high model by janhq is an 8-billion-parameter vision-language model designed for long-horizon, multi-step tasks in real software environments. This variant prioritizes deeper reasoning and higher 'think time' for complex operations. It combines language reasoning with visual perception to follow intricate instructions and maintain intermediate states, excelling in agentic automation and UI control with screenshot grounding and tool calls.

Loading preview...

Overview

Jan-v2-VL-high is an 8-billion-parameter vision-language model developed by janhq, specifically engineered for long-horizon, multi-step tasks within real software environments like browsers and desktop applications. This model integrates language reasoning with visual perception, enabling it to execute complex instructions, manage intermediate states, and recover from minor errors. It is the 'high' variant, optimized for deeper reasoning and higher 'think time' compared to its 'low' and 'med' counterparts.

Key Capabilities

  • Multimodal Agent: Combines vision and language for understanding and interacting with software interfaces.
  • Long-Horizon Execution: Built for stable, many-step task execution, minimizing drift over extended operations.
  • Error Recovery: Designed to recover from minor execution errors, enhancing task completion reliability.
  • Agentic Automation: Capable of stepwise operation in browsers and desktop apps, utilizing screenshot grounding and tool calls (e.g., BrowserMCP).
  • Performance: Shows no degradation and slight improvements on standard text-only and vision tasks compared to its base, Qwen-3-VL-8B-Thinking, while delivering stronger long-horizon execution on the Illusion of Diminishing Returns benchmark.

Intended Use Cases

  • Agentic Automation & UI Control: Ideal for tasks requiring stable, multi-step interaction with user interfaces, where plans and knowledge can be provided upfront.
  • Complex Software Operations: Suitable for scenarios demanding robust, long-chain execution in desktop or web applications.