coder3101/Qwen3-VL-2B-Thinking-heretic

VISIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Nov 23, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

coder3101/Qwen3-VL-2B-Thinking-heretic is a 2 billion parameter vision-language model, derived from Qwen/Qwen3-VL-2B-Thinking, with a 32768 token context length. This version has been decensored using Heretic v1.0.1, resulting in a significantly reduced refusal rate compared to the original. It offers enhanced visual perception, reasoning, and agent interaction capabilities, making it suitable for multimodal tasks requiring less content moderation.

Loading preview...

Model Overview

This model, coder3101/Qwen3-VL-2B-Thinking-heretic, is a 2 billion parameter vision-language model based on the Qwen3-VL architecture, featuring a 32768 token context length. It is a decensored variant of the original Qwen/Qwen3-VL-2B-Thinking model, processed using Heretic v1.0.1. A key differentiator is its significantly lower refusal rate (5/100) compared to the original model (89/100), achieved with a KL divergence of 0.01.

Key Capabilities

  • Enhanced Multimodal Reasoning: Excels in STEM/Math, causal analysis, and logical, evidence-based answers.
  • Advanced Visual Perception: Features upgraded visual recognition, spatial perception, and expanded OCR supporting 32 languages.
  • Visual Agent: Capable of operating PC/mobile GUIs, recognizing elements, understanding functions, and completing tasks.
  • Visual Coding Boost: Can generate Draw.io/HTML/CSS/JS from images/videos.
  • Long Context & Video Understanding: Supports a native 256K context, expandable to 1M, for handling long documents and hours-long video with full recall.

When to Use This Model

This model is particularly well-suited for applications requiring robust vision-language understanding and generation with a preference for less content moderation. Its decensored nature makes it a candidate for use cases where the original model's refusal rates might be prohibitive. It is ideal for tasks involving visual agents, code generation from images, complex multimodal reasoning, and detailed visual recognition across various domains.