coder3101/Qwen3-VL-2B-Instruct-heretic

VISIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Nov 23, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

coder3101/Qwen3-VL-2B-Instruct-heretic is a 2 billion parameter vision-language model, derived from Qwen's Qwen3-VL-2B-Instruct, with a 32768 token context length. This version has been modified using Heretic v1.0.1 to significantly reduce refusals, achieving 2 refusals per 100 prompts compared to the original model's 95. It excels in multimodal reasoning, visual agent capabilities, and advanced spatial perception, making it suitable for applications requiring less restrictive content generation.

Loading preview...

Model Overview

This model, coder3101/Qwen3-VL-2B-Instruct-heretic, is a 2 billion parameter vision-language model based on the Qwen3-VL-2B-Instruct architecture. It has been specifically modified using the Heretic v1.0.1 tool to reduce content refusals, making it a "decensored" version of the original. While the original model exhibited 95 refusals per 100 prompts, this modified version demonstrates a significantly lower refusal rate of 2 per 100 prompts, as indicated by KL divergence of 0.45.

Key Capabilities

  • Reduced Refusals: Engineered to provide less restrictive content generation compared to its base model.
  • Vision-Language Integration: Inherits the comprehensive multimodal capabilities of the Qwen3-VL series, including superior text understanding and generation, and deep visual perception.
  • Advanced Visual Features: Supports visual agent functionalities, visual coding boost (generating Draw.io/HTML/CSS/JS from images/videos), and enhanced spatial perception for 2D/3D grounding.
  • Long Context Understanding: Capable of handling long contexts and video understanding with native 256K context, expandable to 1M.
  • Multimodal Reasoning: Excels in STEM/Math tasks, offering causal analysis and logical, evidence-based answers.
  • Expanded OCR: Features upgraded OCR supporting 32 languages, robust in challenging conditions, and improved long-document structure parsing.

When to Use This Model

This model is particularly suited for use cases where the original Qwen3-VL-2B-Instruct model's refusal rate is a limiting factor. Developers needing a powerful vision-language model with fewer content restrictions for tasks like creative content generation, nuanced visual analysis, or interactive agent development will find this model beneficial. Its strong multimodal reasoning and visual capabilities make it a versatile choice for applications requiring advanced image and video understanding combined with flexible text generation.