coder3101/Qwen3-VL-8B-Instruct-heretic

Vision | Concurrency cost: 1 | Model size: 8B | Quant: FP8 | Context length: 32k | Published: Jan 17, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

coder3101/Qwen3-VL-8B-Instruct-heretic is an 8-billion-parameter vision-language model derived from Qwen/Qwen3-VL-8B-Instruct and listed with a 32,768-token context length. It was decensored with the Heretic v1.1.0 tool, which sharply reduces its refusal rate relative to the original model. It retains the Qwen3-VL series' multimodal capabilities, including visual agent operation, enhanced spatial perception, and long-context video understanding, which makes it suitable for applications that need less restrictive content generation.
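Because images consume context tokens alongside text, a quick budget check against the listed 32,768-token window can help when sending multimodal requests. The sketch below is a rough estimate under stated assumptions: it assumes one visual token per 32×32-pixel area (16-pixel patches merged 2×2, our reading of the Qwen3-VL vision configuration), and it ignores the processor's resizing limits and any special tokens. The helper names are hypothetical; verify real counts with the model's own processor.

```python
import math

# Assumption: one visual token covers a 32x32-pixel area (16-px patches, 2x2 merge).
ASSUMED_PIXELS_PER_TOKEN_SIDE = 32
# Context length stated in this listing ("Ctx Length: 32k").
LISTED_CONTEXT_TOKENS = 32_768


def estimate_image_tokens(width: int, height: int,
                          side: int = ASSUMED_PIXELS_PER_TOKEN_SIDE) -> int:
    """Rough visual-token estimate; ignores processor resizing and special tokens."""
    return math.ceil(width / side) * math.ceil(height / side)


def fits_in_context(text_tokens: int,
                    image_sizes: list[tuple[int, int]],
                    max_new_tokens: int,
                    context: int = LISTED_CONTEXT_TOKENS) -> bool:
    """True if prompt text, estimated image tokens, and the generation budget fit."""
    image_tokens = sum(estimate_image_tokens(w, h) for w, h in image_sizes)
    return text_tokens + image_tokens + max_new_tokens <= context


if __name__ == "__main__":
    print(estimate_image_tokens(1024, 768))            # 32 * 24 tokens
    print(fits_in_context(1_000, [(1024, 768)], 512))  # small request
```

Reserving `max_new_tokens` up front matters because the output shares the same window as the prompt.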


coder3101/Qwen3-VL-8B-Instruct-heretic Overview

This model is a decensored version of Qwen/Qwen3-VL-8B-Instruct, created with the Heretic v1.1.0 tool. It keeps the original model's 8 billion parameters, is listed here with a 32,768-token context length, and refuses far fewer requests than the original. The Qwen3-VL series itself brings comprehensive upgrades in text understanding, visual perception, and reasoning.
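Heretic's own documentation describes its approach as directional ablation ("abliteration") combined with an automatic parameter search. The toy sketch below illustrates only the core idea on plain Python vectors: estimate a "refusal direction" as the normalized difference between mean activations on harmful and harmless prompts, then project that direction out of a hidden state. The function names, the tiny 3-dimensional activations, and the single scalar `weight` are illustrative assumptions; this is not Heretic's code, which operates on model weights across layers.

```python
import math

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def refusal_direction(harmful: list[Vector], harmless: list[Vector]) -> Vector:
    """Unit vector along mean(harmful) - mean(harmless) (difference of means)."""
    diff = [h - s for h, s in zip(mean(harmful), mean(harmless))]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("activation means are identical; no direction found")
    return [x / norm for x in diff]


def ablate(hidden: Vector, direction: Vector, weight: float = 1.0) -> Vector:
    """Remove `weight` times the component of `hidden` along the unit `direction`."""
    projection = dot(hidden, direction)
    return [h - weight * projection * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    r = refusal_direction([[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]],
                          [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
    print(r)                            # direction along the first axis
    print(ablate([5.0, 2.0, 1.0], r))   # that component is zeroed
```

With `weight=1.0` the ablated vector is orthogonal to the refusal direction; a smaller weight only dampens it, which is the kind of strength parameter an optimizer could tune against refusals and output drift.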

Key Differentiators & Capabilities

  • Decensored Output: Refuses 6 of 100 evaluation prompts, compared with 100 of 100 for the original model, allowing less restricted content generation.
  • Advanced Multimodal Reasoning: Excels in STEM/Math tasks, causal analysis, and logical, evidence-based answers.
  • Visual Agent: Can operate PC and mobile GUIs: it recognizes interface elements, understands their functions, and completes tasks.
  • Enhanced Spatial Perception: Judges object positions, viewpoints, and occlusions, providing stronger 2D and 3D grounding for spatial reasoning.
  • Long Context & Video Understanding: The Qwen3-VL architecture natively supports a 256K context, expandable to 1M, handling books and hours-long video with full recall and second-level indexing. This listing specifies a 32k context length.
  • Upgraded Visual Recognition: Broad, high-quality pretraining allows it to recognize a wide array of entities including celebrities, anime, products, and flora/fauna.
  • Expanded OCR: Supports 32 languages and is robust in challenging conditions (low light, blur, tilt), with improved parsing for rare characters and long documents.
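A score such as "6/100" is a count of refused responses over a fixed prompt set. One simple way to produce such a count, assumed here purely for illustration, is case-insensitive substring matching against refusal phrases. The marker list and function names below are hypothetical and are not taken from Heretic or from the evaluation behind the reported numbers.

```python
# Hypothetical refusal phrases; a real evaluation would use its own list.
REFUSAL_MARKERS = ("sorry", "i can't", "i cannot", "i won't", "as an ai")


def is_refusal(response: str) -> bool:
    """Case-insensitive marker match; curly apostrophes are normalized first."""
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_score(responses: list[str]) -> str:
    """Format the refusal count as 'refused/total', e.g. '6/100'."""
    refused = sum(1 for r in responses if is_refusal(r))
    return f"{refused}/{len(responses)}"


if __name__ == "__main__":
    answers = ["Here is a step-by-step answer."] * 94
    answers += ["I\u2019m sorry, but I can\u2019t help with that."] * 6
    print(refusal_score(answers))
```

Substring matching is cheap and transparent, but broad markers such as "sorry" can also flag polite non-refusals, so counts from this kind of check are approximate.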

When to Use This Model

This model suits use cases where the content restrictions of the original Qwen3-VL-8B-Instruct are undesirable and more open-ended responses are needed. It fits applications that require advanced vision-language understanding, multimodal reasoning, and visual agent capabilities, especially when handling diverse and potentially sensitive content without strict filtering.