Kizzington/Qwen3-VL-8B-Thinking-heretic

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 18, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Kizzington/Qwen3-VL-8B-Thinking-heretic is an 8 billion parameter vision-language model, a decensored variant of Qwen/Qwen3-VL-8B-Thinking. This model, created using the Heretic v1.0.1 tool, significantly reduces refusal rates compared to its original counterpart, achieving 0/100 refusals versus 45/100. It excels in multimodal reasoning, visual agent capabilities, and advanced spatial perception, making it suitable for applications requiring robust visual understanding and unrestricted text generation.

Loading preview...

Model Overview

Kizzington/Qwen3-VL-8B-Thinking-heretic is an 8 billion parameter vision-language model derived from Qwen/Qwen3-VL-8B-Thinking. This version has been processed with the Heretic v1.0.1 tool to create a "decensored" variant, specifically designed to reduce content refusals.

Key Differentiators

  • Decensored Output: Achieves a 0/100 refusal rate, a significant reduction from the original model's 45/100, making it suitable for use cases requiring less restrictive content generation.
  • Enhanced Multimodal Reasoning: Builds upon the Qwen3-VL series' strengths in visual perception, reasoning, and extended context length.
  • Visual Agent Capabilities: Can operate PC/mobile GUIs, recognize elements, understand functions, and complete tasks.
  • Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions, providing stronger 2D and 3D grounding for spatial reasoning.
  • Long Context & Video Understanding: Supports a native 256K context, expandable to 1M, capable of handling long documents and hours of video with full recall.

Ideal Use Cases

  • Applications requiring unfiltered or less restricted text generation in response to visual or textual prompts.
  • Tasks involving complex visual understanding and reasoning, such as analyzing images, videos, and generating code from visual inputs.
  • Scenarios demanding robust multimodal interaction, including visual agents for GUI operation and advanced spatial analysis.