coder3101/Qwen3-VL-32B-Thinking-heretic-v2

Vision · Concurrency Cost: 2 · Model Size: 33.4B · Quant: FP8 · Ctx Length: 32k · Published: Dec 16, 2025 · License: apache-2.0 · Architecture: Transformer

coder3101/Qwen3-VL-32B-Thinking-heretic-v2 is a 33.4-billion-parameter vision-language model: a decensored variant of Qwen/Qwen3-VL-32B-Thinking. It retains the base model's strong text understanding and generation, enhanced visual perception, and extended context length, making it suitable for complex multimodal reasoning tasks. It features advanced spatial perception, long-context video understanding, and upgraded visual recognition, with a sharply reduced refusal rate compared to the original model.


Model Overview

This model, coder3101/Qwen3-VL-32B-Thinking-heretic-v2, is a 33.4-billion-parameter vision-language model derived from Qwen/Qwen3-VL-32B-Thinking, modified with Heretic v1.1.0 to reduce refusals. It retains the comprehensive upgrades of the Qwen3-VL series, including strong text understanding and generation, deeper visual perception, and a context length of 32,768 tokens.
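For orientation, the model should be loadable through the Hugging Face transformers multimodal auto classes. A minimal sketch, assuming a transformers release that registers the Qwen3-VL architecture; the helper names, prompt, and generation settings below are illustrative, not part of this model card:

```python
def build_messages(image_url: str, prompt: str) -> list:
    """Assemble one user turn in the multimodal chat format used by
    Qwen-VL processors: an image part followed by a text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def describe_image(image_url: str, prompt: str) -> str:
    """Run a single generation pass. Fetching the FP8 weights (~33 GB)
    happens on first call, so nothing heavy runs at import time."""
    # Imports kept local so build_messages() works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "coder3101/Qwen3-VL-32B-Thinking-heretic-v2"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

    inputs = processor.apply_chat_template(
        build_messages(image_url, prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Drop the prompt tokens and decode only the newly generated answer.
    return processor.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The message-building step is pure data and can be reused with any serving stack (vLLM, SGLang, etc.) that accepts the same chat schema.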

Key Capabilities

  • Decensored Output: Refuses only 3 of 100 benchmark prompts, versus 97/100 for the original model, indicating far less content filtering.
  • Multimodal Reasoning: Excels in STEM/Math tasks, providing causal analysis and logical, evidence-based answers.
  • Advanced Visual Perception: Features enhanced spatial perception for judging object positions and viewpoints, and upgraded visual recognition for a wide range of entities.
  • Long Context & Video Understanding: Capable of handling long contexts and video inputs with full recall and second-level indexing.
  • Expanded OCR: Supports 32 languages and is robust in challenging conditions like low light or blur, with improved long-document structure parsing.
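The 32k context window is shared by vision tokens, the prompt, and the generated answer, so long-document workflows need a rough token budget. A minimal chunking sketch using an assumed ~4-characters-per-token heuristic (real counts come from the model's tokenizer); the function name and defaults are illustrative:

```python
def chunk_for_context(text: str, ctx_len: int = 32768, reserved: int = 4096,
                      chars_per_token: float = 4.0) -> list:
    """Split plain text into pieces that each fit the context remaining
    after reserving `reserved` tokens for images, the system prompt, and
    the reply. The chars-per-token ratio is a heuristic, not a tokenizer count."""
    budget_tokens = ctx_len - reserved
    if budget_tokens <= 0:
        raise ValueError("reserved tokens exceed the context length")
    budget_chars = int(budget_tokens * chars_per_token)
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]
```

With the defaults, each chunk holds about (32768 − 4096) × 4 ≈ 114k characters; tighten `chars_per_token` for dense scripts or code, where real tokenizers emit more tokens per character.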

Good For

  • Applications requiring a less restrictive content policy in a vision-language model.
  • Complex multimodal tasks involving detailed image and video analysis, spatial reasoning, and text generation.
  • Use cases demanding high performance in STEM/Math reasoning and comprehensive visual recognition.