georgehenney/Qwen3-VL-4B-Instruct-heretic-7refusal

VISIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Nov 18, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The georgehenney/Qwen3-VL-4B-Instruct-heretic-7refusal is a 4 billion parameter vision-language model, derived from Qwen's Qwen3-VL-4B-Instruct, with its refusal behavior significantly reduced using the Heretic v1.0.1 tool. This model offers enhanced visual perception, reasoning, and extended context length, making it suitable for multimodal tasks requiring less content moderation. It excels in visual agent operations, advanced spatial perception, and long context video understanding.

Loading preview...

Model Overview

This model, georgehenney/Qwen3-VL-4B-Instruct-heretic-7refusal, is a modified version of the Qwen3-VL-4B-Instruct, a 4 billion parameter vision-language model developed by Qwen. The primary modification, performed using the Heretic v1.0.1 tool, significantly reduces the model's refusal rate from 92/100 to 7/100, making it a "decensored" variant.

Key Capabilities

  • Vision-Language Integration: Offers comprehensive upgrades in text understanding, generation, visual perception, and reasoning.
  • Reduced Refusals: Engineered to provide responses with fewer content restrictions compared to its base model.
  • Visual Agent: Capable of operating PC/mobile GUIs, recognizing elements, understanding functions, and completing tasks.
  • Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions, supporting 2D and 3D grounding for spatial reasoning.
  • Long Context & Video Understanding: Features a native 256K context, expandable to 1M, enabling it to handle extensive text and hours-long video content with full recall.
  • Enhanced Multimodal Reasoning: Excels in STEM/Math tasks, providing causal analysis and logical, evidence-based answers.
  • Upgraded Visual Recognition: Trained on broader, higher-quality data to recognize a wide array of entities including celebrities, products, and landmarks.
  • Expanded OCR: Supports 32 languages and is robust in challenging conditions (low light, blur, tilt), with improved handling of rare characters and document structures.

Use Cases

This model is particularly suited for applications requiring a powerful vision-language understanding with a preference for fewer content restrictions. It can be leveraged for:

  • Automated UI interaction and task completion via its Visual Agent capabilities.
  • Complex multimodal reasoning in scientific and mathematical domains.
  • Detailed image and video analysis over long durations.
  • Multilingual OCR and document processing.
  • Creative content generation where broader response flexibility is desired.