Name: coder3101/Qwen3-VL-32B-Instruct-Heretic API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: coder3101

Overview

This model, coder3101/Qwen3-VL-32B-Instruct-Heretic, is a 33.4 billion parameter vision-language model derived from the Qwen3-VL series. It has been decensored using the Heretic v1.0.1 tool, resulting in a substantial reduction in refusal rates (4/100) compared to the original Qwen/Qwen3-VL-32B-Instruct model (89/100).

Key Capabilities

Enhanced Visual Perception & Reasoning: Offers deeper understanding of visual inputs, including spatial relationships and video dynamics.
Extended Context Length: Supports a native 256K context, expandable to 1M, enabling processing of long documents and hours-long video with full recall.
Visual Agent Operations: Capable of interacting with PC/mobile GUIs, recognizing elements, understanding functions, and completing tasks.
Visual Coding Boost: Can generate Draw.io, HTML, CSS, and JS from images or videos.
Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions, supporting 2D and 3D grounding for embodied AI.
Upgraded Visual Recognition & OCR: Recognizes a broad range of entities and supports OCR in 32 languages, robustly handling challenging conditions and complex document structures.
Seamless Text-Vision Fusion: Achieves text understanding on par with pure LLMs through unified comprehension.

Architectural Innovations

Interleaved-MRoPE: Utilizes full-frequency allocation over time, width, and height via robust positional embeddings for enhanced long-horizon video reasoning.
DeepStack: Fuses multi-level ViT features to capture fine-grained details and improve image-text alignment.
Text-Timestamp Alignment: Provides precise, timestamp-grounded event localization for stronger video temporal modeling.

Overview

Overview

Key Capabilities

Architectural Innovations

Full Model Card (README)