coder3101/Qwen3-VL-32B-Instruct-heretic-v2
coder3101/Qwen3-VL-32B-Instruct-heretic-v2 is a 33.4-billion-parameter vision-language model derived from Qwen/Qwen3-VL-32B-Instruct and processed with Heretic v1.1.0 for decensoring. It inherits the base model's enhanced visual perception, reasoning, and extended context length, making it suitable for complex multimodal tasks. It excels in visual agent capabilities, advanced spatial perception, and long-context video understanding, offering a versatile option for applications that require robust visual and textual comprehension.
Overview
This model, coder3101/Qwen3-VL-32B-Instruct-heretic-v2, is a 33.4-billion-parameter vision-language model based on the Qwen3-VL-32B-Instruct architecture. It has been processed with the Heretic v1.1.0 tool to produce a decensored variant. The original Qwen3-VL series is noted for comprehensive upgrades in text understanding, visual perception, reasoning, and extended context length, including enhanced spatial and video-dynamics comprehension.
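A minimal loading sketch for this checkpoint, assuming a recent `transformers` release with Qwen3-VL support. The `AutoModelForImageTextToText` class and the `dtype`/`device_map` arguments are assumptions based on the standard Hugging Face pattern, not verified against this specific repository:

```python
MODEL_ID = "coder3101/Qwen3-VL-32B-Instruct-heretic-v2"

def approx_weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight-only memory footprint in GiB for a given dtype width."""
    return num_params * bytes_per_param / 2**30

# 33.4B parameters at bf16 (2 bytes/param) is roughly 62 GiB of weights
# alone, so multi-GPU sharding via device_map="auto" is usually required.

def load_model():
    # Deferred import: needs a transformers release with Qwen3-VL support.
    # Class name is an assumption; the upstream Qwen3-VL card may use a
    # model-specific class instead.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID,
        dtype="auto",        # `torch_dtype` on older transformers releases
        device_map="auto",   # shard across available accelerators
    )
    return processor, model
```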
Key Differentiators
This variant differs from the original Qwen3-VL-32B-Instruct primarily in its refusal behavior: it refused 9 of 100 test prompts, versus 99 of 100 for the base model. The decensoring was achieved by applying Heretic's abliteration parameters to the attention and MLP layers.
Core Capabilities (from original Qwen3-VL-32B-Instruct)
- Visual Agent: Capable of operating PC/mobile GUIs by recognizing elements, understanding functions, and completing tasks.
- Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos.
- Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions, supporting 2D and 3D grounding for spatial reasoning.
- Long Context & Video Understanding: Features a native 256K context, expandable to 1M, handling extensive text and hours-long video with full recall.
- Enhanced Multimodal Reasoning: Excels in STEM/Math tasks, providing causal analysis and logical, evidence-based answers.
- Upgraded Visual Recognition: Broad and high-quality pretraining enables recognition of a wide array of entities including celebrities, products, and landmarks.
- Expanded OCR: Supports 32 languages and is robust in challenging conditions like low light or blur, with improved long-document structure parsing.
- Text Understanding: Offers seamless text-vision fusion for unified comprehension on par with pure LLMs.
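A minimal inference sketch for the image Q&A use cases above, following the standard `transformers` processor/chat-template pattern used by Qwen-VL models. The processor and model are assumed to come from a setup like the loading pattern on the upstream Qwen3-VL model card; this is an illustrative sketch, not a verified end-to-end script:

```python
def answer_image_question(model, processor, image, question, max_new_tokens=256):
    """Single-image Q&A using the Qwen-VL chat format (sketch)."""
    # Qwen-VL chat format: interleaved image and text content parts.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": question},
        ],
    }]
    # Render the chat template to text, then tokenize text and pixels together.
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")
    # (Move `inputs` to model.device when running on an accelerator.)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # For brevity this decodes the full sequence; production code usually
    # strips the prompt tokens before decoding.
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```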
Model Architecture Updates
Key architectural enhancements in the Qwen3-VL series include Interleaved-MRoPE for robust positional embeddings across time, width, and height, DeepStack for fusing multi-level ViT features, and Text-Timestamp Alignment for precise event localization in video.
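The interleaving idea can be illustrated with a toy sketch. This is not the actual Qwen3-VL implementation; the simple round-robin channel assignment below is an assumption for illustration only. The contrast it shows: a blocked multi-axis RoPE gives each axis (time, height, width) a contiguous slice of rotary frequency channels, while an interleaved layout spreads every axis across the full frequency spectrum:

```python
# Toy sketch only -- the real model's channel layout may differ.
AXES = ("time", "height", "width")

def blocked_assignment(num_channels: int) -> list:
    """Contiguous blocks: first third -> time, then height, then width."""
    block = num_channels // 3
    return [AXES[min(i // block, 2)] for i in range(num_channels)]

def interleaved_assignment(num_channels: int) -> list:
    """Round-robin: every axis touches low and high frequencies alike."""
    return [AXES[i % 3] for i in range(num_channels)]

print(blocked_assignment(6))      # ['time', 'time', 'height', 'height', 'width', 'width']
print(interleaved_assignment(6))  # ['time', 'height', 'width', 'time', 'height', 'width']
```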