Name: coder3101/Qwen3-VL-32B-Thinking-heretic API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: coder3101

Model Overview

This model, coder3101/Qwen3-VL-32B-Thinking-heretic, is a 33.4 billion parameter vision-language model based on the Qwen3-VL architecture, specifically a decensored version of Qwen/Qwen3-VL-32B-Thinking. It features significant enhancements across visual and textual understanding, making it a powerful tool for complex multimodal applications. The model's architecture incorporates innovations like Interleaved-MRoPE for robust positional embeddings, DeepStack for fine-grained detail capture, and Text–Timestamp Alignment for precise video temporal modeling.

Key Capabilities

Visual Agent: Interacts with PC/mobile GUIs, recognizing elements and invoking tools to complete tasks.
Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from image and video inputs.
Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions, enabling 2D and 3D spatial reasoning.
Long Context & Video Understanding: Supports a native 256K context, expandable to 1M, for handling extensive documents and hours-long video with full recall.
Enhanced Multimodal Reasoning: Excels in STEM/Math tasks, providing causal analysis and logical, evidence-based answers.
Upgraded Visual Recognition: Broad, high-quality pretraining allows recognition of a wide array of entities, from celebrities to flora/fauna.
Expanded OCR: Supports 32 languages and performs robustly under challenging conditions, including low light and tilt.
Decensored Performance: Demonstrates a significantly lower refusal rate (3/100) compared to the original model (94/100), as indicated by KL divergence metrics.

Use Cases

This model is particularly well-suited for applications requiring advanced visual understanding, multimodal reasoning, and agent-like interactions. Its decensored nature may be beneficial for use cases where broader response generation is desired. The extended context length and video understanding capabilities make it ideal for processing and analyzing long-form content.

Overview

Model Overview

Key Capabilities

Use Cases

Full Model Card (README)