tvall43/Qwen3.5-4B-heretic
tvall43/Qwen3.5-4B-heretic is a 4.5 billion parameter causal language model based on the Qwen3.5 architecture, specifically a decensored version of unsloth/Qwen3.5-4B created using Heretic v1.2.0. This model features a 32768 token context length and is designed for multimodal learning, architectural efficiency, and scalable reinforcement learning. Its primary differentiator is its decensored nature, offering reduced refusals compared to the original model, making it suitable for applications requiring less restrictive content generation.
Loading preview...
Model Overview
tvall43/Qwen3.5-4B-heretic is a 4.5 billion parameter multimodal causal language model, derived from the Qwen3.5 architecture. It is a decensored variant of unsloth/Qwen3.5-4B, processed using Heretic v1.2.0. The model supports a native context length of 32,768 tokens, extensible up to 1,010,000 tokens using RoPE scaling techniques like YaRN.
Key Differentiators
- Decensored Output: Significantly reduces refusals, with 8/100 refusals compared to 99/100 in the original model, making it suitable for less restricted content generation.
- Multimodal Capabilities: Features a unified vision-language foundation, enabling early fusion training on multimodal tokens for cross-generational parity with Qwen3 and Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
- Efficient Architecture: Incorporates Gated Delta Networks and sparse Mixture-of-Experts for high-throughput inference with minimal latency.
- Scalable RL Generalization: Utilizes reinforcement learning scaled across million-agent environments for robust real-world adaptability.
- Global Linguistic Coverage: Expanded support for 201 languages and dialects.
Performance Highlights
While the base Qwen3.5-4B model demonstrates strong performance across various benchmarks including MMLU-Pro (79.1), C-Eval (85.1), and instruction following tasks, the 'heretic' version's primary distinction lies in its reduced refusal rate. It also shows competitive scores in vision-language tasks such as MMMU (77.6) and MathVision (74.6).
Recommended Use Cases
- Applications requiring less restrictive content: Ideal for scenarios where the original model's refusal rates are too high.
- Multimodal tasks: Capable of handling combined text, image, and video inputs for summarization, question answering, and agentic workflows.
- Long context processing: Supports extended context lengths for complex documents and conversations.
- Agentic applications: Excels in tool calling, with recommendations for use with Qwen-Agent and Qwen Code for building AI agent applications.