Kewk/Heretical-Qwen3.5-4B

VISIONConcurrency Cost:1Model Size:4.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Mar 4, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Kewk/Heretical-Qwen3.5-4B is a 4.5 billion parameter multimodal language model based on the Qwen3.5 architecture, featuring a hybrid Gated DeltaNet + Softmax Attention design. This model is specifically fine-tuned to significantly reduce refusal rates, achieving 4/100 refusals compared to the original model's 100/100. It supports vision-language capabilities (image, video, text) and offers a native context window of 262K tokens, making it suitable for diverse multimodal applications requiring reduced content moderation.

Loading preview...

Heretical-Qwen3.5-4B: A Decensored Multimodal LLM

This model, developed by Kewk, is a 4.5 billion parameter variant of the Qwen3.5 family, distinguished by its significantly reduced refusal rate of 4/100, achieved through a custom-tuned Heretic fork. The base Qwen3.5 architecture, a hybrid Gated DeltaNet + Softmax Attention model, is known for its efficiency and multimodal capabilities.

Key Capabilities & Features

  • Decensored Output: Achieves a refusal rate of just 4/100, a substantial reduction from the original model's 100/100.
  • Multimodal Understanding: Supports unified vision-language processing, handling image, video, and text inputs.
  • Efficient Architecture: Utilizes a hybrid Gated DeltaNet and Softmax Attention design for high-throughput inference.
  • Extended Context Window: Natively supports a 262,144-token context length, extensible up to 1,010,000 tokens with YaRN scaling.
  • Agentic Functionality: Excels in tool calling, with recommended use via Qwen-Agent and Qwen Code for terminal-based AI agent applications.
  • Global Linguistic Coverage: Expanded support for 201 languages and dialects.

What Makes This Model Different?

The primary differentiator is its decensored nature, offering significantly fewer content refusals compared to its base model, making it suitable for applications requiring less restrictive content generation. It also maintains the robust multimodal and long-context capabilities of the Qwen3.5 series, including strong performance in STEM, instruction following, and general agent tasks, as evidenced by various benchmarks.

Should I Use This for My Use Case?

This model is ideal for developers who require a powerful, efficient, and multimodal LLM with a high tolerance for diverse content generation and minimal refusal rates. Its strong performance across language, vision, and agentic benchmarks, combined with its extensive context window, makes it suitable for:

  • Applications requiring less restrictive content generation.
  • Multimodal tasks involving image, video, and text analysis.
  • Long-context understanding and generation.
  • Building AI agents with tool-calling capabilities.
  • Multilingual applications.