Name: Kewk/Heretical-Qwen3.5-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Kewk

Heretical-Qwen3.5-4B: A Decensored Multimodal LLM

This model, developed by Kewk, is a 4.5 billion parameter variant of the Qwen3.5 family, distinguished by its significantly reduced refusal rate of 4/100, achieved through a custom-tuned Heretic fork. The base Qwen3.5 architecture, a hybrid Gated DeltaNet + Softmax Attention model, is known for its efficiency and multimodal capabilities.

Key Capabilities & Features

Decensored Output: Achieves a refusal rate of just 4/100, a substantial reduction from the original model's 100/100.
Multimodal Understanding: Supports unified vision-language processing, handling image, video, and text inputs.
Efficient Architecture: Utilizes a hybrid Gated DeltaNet and Softmax Attention design for high-throughput inference.
Extended Context Window: Natively supports a 262,144-token context length, extensible up to 1,010,000 tokens with YaRN scaling.
Agentic Functionality: Excels in tool calling, with recommended use via Qwen-Agent and Qwen Code for terminal-based AI agent applications.
Global Linguistic Coverage: Expanded support for 201 languages and dialects.

What Makes This Model Different?

The primary differentiator is its decensored nature, offering significantly fewer content refusals compared to its base model, making it suitable for applications requiring less restrictive content generation. It also maintains the robust multimodal and long-context capabilities of the Qwen3.5 series, including strong performance in STEM, instruction following, and general agent tasks, as evidenced by various benchmarks.

Should I Use This for My Use Case?

This model is ideal for developers who require a powerful, efficient, and multimodal LLM with a high tolerance for diverse content generation and minimal refusal rates. Its strong performance across language, vision, and agentic benchmarks, combined with its extensive context window, makes it suitable for:

Applications requiring less restrictive content generation.
Multimodal tasks involving image, video, and text analysis.
Long-context understanding and generation.
Building AI agents with tool-calling capabilities.
Multilingual applications.

Overview

Heretical-Qwen3.5-4B: A Decensored Multimodal LLM

Key Capabilities & Features

What Makes This Model Different?

Should I Use This for My Use Case?

Full Model Card (README)