MassivDash/Qwen3-4B-heretic

Text generation · 4B parameters · BF16 · 32k context length · Published: Jan 8, 2026 · License: apache-2.0 · Architecture: Transformer

MassivDash/Qwen3-4B-heretic is a 4 billion parameter causal language model, a decensored version of Qwen/Qwen3-4B created using the Heretic v1.1.0 tool. This model uniquely supports seamless switching between a 'thinking mode' for complex reasoning tasks like math and coding, and a 'non-thinking mode' for efficient general-purpose dialogue. It excels in reasoning capabilities, human preference alignment, and agent functionalities, supporting over 100 languages with a native context length of 32,768 tokens, extendable to 131,072 tokens with YaRN.


Overview

MassivDash/Qwen3-4B-heretic is a 4 billion parameter language model derived from Qwen/Qwen3-4B, modified with the Heretic v1.1.0 tool to remove refusal behavior (decensoring). It retains the core advancements of the Qwen3 series: enhanced reasoning, instruction following, and agent capabilities.

Key Capabilities

  • Dynamic Thinking Modes: Allows seamless switching between a 'thinking mode' for complex logical reasoning, mathematics, and code generation, and a 'non-thinking mode' for efficient, general-purpose dialogue. This is controlled via the `enable_thinking` parameter or soft switches such as `/think` and `/no_think` placed in prompts.
  • Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning compared to previous Qwen models.
  • Superior Human Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural conversational experience.
  • Agentic Functionality: Offers strong tool-calling capabilities, achieving leading performance among open-source models in complex agent-based tasks, especially when integrated with Qwen-Agent.
  • Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation abilities.
  • Extended Context Length: Natively handles up to 32,768 tokens, with support for up to 131,072 tokens using the YaRN method for processing long texts.
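
The thinking-mode switch can be sketched with the `transformers` chat template API. `enable_thinking` is the parameter documented for the upstream Qwen3 models; whether this decensored variant behaves identically is an assumption. The `split_thinking` helper relies only on the `<think>...</think>` delimiters Qwen3 emits around its reasoning trace:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3 completion into (thinking trace, final answer).

    Qwen3 wraps its reasoning in <think>...</think>; in non-thinking
    mode the block is empty or absent.
    """
    marker = "</think>"
    if marker not in text:
        return "", text.strip()
    thinking, answer = text.split(marker, 1)
    return thinking.replace("<think>", "").strip(), answer.strip()


def generate(prompt: str, thinking: bool = True) -> str:
    """Hedged sketch of Qwen3-style generation; assumes `transformers`
    is installed and the checkpoint below is reachable."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MassivDash/Qwen3-4B-heretic"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    # enable_thinking is the Qwen3 chat-template switch; /think and
    # /no_think inside the prompt act as per-turn soft switches.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    new_tokens = out[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `split_thinking(generate("Prove that 17 is prime."))` would separate the reasoning trace from the final answer; in non-thinking mode the first element is simply empty.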

Performance Differentiators

Compared to the original Qwen/Qwen3-4B, this 'heretic' version shows a dramatic reduction in refusals, dropping from 95/100 to 3/100, while the reported KL divergence of 0.0000 from the original model indicates minimal deviation in core behavior beyond the decensoring itself.

When to Use This Model

This model is particularly well-suited for applications requiring:

  • Unrestricted Content Generation: For use cases where the original model's censorship might be a hindrance.
  • Complex Problem Solving: Leverage its 'thinking mode' for tasks demanding deep logical reasoning, such as advanced math or intricate coding challenges.
  • Interactive Agents: Ideal for building sophisticated AI agents that require precise tool integration and complex task execution.
  • Multilingual Applications: Its broad language support makes it suitable for global applications requiring instruction following and translation across many languages.
  • Long Context Processing: Handles extensive documents and multi-turn conversations that exceed the native 32,768-token window by using its YaRN-extended context of up to 131,072 tokens.
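
For the long-context use case, the upstream Qwen3 model card documents enabling YaRN by adding a `rope_scaling` block to `config.json`; a factor of 4.0 extends the native 32,768 tokens to 131,072. It is an assumption that this decensored variant keeps the same config layout, and the Qwen3 documentation advises enabling static YaRN scaling only when long inputs are actually needed, since it can slightly degrade short-text performance:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```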