MassivDash/Qwen3-4B-heretic
MassivDash/Qwen3-4B-heretic is a 4-billion-parameter causal language model: a decensored version of Qwen/Qwen3-4B created with the Heretic v1.1.0 tool. The model supports seamless switching between a 'thinking mode' for complex reasoning tasks such as math and coding, and a 'non-thinking mode' for efficient general-purpose dialogue. It offers strong reasoning, human preference alignment, and agent capabilities, supports over 100 languages, and handles a native context length of 32,768 tokens, extendable to 131,072 tokens with YaRN.
Overview
MassivDash/Qwen3-4B-heretic is a 4-billion-parameter language model derived from Qwen/Qwen3-4B and decensored using the Heretic v1.1.0 tool. It retains the core advancements of the Qwen3 series, focusing on enhanced reasoning, instruction following, and agent capabilities.
Key Capabilities
- Dynamic Thinking Modes: Uniquely allows seamless switching between a 'thinking mode' for complex logical reasoning, mathematics, and code generation, and a 'non-thinking mode' for efficient, general-purpose dialogue. The mode can be controlled via the `enable_thinking` parameter or with soft switches such as `/think` and `/no_think` in prompts.
- Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning compared to previous Qwen models.
- Superior Human Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural conversational experience.
- Agentic Functionality: Offers strong tool-calling capabilities, achieving leading performance among open-source models in complex agent-based tasks, especially when integrated with Qwen-Agent.
- Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation abilities.
- Extended Context Length: Natively handles up to 32,768 tokens, with support for up to 131,072 tokens using the YaRN method for processing long texts.
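The soft-switch mechanism described above can be sketched as a small helper. Note that `with_thinking_switch` is a hypothetical name introduced here for illustration; the `/think` and `/no_think` tags follow the Qwen3 prompt convention, while the hard switch is the `enable_thinking` argument accepted by the tokenizer's chat template:

```python
# Sketch of the Qwen3 soft-switch convention: appending /think or /no_think
# to the latest user turn toggles thinking mode for that turn.
# `with_thinking_switch` is a hypothetical helper, not part of any library.

def with_thinking_switch(messages, thinking):
    """Return a copy of `messages` with a mode tag appended to the last user turn."""
    tag = "/think" if thinking else "/no_think"
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = f'{m["content"]} {tag}'
            break
    return out

messages = [{"role": "user", "content": "Solve 37 * 43 step by step."}]
print(with_thinking_switch(messages, thinking=True))

# The hard switch, by contrast, is passed when rendering the chat template:
#   tokenizer.apply_chat_template(messages, tokenize=False,
#                                 add_generation_prompt=True,
#                                 enable_thinking=False)
```

Per the Qwen3 documentation, the soft switch in the most recent user turn takes precedence over earlier turns, so the tag only needs to be appended to the latest message.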
Performance Differentiators
Compared to the original Qwen/Qwen3-4B, this 'heretic' version shows a significant reduction in refusals, dropping from 95/100 to 3/100, while maintaining a KL divergence of 0.0000 from the original model, indicating that behavior outside the decensored responses is essentially unchanged.
When to Use This Model
This model is particularly well-suited for applications requiring:
- Unrestricted Content Generation: For use cases where the original model's censorship might be a hindrance.
- Complex Problem Solving: Leverage its 'thinking mode' for tasks demanding deep logical reasoning, such as advanced math or intricate coding challenges.
- Interactive Agents: Ideal for building sophisticated AI agents that require precise tool integration and complex task execution.
- Multilingual Applications: Its broad language support makes it suitable for global applications requiring instruction following and translation across many languages.
- Long Context Processing: Handling extensive documents or multi-turn conversations that exceed typical context window limits, using its YaRN-extended context of up to 131,072 tokens.
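Enabling the YaRN-extended context typically amounts to adding a `rope_scaling` entry to the model's `config.json`. The sketch below computes the scaling factor implied by the figures quoted above (131,072 / 32,768 = 4.0); the key names follow the Hugging Face Transformers convention for YaRN and should be verified against the model card before use:

```python
import json

# Hedged sketch: a YaRN rope-scaling entry for extending the native
# 32,768-token context to 131,072 tokens. Key names follow the
# Transformers YaRN convention; confirm against the model's docs.
native_ctx = 32_768
target_ctx = 131_072

rope_scaling = {
    "rope_type": "yarn",
    "factor": target_ctx / native_ctx,  # 4.0
    "original_max_position_embeddings": native_ctx,
}
print(json.dumps(rope_scaling, indent=2))
```

Because static YaRN scaling applies at all sequence lengths, the Qwen documentation recommends enabling it only when long contexts are actually needed, as it can slightly degrade performance on short inputs.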