VINAY-UMRETHE/Qwen3-0.6B-heretic-Base2
VINAY-UMRETHE/Qwen3-0.6B-heretic-Base2 is a 0.8 billion parameter causal language model, a decensored version of Qwen/Qwen3-0.6B created using Heretic v1.3.0. This model features a unique ability to seamlessly switch between a 'thinking mode' for complex reasoning, math, and coding, and a 'non-thinking mode' for efficient general dialogue, enhancing performance across diverse scenarios. It demonstrates significantly reduced refusals compared to its original counterpart, making it suitable for applications requiring less restrictive content generation.
Loading preview...
VINAY-UMRETHE/Qwen3-0.6B-heretic-Base2 Overview
This model is a decensored version of the Qwen/Qwen3-0.6B, developed using the Heretic v1.3.0 tool. It retains the core architecture of the Qwen3 series, featuring 0.6 billion parameters (0.44B non-embedding) and a context length of 32,768 tokens.
Key Differentiators & Capabilities
- Decensored Output: Significantly reduces refusals, with only 6 refusals out of 100 compared to 55/100 in the original Qwen/Qwen3-0.6B, making it suitable for use cases requiring less content filtering.
- Dynamic Thinking Modes: Uniquely supports seamless switching between a 'thinking mode' for complex logical reasoning, mathematics, and code generation, and a 'non-thinking mode' for efficient, general-purpose dialogue. This can be controlled via
enable_thinkingparameter or dynamic/thinkand/no_thinktags in user prompts. - Enhanced Reasoning: In thinking mode, it shows significant improvements in mathematical, code generation, and commonsense logical reasoning tasks.
- Agent Capabilities: Excels in tool-calling and integration with external tools, performing well in complex agent-based tasks, especially when used with Qwen-Agent.
- Multilingual Support: Supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities.
Performance & Best Practices
While maintaining a low KL divergence of 0.0139 from the original model, its primary distinction is the reduced refusal rate. For optimal performance, specific sampling parameters are recommended for each mode: Temperature=0.6, TopP=0.95, TopK=20 for thinking mode, and Temperature=0.7, TopP=0.8, TopK=20 for non-thinking mode. It is advised to use an adequate output length of 32,768 tokens for most queries.