davidterrell1919/Qwen3-0.6B-heretic
davidterrell1919/Qwen3-0.6B-heretic is a 0.8 billion parameter causal language model, a decensored version of Qwen/Qwen3-0.6B created using Heretic v1.3.0. This model features a unique ability to seamlessly switch between thinking and non-thinking modes for complex reasoning and general dialogue, and demonstrates significantly reduced refusals compared to its original counterpart. It excels in reasoning, instruction-following, agent capabilities, and multilingual support across 100+ languages, making it suitable for diverse conversational and analytical tasks.
Model Overview: davidterrell1919/Qwen3-0.6B-heretic
This model is a decensored version of Qwen/Qwen3-0.6B, created with the Heretic v1.3.0 tool. It retains the core capabilities of the Qwen3 series while sharply reducing refusals: 5 refusals per 100 queries, compared with 57 per 100 for the original model.
Key Capabilities & Features
- Dual-Mode Operation: Uniquely supports seamless switching between a 'thinking mode' for complex logical reasoning, math, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue. This can be controlled via the enable_thinking parameter or the /think and /no_think tags in prompts.
- Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning, surpassing previous Qwen models.
- Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging conversational experience.
- Agentic Capabilities: Features strong integration with external tools, achieving leading performance among open-source models in complex agent-based tasks, especially when used with Qwen-Agent.
- Multilingual Support: Supports over 100 languages and dialects with robust capabilities for multilingual instruction following and translation.
- Reproducibility: The decensoring process is reproducible, with details available in reproduce/README.md.
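The two ways of switching modes described above can be sketched as follows. This is a minimal illustration, not the model author's code: the helper names with_soft_switch and render_prompt are hypothetical, and the tokenizer argument is assumed to be a Hugging Face tokenizer whose chat template accepts enable_thinking, as the upstream Qwen3 templates do.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


def with_soft_switch(messages: List[Message], thinking: bool) -> List[Message]:
    """Return a copy of `messages` with /think or /no_think appended
    to the most recent user turn (the in-prompt tag mechanism)."""
    tag = "/think" if thinking else "/no_think"
    out = [dict(m) for m in messages]  # shallow copies; input stays untouched
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = f'{m["content"]} {tag}'
            break
    return out


def render_prompt(tokenizer: Any, messages: List[Message],
                  enable_thinking: bool = True) -> str:
    """Build the prompt text with the parameter-based switch.

    Assumes `tokenizer.apply_chat_template` forwards `enable_thinking`
    to the chat template.
    """
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )
```

With the transformers library installed, the tokenizer would typically come from AutoTokenizer.from_pretrained("davidterrell1919/Qwen3-0.6B-heretic"). The parameter sets the default for the whole conversation, while the tags allow switching on a per-turn basis.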
Performance & Best Practices
The model maintains a 32,768-token context length and shows a KL divergence of 0.0054 from the original model. For best performance, use the following sampling parameters:
- Thinking Mode: Recommended Temperature=0.6, TopP=0.95, TopK=20, MinP=0. Avoid greedy decoding.
- Non-Thinking Mode: Recommended Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
- Output Length: Use 32,768 tokens for most queries; up to 38,912 tokens for highly complex problems.
- Standardized Output: Utilize prompts like "Please reason step by step, and put your final answer within \boxed{}." for math or JSON structures for multiple-choice questions to standardize responses.
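The recommendations above could be captured in a small helper like the one below. It is a sketch under assumptions: the keyword names follow the Hugging Face transformers generate() convention (temperature, top_p, top_k, min_p, do_sample, max_new_tokens), and the function names sampling_kwargs and extract_boxed are hypothetical, introduced only for illustration.

```python
import re
from typing import Dict, Optional, Union

Number = Union[int, float, bool]

# Presets copied from the recommendations in this section.
THINKING: Dict[str, Number] = {
    "temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0,
}
NON_THINKING: Dict[str, Number] = {
    "temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0,
}


def sampling_kwargs(thinking: bool, complex_problem: bool = False) -> Dict[str, Number]:
    """Return generation keyword arguments for the chosen mode."""
    preset = THINKING if thinking else NON_THINKING
    return {
        **preset,
        "do_sample": True,  # sampling on, i.e. no greedy decoding
        "max_new_tokens": 38912 if complex_problem else 32768,
    }


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in `text`, or None.

    Only handles answers without nested braces; an answer such as
    \\boxed{\\frac{1}{2}} would need a brace-matching parser instead.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None
```

The resulting dictionary can be unpacked into a generation call, for example model.generate(**inputs, **sampling_kwargs(thinking=True)), assuming model and inputs were prepared with transformers. Taking the last boxed match is a design choice that favors the final answer when the model's reasoning also contains intermediate boxed values.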
Good for
- Applications requiring reduced content refusals compared to the base Qwen3 model.
- Tasks demanding complex logical reasoning, mathematical problem-solving, or code generation.
- Creative writing, role-playing, and multi-turn dialogues where human-like interaction is crucial.
- Agent-based systems needing robust tool-calling and integration capabilities.
- Multilingual applications requiring strong instruction following and translation across many languages.