davidterrell1919/Qwen3-0.6B-heretic

TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kPublished:May 6, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

davidterrell1919/Qwen3-0.6B-heretic is a 0.8 billion parameter causal language model, a decensored version of Qwen/Qwen3-0.6B created using Heretic v1.3.0. This model features a unique ability to seamlessly switch between thinking and non-thinking modes for complex reasoning and general dialogue, and demonstrates significantly reduced refusals compared to its original counterpart. It excels in reasoning, instruction-following, agent capabilities, and multilingual support across 100+ languages, making it suitable for diverse conversational and analytical tasks.

Loading preview...

Model Overview: davidterrell1919/Qwen3-0.6B-heretic

This model is a decensored version of the Qwen/Qwen3-0.6B, created using the Heretic v1.3.0 tool. It retains the core capabilities of the Qwen3 series while significantly reducing refusal rates, with 5 refusals per 100 queries compared to 57/100 in the original model.

Key Capabilities & Features

  • Dual-Mode Operation: Uniquely supports seamless switching between a 'thinking mode' for complex logical reasoning, math, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue. This can be controlled via enable_thinking parameter or /think and /no_think tags in prompts.
  • Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning, surpassing previous Qwen models.
  • Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging conversational experience.
  • Agentic Capabilities: Features strong integration with external tools, achieving leading performance among open-source models in complex agent-based tasks, especially when used with Qwen-Agent.
  • Multilingual Support: Supports over 100 languages and dialects with robust capabilities for multilingual instruction following and translation.
  • Reproducibility: The decensoring process is reproducible, with details available in the reproduce/README.md.

Performance & Best Practices

While maintaining a 32,768 token context length, this model shows a KL divergence of 0.0054 from the original. Optimal performance is achieved by following specific sampling parameters:

  • Thinking Mode: Recommended Temperature=0.6, TopP=0.95, TopK=20, MinP=0. Avoid greedy decoding.
  • Non-Thinking Mode: Recommended Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
  • Output Length: Use 32,768 tokens for most queries; up to 38,912 tokens for highly complex problems.
  • Standardized Output: Utilize prompts like "Please reason step by step, and put your final answer within \boxed{}." for math or JSON structures for multiple-choice questions to standardize responses.

Good for

  • Applications requiring reduced content refusals compared to the base Qwen3 model.
  • Tasks demanding complex logical reasoning, mathematical problem-solving, or code generation.
  • Creative writing, role-playing, and multi-turn dialogues where human-like interaction is crucial.
  • Agent-based systems needing robust tool-calling and integration capabilities.
  • Multilingual applications requiring strong instruction following and translation across many languages.