VINAY-UMRETHE/Qwen3-0.6B-heretic
Text generation · Concurrency cost: 1 · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Feb 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

VINAY-UMRETHE/Qwen3-0.6B-heretic is a 0.8-billion-parameter causal language model, derived from Qwen/Qwen3-0.6B and decensored using the Heretic tool. It retains Qwen3's core capabilities in reasoning, instruction following, and multilingual support while refusing noticeably fewer prompts than the original model. The model is small enough for flexible deployment and offers distinct 'thinking' and 'non-thinking' modes for complex reasoning and general dialogue, respectively.


Model Overview

VINAY-UMRETHE/Qwen3-0.6B-heretic is a 0.8-billion-parameter causal language model, a decensored version of Qwen/Qwen3-0.6B created with Heretic v1.2.0. It refuses significantly fewer test prompts (37/100) than the original model (59/100) while maintaining the core functionality of the Qwen3 series.
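A minimal loading sketch with Hugging Face transformers, assuming the repository name above; the dtype and device placement settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VINAY-UMRETHE/Qwen3-0.6B-heretic"

# Load the tokenizer and the BF16 weights; device_map="auto" picks a GPU if available.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
```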

Key Capabilities

  • Flexible Thinking Modes: Seamlessly switches between a 'thinking mode' for complex logical reasoning, math, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue. This is controlled via the enable_thinking parameter or the dynamic /think and /no_think tags in user prompts (see the first sketch after this list).
  • Enhanced Reasoning: Demonstrates strong capabilities in mathematics, code generation, and commonsense logical reasoning, particularly in its thinking mode.
  • Superior Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following.
  • Agent Capabilities: Supports tool calling and integrates with external tools via Qwen-Agent for complex agent-based tasks (see the Qwen-Agent sketch after this list).
  • Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation abilities.
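As referenced above, a sketch of switching thinking modes through the chat template, following the upstream Qwen3 usage pattern; the prompt and token budget are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VINAY-UMRETHE/Qwen3-0.6B-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 23? Show your work."}]

# enable_thinking=True makes the model emit a <think>...</think> reasoning block
# before the answer; set it to False (or add /no_think to the prompt) for direct replies.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Slice off the prompt so only newly generated tokens are decoded.
output_ids = model.generate(**inputs, max_new_tokens=1024)[0][len(inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```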
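For tool calling, the upstream Qwen3 cards recommend Qwen-Agent. A hedged sketch, assuming this model is served behind an OpenAI-compatible endpoint; the server URL, api_key placeholder, and tool list are illustrative, not part of this model card:

```python
from qwen_agent.agents import Assistant

# Point Qwen-Agent at an OpenAI-compatible server hosting the model.
# The URL and api_key below are placeholders for your own deployment.
llm_cfg = {
    "model": "VINAY-UMRETHE/Qwen3-0.6B-heretic",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# code_interpreter is a built-in Qwen-Agent tool; custom tools can be registered as well.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Use Python to compute the 20th Fibonacci number."}]

# bot.run streams intermediate responses; the last yielded value is the final message list.
for responses in bot.run(messages=messages):
    pass
print(responses)
```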

Best Practices

  • For thinking mode, use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0.
  • For non-thinking mode, use Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
  • Avoid greedy decoding in thinking mode to prevent performance degradation and repetitions.
  • Recommended output length is 32,768 tokens, or 38,912 for highly complex problems.
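A sketch mapping these recommendations onto transformers GenerationConfig; the parameter names correspond directly to the values above (min_p requires a reasonably recent transformers version):

```python
from transformers import GenerationConfig

# Thinking mode: do_sample=True avoids the greedy decoding the card warns against.
thinking_cfg = GenerationConfig(
    do_sample=True, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
    max_new_tokens=32768,  # recommended output budget; 38912 for highly complex problems
)

# Non-thinking mode preset.
non_thinking_cfg = GenerationConfig(
    do_sample=True, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
)

# Usage, with model and inputs prepared as in the sketches above:
# output_ids = model.generate(**inputs, generation_config=thinking_cfg)
```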