VINAY-UMRETHE/Qwen3-0.6B-heretic-OG
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 0.8B · Quant: BF16 · Ctx Length: 32k · Published: Mar 7, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

VINAY-UMRETHE/Qwen3-0.6B-heretic-OG is a 0.8 billion parameter causal language model based on the Qwen3 architecture, with a 32,768-token context length. It is a decensored version of Qwen/Qwen3-0.6B, created with the Heretic v1.2.0 tool, and refuses significantly fewer requests than the original. It retains Qwen3's ability to switch seamlessly between 'thinking' and 'non-thinking' modes for complex reasoning or efficient general dialogue. The model is optimized for reasoning, instruction following, and agent capabilities, making it suitable for applications that require less restrictive content generation.


VINAY-UMRETHE/Qwen3-0.6B-heretic-OG: Decensored Qwen3 Model

This model is a decensored version of Qwen/Qwen3-0.6B, created with the Heretic v1.2.0 tool. It significantly reduces content refusals, refusing 31 of 100 test prompts versus the original model's 59 of 100. Based on the Qwen3 architecture, this 0.8 billion parameter causal language model features a 32,768-token context length.

Key Capabilities & Features

  • Decensored Output: Offers less restrictive content generation compared to the base Qwen3 model.
  • Dual Thinking Modes: Uniquely supports seamless switching between a 'thinking mode' for complex logical reasoning, math, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue. This can be controlled via the enable_thinking parameter or the /think and /no_think tags in prompts.
  • Enhanced Reasoning: Retains Qwen3's advancements in mathematics, code generation, and commonsense logical reasoning.
  • Superior Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following.
  • Agent Capabilities: Demonstrates strong tool-calling abilities, integrating with external tools in both thinking and non-thinking modes.
  • Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation.
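The mode switching described above can be sketched in code. The /think and /no_think tags and the enable_thinking chat-template parameter follow the upstream Qwen/Qwen3-0.6B model card; the with_mode helper below is a hypothetical convenience, not part of the model's API:

```python
# Hypothetical helper: append Qwen3's per-turn soft-switch tag to a user prompt.
# The tag names (/think, /no_think) come from the upstream Qwen3 model card.
def with_mode(user_prompt: str, thinking: bool) -> str:
    """Return the prompt with the soft-switch tag for the desired mode."""
    tag = "/think" if thinking else "/no_think"
    return f"{user_prompt} {tag}"

# Per-turn control via in-prompt tags:
messages = [{"role": "user", "content": with_mode("Solve 17 * 24.", thinking=True)}]

# Alternatively, the mode can be set globally when building the prompt, e.g.:
#   tokenizer.apply_chat_template(messages, tokenize=False,
#                                 add_generation_prompt=True,
#                                 enable_thinking=False)
```

In multi-turn conversations, the most recent soft-switch tag takes effect for that turn, while enable_thinking sets the default behavior.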

Usage Recommendations

  • Sampling Parameters: Specific Temperature, TopP, TopK, and MinP settings are recommended for optimal performance in both thinking and non-thinking modes. Greedy decoding is discouraged for thinking mode.
  • Output Length: An output length of 32,768 tokens is recommended for most queries, and up to 38,912 tokens for complex problems.
  • Standardized Output: Use specific prompts for math problems (e.g., "Please reason step by step, and put your final answer within \boxed{}") and multiple-choice questions (e.g., JSON structure for answers).
  • Agentic Use: Recommended with Qwen-Agent for best tool-calling performance.
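The sampling recommendations above can be collected into per-mode presets. The numeric values below mirror the upstream Qwen/Qwen3-0.6B model card's recommendations and should be verified against the card for this variant; the generation_kwargs helper is a hypothetical sketch for building transformers-style generate() arguments:

```python
# Per-mode sampling presets (values taken from the upstream Qwen3 model card;
# verify against the card for this decensored variant).
SAMPLING_PRESETS = {
    "thinking":     {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0.0},
}

def generation_kwargs(mode: str, max_new_tokens: int = 32768) -> dict:
    """Hypothetical helper: build generate()-style kwargs for the chosen mode.

    Greedy decoding (do_sample=False) is discouraged in thinking mode, so
    sampling is always enabled here.
    """
    preset = SAMPLING_PRESETS[mode]
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **preset}
```

For complex problems, the max_new_tokens budget can be raised toward the recommended 38,912-token ceiling, e.g. generation_kwargs("thinking", max_new_tokens=38912).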