Shreyansh327/Qwen3-0.6B-Reasoning-Opus

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kPublished:Feb 28, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

Shreyansh327/Qwen3-0.6B-Reasoning-Opus is a 0.6 billion parameter causal language model, fine-tuned by Shreyansh Pathak using QLoRA on Qwen3-0.6B. This model is specifically optimized for multi-step reasoning tasks, demonstrating a 6.0% absolute gain on GSM8K accuracy. It is primarily intended for research into the "Alignment Tax" and catastrophic forgetting when training small models exclusively on reasoning traces.

Loading preview...

What the fuck is this model about?

This model, Shreyansh327/Qwen3-0.6B-Reasoning-Opus, is a 0.6 billion parameter causal language model developed by Shreyansh Pathak. It's a fine-tuned version of Qwen3-0.6B, specifically optimized for multi-step reasoning using QLoRA on a dataset of reasoning traces distilled from Claude 4.6 Opus. The primary goal of its creation was to study the "Alignment Tax" – how training exclusively on reasoning data impacts a small model's pre-trained factual knowledge.

What makes THIS different from all the other models?

Key Differentiators:

  • Reasoning Optimization: It shows a notable +6.0% absolute gain in GSM8K accuracy (from 26.0% to 32.0%) compared to its base model, demonstrating improved multi-step reasoning capabilities.
  • Research Focus on "Alignment Tax": This model is a direct experiment to observe catastrophic forgetting. Training exclusively on reasoning data led to a massive 24.31% absolute loss in factual knowledge on the ARC-Challenge benchmark.
  • Behavioral Cloning Effects: It successfully learned the structure of reasoning (e.g., using <think> tags) but is prone to mode collapse, filling reasoning traces with overconfident, factually incorrect statements, and degenerate loops without a repetition penalty.

Should I use this for my use case?

Use Case Recommendations:

  • Research: Highly recommended for researchers studying catastrophic forgetting, the "Alignment Tax," and the effects of pure-SFT reasoning distillation on small language models.
  • Experimentation: Useful for understanding the challenges of inducing "System 2" thinking in sub-1B parameter models.

Not Recommended For:

  • Production Applications: Due to severe degradation in factual knowledge and propensity for hallucination and degenerate loops, it is not suitable for production environments requiring factual accuracy or stability.
  • General-Purpose Tasks: Its specialized training has compromised its general knowledge, making it less effective for broad applications compared to general-purpose models.