reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 22, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It was distilled from a Qwen3-30B-A3B-Thinking teacher, a 50x reduction in parameter count, and is optimized for structured STEM derivations and reasoning tasks. Distillation uses a "Thinking" teacher and a proof-weighted loss to transfer the teacher's deliberation structure, which makes the model suited to lightweight STEM reasoning on edge devices and to educational tutoring.


Overview

This model, reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B, is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It is a 50x parameter compression (30B to 0.6B) of a Qwen3-30B-A3B-Thinking teacher model. Its distillation method aims to transfer the teacher's reasoning behavior, not only its final answers.

Key Differentiators

  • Thinking Teacher Distillation: Unlike standard distillation from an Instruct teacher, this model learns from a "Thinking" variant of Qwen3-30B-A3B. This teacher generates extended internal reasoning traces with higher-entropy softmax distributions, exposing the student to a richer landscape of derivation strategies. The 0.6B student learns the deliberation process, not just the outcome.
  • Proof-Weighted Loss: During training, tokens within the derivation region (from the "Proof:" marker to the "Final Answer:" marker) receive an amplified loss weight (2.5x, decaying to 1.5x). This is intended to steer the model's limited capacity toward reasoning steps rather than boilerplate or formatting.
  • STEM Focus: Trained on 6,122 STEM chain-of-thought samples across 12 domains, the model is specifically designed to produce structured STEM derivations.
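
The proof-weighted loss described above can be illustrated with a minimal sketch. This is not the authors' training code: the linear decay over training steps, the treatment of the "Proof:" and "Final Answer:" markers as single tokens that keep weight 1.0, and the normalization by the sum of weights are all assumptions made for illustration. The function names are hypothetical.

```python
def amplification(step, total_steps, start=2.5, end=1.5):
    """Proof-region multiplier at a given training step.

    Assumption: the 2.5x -> 1.5x decay is linear over training steps.
    The source does not state the actual schedule.
    """
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac


def token_weights(tokens, amp, open_marker="Proof:", close_marker="Final Answer:"):
    """Per-token loss weights: amp strictly between the markers, 1.0 elsewhere.

    Assumption: each marker is a single token and is itself weighted 1.0.
    """
    weights = []
    inside = False
    for tok in tokens:
        if tok == close_marker:
            inside = False
        weights.append(amp if inside else 1.0)
        if tok == open_marker:
            inside = True
    return weights


def weighted_loss(per_token_losses, weights):
    """Weighted mean of per-token losses (normalization is an assumption)."""
    total = sum(loss * w for loss, w in zip(per_token_losses, weights))
    return total / sum(weights)


# Toy example with word-level "tokens" and made-up per-token losses.
tokens = ["Problem", "Proof:", "x", "=", "2", "Final Answer:", "2"]
losses = [0.2, 0.1, 1.0, 0.5, 0.8, 0.1, 0.3]
weights = token_weights(tokens, amplification(step=0, total_steps=100))
loss = weighted_loss(losses, weights)
```

In a real training loop the same weights would multiply the per-token cross-entropy (or distillation) loss before reduction; with uniform weights the function reduces to the ordinary mean.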

Intended Uses

  • Lightweight STEM reasoning on edge/mobile devices
  • Educational tutoring and proof drafting
  • Component in multi-model pipelines requiring a small, fast reasoner
  • IoT and embedded inference applications

Limitations

Due to its compact size, the model may struggle with multi-step proofs exceeding ~8 reasoning steps, complex multi-variable problems, or domains underrepresented in its training data. Users should always verify its outputs.
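
Since outputs need verification, a pipeline stage might first separate the derivation from the final answer and flag long derivations for review. The sketch below assumes the generated text contains the "Proof:" and "Final Answer:" markers used in training, and it counts one reasoning step per non-empty line; both are assumptions, and the helper names and the threshold of 8 (taken from the rough limit above) are illustrative only.

```python
def split_derivation(text, open_marker="Proof:", close_marker="Final Answer:"):
    """Return (steps, answer), or None if either marker is missing.

    Assumption: one reasoning step per non-empty line between the markers.
    """
    start = text.find(open_marker)
    if start == -1:
        return None
    body_start = start + len(open_marker)
    end = text.find(close_marker, body_start)
    if end == -1:
        return None
    steps = [line.strip() for line in text[body_start:end].splitlines() if line.strip()]
    answer = text[end + len(close_marker):].strip()
    return steps, answer


def needs_review(text, max_steps=8):
    """Flag outputs that are unparseable or exceed the rough step limit."""
    parsed = split_derivation(text)
    if parsed is None:
        return True
    steps, _ = parsed
    return len(steps) > max_steps


# Hypothetical model output, written by hand for illustration.
sample = "Problem: solve x + 1 = 3.\nProof:\nx + 1 = 3\nx = 2\nFinal Answer: x = 2"
steps, answer = split_derivation(sample)
```

A flag from `needs_review` only signals that an output is long or malformed; a short, well-formed derivation still has to be checked for correctness.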