reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B

Text generation | Model size: 0.8B | Quant: BF16 | Context length: 32k | Published: Mar 22, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights | Concurrency cost: 1

reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B is a 0.6-billion-parameter, Qwen3-based causal language model developed by Convergent Intelligence LLC. It is distilled from a Qwen3-30B-A3B-Thinking teacher, a 50x reduction in parameter count, and is optimized for STEM chain-of-thought reasoning. Two training choices set it apart: it learns from a 'Thinking' teacher, which transfers richer deliberation, and it uses a proof-weighted loss that prioritizes reasoning steps. Together these make it suited to lightweight STEM reasoning tasks.


Overview

This model, Qwen3-0.6B-STEM-Proof-Distilled-Thinking, is a 0.6-billion-parameter, Qwen3-based causal language model developed by Convergent Intelligence LLC. It is a 50x compression of its teacher, Qwen3-30B-A3B-Thinking, a mixture-of-experts model with roughly 30 billion total parameters (about 3 billion active per token), and it is designed for STEM chain-of-thought (CoT) reasoning tasks.

Key Differentiators

  • Thinking Teacher Distillation: Unlike standard distillation from 'Instruct' models, this student learns from a 'Thinking' variant of the teacher. That teacher generates extended internal reasoning with higher-entropy softmax distributions, exposing the 0.6B student to a wider range of derivation strategies and teaching it to deliberate rather than only to produce answers.
  • Proof-Weighted Loss: During training, tokens in the region between the 'Proof:' and 'Final Answer:' markers receive amplified loss (a 2.5x weight, decaying to 1.5x). This is intended to concentrate the model's limited capacity on understanding and reproducing reasoning steps rather than on formatting or boilerplate.
  • STEM CoT Dataset: Trained on 6,122 STEM CoT samples across 12 domains, focusing its capabilities on scientific and mathematical problem-solving.
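To make the 'thinking teacher' idea concrete, the sketch below shows standard logit-based knowledge distillation: the student is trained to match the teacher's temperature-softened softmax via KL(teacher || student), scaled by T² as is common practice. This is an assumption for illustration only; the card does not say whether distillation was done on logits or on generated sequences, and the temperature of 2.0 and the example logits are hypothetical. The entropy helper shows why a flatter ('thinking-like') teacher distribution carries more information about alternatives than a peaked ('instruct-like') one.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical setup: both models share a vocabulary, so their logits
    align position by position. Multiplying by T^2 keeps gradient scale
    comparable across temperatures (a common convention, assumed here).
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    # Illustrative logits only, not taken from the actual models.
    peaked = [8.0, 1.0, 0.5, 0.0]  # confident, low-entropy teacher
    flat = [3.0, 2.5, 2.0, 1.0]    # deliberative, higher-entropy teacher
    print(entropy(softmax(flat)) > entropy(softmax(peaked)))  # → True
    print(kd_loss(flat, flat))                                # → 0.0
```

A student that already matches the teacher incurs zero loss; any mismatch yields a positive KL, pushing the student toward the teacher's full distribution rather than only its top token.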
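The proof-weighted loss can be sketched as a per-token weighted negative log-likelihood. Several details are assumptions, not documented behavior: the decay from 2.5x to 1.5x is modeled as linear over training steps, the 'Proof:' and 'Final Answer:' markers are treated as single tokens (a real tokenizer splits them into several), the markers themselves get weight 1.0, and the loss is averaged over token count so the proof region literally amplifies the total. The helper names are hypothetical.

```python
def proof_weight(step, total_steps, start=2.5, end=1.5):
    """Assumed linear decay of the proof-region weight from start to end."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def token_weights(tokens, weight, open_marker="Proof:", close_marker="Final Answer:"):
    """Give `weight` to tokens strictly between the markers, 1.0 elsewhere."""
    weights = []
    inside = False
    for tok in tokens:
        if tok == close_marker:
            inside = False  # the closing marker itself is not amplified
        weights.append(weight if inside else 1.0)
        if tok == open_marker:
            inside = True   # amplification starts after the opening marker
    return weights


def weighted_nll(token_nlls, weights):
    """Mean over tokens of weight * per-token negative log-likelihood."""
    return sum(n * w for n, w in zip(token_nlls, weights)) / len(token_nlls)


if __name__ == "__main__":
    tokens = ["Q", "Proof:", "a", "b", "Final Answer:", "42"]
    w = token_weights(tokens, proof_weight(0, 100))
    print(w)                               # → [1.0, 1.0, 2.5, 2.5, 1.0, 1.0]
    print(weighted_nll([1.0] * len(w), w))  # → 1.5
```

Averaging over token count rather than over the sum of weights is a design choice: it raises the overall loss when proofs are present, whereas weight-sum normalization would only shift the relative share of proof tokens.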

Intended Uses

  • Lightweight STEM reasoning on edge or mobile devices.
  • Educational tutoring and proof drafting.
  • Component in multi-model pipelines requiring a small, fast reasoner.
  • IoT and embedded inference applications.

Limitations

Due to its compact size, the model may struggle with multi-step proofs exceeding ~8 reasoning steps, complex multi-variable problems, or domains underrepresented in its training data. Users should always verify its outputs.