reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It is distilled from a Qwen3-30B-A3B-Thinking teacher, a roughly 50x reduction in parameter count, and is optimized for STEM chain-of-thought reasoning. It uses a 'Thinking' teacher for richer deliberation transfer and a proof-weighted loss that prioritizes reasoning steps, making it suitable for lightweight STEM reasoning tasks.
Overview
This model, Qwen3-0.6B-STEM-Proof-Distilled-Thinking, is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It is a highly compressed (50x) distillation from a 30 billion parameter Qwen3-30B-A3B-Thinking teacher model, specifically designed to excel in STEM chain-of-thought (CoT) reasoning tasks.
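The card does not state the exact distillation objective. As a rough illustration only, the sketch below shows the temperature-softened KL divergence that is commonly used to match a student's next-token distribution to a teacher's. The function names, the temperature value, and the toy logits are all assumptions made for this example and do not come from the model's training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic soft-label loss: T^2 * KL(teacher || student) at temperature T.

    Assumption: this is the textbook formulation, not necessarily the one
    used to train this model.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(teacher, student)

# Toy logits over a three-token vocabulary (hypothetical values).
teacher_logits = [1.0, 2.0, 3.0]
student_logits = [3.0, 2.0, 1.0]
loss = distillation_loss(teacher_logits, student_logits)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is why a teacher with a flatter, higher-entropy distribution passes on more information about plausible alternatives than a single hard label would.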
Key Differentiators
- Thinking Teacher Distillation: Unlike standard distillation from 'Instruct' models, this student model learns from a 'Thinking' variant teacher. This teacher generates extended internal reasoning with higher-entropy softmax distributions, exposing the 0.6B student to a richer landscape of derivation strategies and teaching it deliberation, not just answers.
- Proof-Weighted Loss: During training, tokens within the 'Proof:' to 'Final Answer:' region receive amplified loss (2.5x decaying to 1.5x). This ensures that the model's limited parameters are primarily allocated to understanding and reproducing reasoning steps, rather than just formatting or boilerplate.
- STEM CoT Dataset: Trained on 6,122 STEM CoT samples across 12 domains, focusing its capabilities on scientific and mathematical problem-solving.
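The proof-weighted loss described above can be sketched as a per-token weighting scheme. This is a minimal illustration under several assumptions that the card does not confirm: the multiplier decays linearly over training progress (from 2.5 at the start to 1.5 at the end), the two markers each appear as a single token, the markers themselves keep weight 1.0, and the loss is normalized by the sum of the weights. All function names are hypothetical.

```python
PROOF_START = "Proof:"        # marker that opens the weighted region
PROOF_END = "Final Answer:"   # marker that closes it

def proof_multiplier(progress, start=2.5, end=1.5):
    """Multiplier for proof tokens at a given training progress in [0, 1].

    Assumption: linear decay from `start` to `end`; the real schedule
    is not documented in the card.
    """
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress

def token_weights(tokens, progress):
    """Return one loss weight per token: boosted inside the proof region."""
    boost = proof_multiplier(progress)
    weights = []
    in_proof = False
    for tok in tokens:
        if tok == PROOF_END:
            in_proof = False          # the end marker is outside the region
        weights.append(boost if in_proof else 1.0)
        if tok == PROOF_START:
            in_proof = True           # tokens after the start marker are boosted
    return weights

def weighted_mean_loss(losses, weights):
    """Weighted average of per-token losses."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)

# Toy example with pre-split tokens (hypothetical sample).
tokens = ["Q", "Proof:", "a", "b", "Final Answer:", "42"]
early = token_weights(tokens, progress=0.0)
late = token_weights(tokens, progress=1.0)
```

With these assumptions, the two proof tokens carry 2.5 times the weight of the surrounding tokens at the start of training and 1.5 times at the end, so gradient updates are dominated by the reasoning steps rather than by the question or the formatting around the answer.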
Intended Uses
- Lightweight STEM reasoning on edge or mobile devices.
- Educational tutoring and proof drafting.
- Component in multi-model pipelines requiring a small, fast reasoner.
- IoT and embedded inference applications.
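For the uses listed above, a minimal loading sketch with the Hugging Face transformers library might look like the following. The repository ID is taken from this page; the prompt layout (a question followed by 'Proof:') is an assumption based on the training markers described earlier, not a documented prompt format, and `build_prompt` and `generate_answer` are hypothetical helper names.

```python
MODEL_ID = "reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B"

def build_prompt(question):
    """Assumed prompt layout: the question, a blank line, then 'Proof:'."""
    return f"{question.strip()}\n\nProof:"

def generate_answer(question, max_new_tokens=512):
    """Load the model and generate a continuation for one question.

    Imports are local so that the helper above can be used without
    transformers installed. Calling this downloads the model weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("Show that the sum of two even integers is even.")` would return the model's continuation as a string. In a multi-model pipeline, the same function could sit behind a router that sends only short STEM questions to this model.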
Limitations
Due to its compact size, the model may struggle with multi-step proofs exceeding ~8 reasoning steps, complex multi-variable problems, or domains underrepresented in its training data. Users should always verify its outputs.