Overview
This model, Qwen3-0.6B-STEM-Proof-Distilled-Thinking, is a 0.6-billion-parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It is a 50x parameter compression of its 30B teacher, Qwen3-30B-A3B-Thinking, and is designed specifically for STEM reasoning tasks. The model was trained on 6,122 STEM chain-of-thought samples, with a focus on transferring the deep deliberation structure of the larger 'Thinking' teacher.
Key Differentiators
- Thinking Teacher Distillation: Unlike standard distillation from 'Instruct' models, this student learns from a 'Thinking' teacher (Qwen3-30B-A3B-Thinking) which generates extended internal reasoning. This process, at a distillation temperature of T=2.0, exposes the 0.6B student to a richer landscape of derivation strategies, teaching it the deliberation process, not just the final answer.
- Proof-Weighted Loss: The training loss amplifies errors within the derivation region (from `Proof:` to `Final Answer:`), with per-token loss weights starting at 2.5x and decaying to 1.5x across the span. This ensures that the model's limited parameters are allocated primarily to reasoning capability rather than boilerplate reproduction.
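The proof-weighted scheme above can be sketched as a per-token weight mask. This is a hypothetical illustration, not the released training code: the model card does not specify the decay schedule, so a linear decay from 2.5x to 1.5x over the derivation span is assumed here, and the region is located by matching the `Proof:` and `Final Answer:` marker tokens.

```python
def proof_weights(tokens, start_marker="Proof:", end_marker="Final Answer:",
                  w_start=2.5, w_end=1.5):
    """Return a per-token loss weight: 1.0 outside the derivation region,
    decaying linearly from w_start to w_end inside it (assumed schedule)."""
    try:
        i = tokens.index(start_marker)
        j = tokens.index(end_marker, i)
    except ValueError:
        return [1.0] * len(tokens)  # no derivation region found
    span = max(j - i, 1)
    weights = []
    for k in range(len(tokens)):
        if i <= k < j:
            frac = (k - i) / span  # 0.0 at "Proof:", -> 1.0 at "Final Answer:"
            weights.append(w_start + (w_end - w_start) * frac)
        else:
            weights.append(1.0)  # boilerplate tokens keep standard loss
    return weights

toks = ["Question:", "x+1=3", "Proof:", "subtract", "1", "from",
        "both", "sides", "Final Answer:", "x=2"]
w = proof_weights(toks)
```

Each cross-entropy term would then be multiplied by its weight before reduction, so gradient signal concentrates on the derivation tokens.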
Training Details
The model was trained with a combined loss of 55% proof-weighted cross-entropy and 45% knowledge-distillation KL divergence. It used a dataset of 6,122 STEM CoT samples spanning 12 domains, with a training context length of 1024 tokens.
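The combined objective can be sketched as follows. This is a hedged, dependency-free illustration, not the released training code: it follows the standard Hinton-style distillation formulation, where the T^2 scaling on the KL term (to keep gradient magnitudes comparable across temperatures) and the KL direction are assumptions, while the 55/45 mixing weights and T=2.0 come from the card itself.

```python
import math

T = 2.0                      # distillation temperature (from the card)
ALPHA_CE, ALPHA_KD = 0.55, 0.45  # loss mixing weights (from the card)

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl(teacher_logits, student_logits, temperature=T):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 (assumed convention from standard distillation)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

def combined_loss(weighted_ce, teacher_logits, student_logits):
    """55% proof-weighted CE + 45% distillation KL, per the card."""
    return ALPHA_CE * weighted_ce + ALPHA_KD * kd_kl(teacher_logits, student_logits)
```

Softening the teacher at T=2.0 flattens its output distribution, which is what exposes the student to the "richer landscape of derivation strategies" described above: near-miss alternatives carry non-negligible probability mass instead of being crushed to zero.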
Intended Uses
- Lightweight STEM reasoning on edge/mobile devices
- Educational tutoring and proof drafting
- Component in multi-model pipelines requiring a small, fast reasoner
- IoT and embedded inference
Limitations
Due to its 0.6B parameter size, the model has capacity constraints. It may struggle with multi-step proofs requiring more than ~8 reasoning steps, complex multi-variable problems, or domains underrepresented in its training data. Users should always verify its outputs.