reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. This model is uniquely distilled from a Qwen3-30B-A3B-Thinking teacher, emphasizing the transfer of rich, extended STEM reasoning traces through a proof-weighted loss function. Optimized for lightweight STEM reasoning, it excels at generating structured derivations for mathematical and scientific problems, making it suitable for edge devices and educational applications.
Qwen3-0.6B STEM Proof Distilled (Thinking Teacher)
This 0.6 billion parameter model, developed by Convergent Intelligence LLC, is a highly compressed distillation from a 30 billion parameter Qwen3-A3B-Thinking teacher. It achieves a 50x parameter compression while retaining significant STEM reasoning capabilities, designed to produce structured derivations for complex problems.
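A minimal usage sketch with Hugging Face `transformers` is shown below. The prompt and generation settings are illustrative only, and the chat-template call assumes the repository ships a standard Qwen3 tokenizer configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative STEM prompt; the chat template comes from the base Qwen3 tokenizer.
messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```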
Key Differentiators
- Thinking Teacher Distillation: Unlike standard distillation from Instruct models, this model learns from a "Thinking" variant teacher (Qwen3-30B-A3B-Thinking). This teacher generates extended internal reasoning paths with higher-entropy softmax distributions, allowing the student to learn a richer landscape of derivation strategies, not just final answers.
- Proof-Weighted Loss: During training, tokens within the derivation region (from `Proof:` to `Final Answer:`) receive an amplified loss weight (2.5x, decaying to 1.5x). This mechanism forces the model to prioritize and allocate its limited parameters to reasoning capability rather than mere boilerplate reproduction (a minimal sketch follows this list).
- Mathematical Foundations: The distillation process is informed by Discrepancy Calculus, a measure-theoretic framework that quantifies local structural mismatches, ensuring a deeper transfer of reasoning structure.
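The following is a minimal sketch of how such a proof-weighted cross-entropy could be computed; it is not the authors' training code, and the linear decay schedule and explicit span indices (`proof_start`, `proof_end`) are assumptions based on the description above.

```python
import torch
import torch.nn.functional as F

def proof_weighted_ce(logits, labels, proof_start, proof_end,
                      w_start=2.5, w_end=1.5):
    """logits: (seq_len, vocab), labels: (seq_len,); indices mark the derivation span."""
    per_token = F.cross_entropy(logits, labels, reduction="none")  # (seq_len,)
    weights = torch.ones_like(per_token)
    # Assumed schedule: linearly decay the weight from w_start to w_end across the proof span.
    weights[proof_start:proof_end] = torch.linspace(
        w_start, w_end, proof_end - proof_start
    )
    return (weights * per_token).sum() / weights.sum()
```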
Training Details
The model was trained on 6,122 STEM Chain-of-Thought samples across 12 domains, using a combined loss of 55% proof-weighted cross-entropy and 45% knowledge-distillation KL divergence at temperature T=2.0. The training context length was 1024 tokens.
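A sketch of that combined objective is below, where `proof_ce` would be the proof-weighted cross-entropy from the earlier sketch. The T² scaling on the KL term follows the standard Hinton-style distillation formulation and is an assumption, not a confirmed detail of the training run.

```python
import torch.nn.functional as F

def combined_kd_loss(student_logits, teacher_logits, proof_ce, T=2.0,
                     ce_weight=0.55, kd_weight=0.45):
    """Combine proof-weighted CE (55%) with temperature-softened KD KL (45%)."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # assumed standard temperature scaling to keep KD gradients comparable
    return ce_weight * proof_ce + kd_weight * kd
```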
Good For
- Lightweight STEM reasoning on edge/mobile devices
- Educational tutoring and proof drafting
- Component in multi-model pipelines requiring a small, fast reasoner
- IoT and embedded inference applications