reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a 0.6-billion-parameter, Qwen3-based causal language model developed by Convergent Intelligence LLC. It was trained in two stages: knowledge distillation from a 30B-parameter 'Thinking' teacher model to transfer structured reasoning, followed by supervised fine-tuning on legal instruction data. The model targets ultra-lightweight reasoning in legal and STEM domains and is designed for deployment on mobile, edge, and IoT devices, with 50x fewer parameters than its teacher.
Model Overview
This model, developed by Convergent Intelligence LLC, is a 0.6-billion-parameter, Qwen3-based causal language model. Its distinguishing feature is a two-stage training pipeline, distillation followed by domain-specific fine-tuning, designed for efficient reasoning transfer and domain specialization while achieving a 50x compression relative to its teacher.
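The 50x figure is the ratio of nominal parameter counts, 30B / 0.6B. Note that the teacher's name (30B-A3B) denotes a mixture-of-experts model with roughly 3B parameters active per token, so the reduction in per-token compute is closer to 5x. The sketch below reproduces the arithmetic and adds a rough weight-only size estimate. It assumes the nominal counts and ignores quantization metadata, tokenizer files, and runtime memory, so treat its numbers as approximations rather than measured file sizes.

```python
def compression_ratio(teacher_params: float, student_params: float) -> float:
    """Ratio of teacher to student parameter counts."""
    return teacher_params / student_params


def weight_footprint_mb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in megabytes (10**6 bytes).

    Assumption: every weight uses the same bit width. Real quantized files
    also store scales/zero-points and often keep some tensors at higher
    precision, so actual sizes will differ somewhat.
    """
    return params * bits_per_weight / 8 / 1e6


TEACHER_PARAMS = 30e9   # nominal total parameters of the 30B teacher
STUDENT_PARAMS = 0.6e9  # nominal parameters of the 0.6B student

print(f"compression: {compression_ratio(TEACHER_PARAMS, STUDENT_PARAMS):.0f}x")
for bits in (16, 8, 4):
    size = weight_footprint_mb(STUDENT_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.0f} MB")
```

At 4 bits per weight the estimate is about 300 MB, consistent with the sub-500 MB quantized builds mentioned in this card, while 8-bit weights (about 600 MB) would exceed that figure.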
Key Capabilities
- Structured Reasoning Backbone: Distilled from a 30B-parameter 'Thinking' teacher model that generates extended internal reasoning traces, which exposes the 0.6B student to a broader range of derivation strategies than final answers alone would.
- Domain Specialization: Supervised fine-tuning on legal instruction data, motivated by the structural isomorphism the developers identify between legal and mathematical reasoning.
- Proof-Weighted Distillation: Uses a novel loss function (55% proof-weighted cross-entropy, 45% KL divergence) designed to prioritize reasoning steps over answer formatting during distillation.
- Ultra-Lightweight Deployment: Quantized versions are under 500 MB, enabling deployment on mobile, edge, and IoT devices.
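The blended distillation objective described in the list above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the developers' implementation: the card gives only the 55/45 mix, so the per-token weighting (a hypothetical `token_weights` array that upweights reasoning tokens), the KL direction (forward KL(teacher || student)), and the temperature handling (the common T^2 scaling from Hinton-style distillation) are all assumptions.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def proof_weighted_ce(student_logits, targets, token_weights):
    """Token-level cross-entropy, averaged with per-token weights.

    Assumption: larger weights mark reasoning-step tokens, so their
    errors contribute more to the loss than answer-formatting tokens.
    """
    logp = log_softmax(student_logits)
    nll = -logp[np.arange(len(targets)), targets]
    return float((token_weights * nll).sum() / token_weights.sum())


def kl_teacher_student(student_logits, teacher_logits, temperature=1.0):
    """Mean per-token KL(teacher || student) on temperature-softened distributions."""
    log_q = log_softmax(student_logits, temperature)
    log_p = log_softmax(teacher_logits, temperature)
    p = np.exp(log_p)
    kl = (p * (log_p - log_q)).sum(axis=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures (assumed).
    return float(kl.mean() * temperature ** 2)


def distillation_loss(student_logits, teacher_logits, targets, token_weights,
                      ce_weight=0.55, kl_weight=0.45, temperature=1.0):
    """0.55 * proof-weighted CE + 0.45 * KL, per the mix stated in the card."""
    ce = proof_weighted_ce(student_logits, targets, token_weights)
    kl = kl_teacher_student(student_logits, teacher_logits, temperature)
    return ce_weight * ce + kl_weight * kl


# Toy example with random logits for a 6-token sequence and a 10-token vocabulary.
rng = np.random.default_rng(0)
seq_len, vocab = 6, 10
student = rng.normal(size=(seq_len, vocab))
teacher = rng.normal(size=(seq_len, vocab))
targets = rng.integers(0, vocab, size=seq_len)
# Hypothetical weighting: the first four tokens are reasoning steps (2.0),
# the last two are answer formatting (1.0).
weights = np.array([2.0, 2.0, 2.0, 2.0, 1.0, 1.0])
print(distillation_loss(student, teacher, targets, weights))
```

Normalizing the weighted cross-entropy by the sum of weights keeps its scale comparable to an unweighted mean, so the 55/45 mix stays meaningful regardless of how strongly reasoning tokens are upweighted.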
Good For
- Ultra-lightweight reasoning on mobile/edge/IoT devices.
- Legal and STEM instruction-following tasks.
- Educational tutoring and embedded inference.
- Component in multi-model pipelines where compact reasoning is required.