reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It was created through a two-stage process: knowledge distillation from a 30B-parameter "Thinking" teacher model for structured reasoning, followed by supervised fine-tuning on legal instruction data. This model is optimized for ultra-lightweight reasoning in legal and STEM domains, designed for deployment on mobile, edge, or IoT devices with a small footprint under 500MB.
Loading preview...
Model Overview
This model, developed by Convergent Intelligence LLC, is a 0.6 billion parameter Qwen3-based causal language model. It employs a unique two-stage training pipeline: first, knowledge distillation from a 30B-parameter "Thinking" teacher model to establish a structured reasoning backbone, and second, supervised fine-tuning on legal instruction data. This approach aims to transfer deep reasoning structures efficiently to a small model, achieving a 50x compression ratio while maintaining reasoning capabilities.
Key Capabilities
- Structured Reasoning: Distilled from a "Thinking" teacher that generates extended internal reasoning traces, enabling the 0.6B student to learn complex derivation strategies.
- Domain Specialization: Fine-tuned on legal instruction data, leveraging the structural isomorphism between legal and mathematical reasoning.
- Ultra-Lightweight: At 0.6B parameters and under 500MB when quantized, it's designed for resource-constrained environments.
- Two-Stage Training: Teaches how to reason (STEM distillation) then what to reason about (legal SFT).
Good For
- Ultra-lightweight reasoning on mobile, edge, or IoT devices.
- Legal and STEM instruction-following tasks.
- Educational tutoring and embedded inference applications.
- Component in multi-model pipelines where a small, reasoning-capable model is needed.