reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
Text generation · Concurrency cost: 1 · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Mar 22, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a 0.6-billion-parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It was created in two stages: knowledge distillation from a 30B-parameter 'Thinking' teacher model for STEM reasoning, followed by supervised fine-tuning on legal instruction data. This training approach, which emphasizes structured derivation, compresses the teacher roughly 50x and yields a model optimized for ultra-lightweight reasoning in legal and STEM domains, small enough to run on mobile devices with a footprint under 500 MB.
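The distillation stage described above can be sketched with a standard temperature-scaled KL objective, where the student is trained to match the teacher's softened output distribution. This is a minimal illustrative sketch, not the card's actual training code: the temperature, the KL direction, and the random logits are all assumptions for demonstration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the last axis, with temperature scaling.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch, scaled by T^2 as is
    conventional so gradients stay comparable across temperatures."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl)) * temperature ** 2

# Hypothetical logits standing in for teacher (30B) and student (0.6B) outputs.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 32))
student = rng.normal(size=(4, 32))
print(distillation_loss(student, teacher))
```

In real distillation the student's parameters are updated to minimize this loss (often mixed with a cross-entropy term on ground-truth tokens); the SFT stage on legal instructions then follows as ordinary supervised fine-tuning.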
