reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT

Hugging Face · Text Generation · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Mar 22, 2026 · License: apache-2.0 · Architecture: Transformer

Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC: Research Division. It was created through a two-stage process: knowledge distillation from a 30B-parameter 'Thinking' teacher model for structured reasoning, followed by supervised fine-tuning on legal instruction data. This model is highly compressed (50x) and optimized for ultra-lightweight reasoning in legal and STEM domains, designed to run on edge devices like mobile phones with a footprint under 500MB.

Model Overview

Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a compact 0.6 billion parameter model from the Qwen3 family, developed by Convergent Intelligence LLC: Research Division. Its unique training methodology involves a two-stage process: initial knowledge distillation from a 30 billion parameter 'Thinking' teacher model to instill a robust reasoning backbone, followed by supervised fine-tuning on legal instruction data. This approach prioritizes teaching the model how to reason before what to reason about, leveraging the 'Thinking' teacher's extended deliberation traces for deeper structural transfer.
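
The card does not ship a loading snippet; below is a minimal inference sketch using the standard Hugging Face transformers causal-LM API. The prompt wording and generation settings are illustrative, not values taken from this card.

```python
# Minimal inference sketch; assumes the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt in the STEM derivation format described below.
prompt = "Problem: Solve 2x + 6 = 14 for x.\nProof:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```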

Key Capabilities

  • Ultra-lightweight Reasoning: Achieves a 50x compression ratio, resulting in a model under 500MB, capable of running on mobile devices.
  • Two-Stage Training: Combines STEM chain-of-thought distillation (from a Qwen3-30B-A3B-Thinking teacher) with legal domain supervised fine-tuning, enabling structured reasoning in both areas.
  • Proof-Weighted Distillation: Utilizes a novel loss function with 2.5x weight on derivation tokens during distillation, forcing the model to focus its limited capacity on reasoning steps (sketched in code after this list).
  • Dual Prompt Formats: Supports both STEM derivation (Problem/Proof/Final Answer) and general instruction-following (Instruction/Response) formats (examples follow the list).
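
The proof-weighted loss itself is not published in this card; the sketch below shows one way such an objective could be written in PyTorch, assuming a per-token KL term against the teacher's logits and a mask marking derivation-span tokens. The function name and masking scheme are assumptions; only the 2.5x ratio comes from the card.

```python
import torch
import torch.nn.functional as F

def proof_weighted_distill_loss(student_logits, teacher_logits, derivation_mask,
                                temperature=2.0, proof_weight=2.5):
    """Per-token distillation KL, up-weighted on derivation tokens.

    student_logits, teacher_logits: (batch, seq, vocab) tensors.
    derivation_mask: (batch, seq) bool tensor, True on proof/derivation tokens
    (how those spans are tagged is an assumption, not stated in the card).
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student) per token, summed over the vocabulary.
    per_token_kl = (p_teacher * (p_teacher.clamp_min(1e-9).log() - log_p_student)).sum(-1)
    # 2.5x weight on derivation tokens, 1.0 elsewhere (the ratio stated above).
    weights = 1.0 + (proof_weight - 1.0) * derivation_mask.float()
    return (weights * per_token_kl).sum() / weights.sum()
```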
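
The exact prompt templates are likewise not spelled out here; the strings below illustrate the two formats using only the field names listed above (delimiters and phrasing are guesses):

```python
# STEM derivation format: Problem / Proof / Final Answer.
stem_prompt = (
    "Problem: Show that the sum of two even integers is even.\n"
    "Proof:"
)  # the model continues with the derivation, then a "Final Answer:" line

# General instruction-following format: Instruction / Response.
instruction_prompt = (
    "Instruction: Explain the doctrine of consideration in contract law.\n"
    "Response:"
)
```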

Good For

  • Ultra-lightweight reasoning on mobile, edge, and IoT devices.
  • Legal and STEM instruction-following tasks.
  • Educational tutoring and embedded inference applications.
  • Component integration in multi-model pipelines where compact reasoning is crucial.

Limitations

Due to its 0.6B parameter size, the model has inherent capacity constraints. It may exhibit reasoning errors that larger models would avoid, particularly in multi-step derivations exceeding ~8 steps. While it covers general legal concepts, it lacks the nuance of larger models, and performance is weakest on underrepresented STEM domains like molecular biology and physiology. It is not suitable for formal proof verification, actual legal counsel, safety-critical analysis, or long-context tasks beyond its 1024-token training context.