reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT

Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: Mar 22, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a 0.6 billion parameter Qwen3-based causal language model developed by Convergent Intelligence LLC. It was created through a two-stage process: knowledge distillation from a 30B-parameter 'Thinking' teacher model for structured reasoning, followed by supervised fine-tuning on legal instruction data. The model is optimized for ultra-lightweight reasoning in legal and STEM domains and is designed for deployment on mobile, edge, and IoT devices. At 0.6B parameters it is 50x smaller than its 30B teacher.


Model Overview

This model, developed by Convergent Intelligence LLC, is a 0.6 billion parameter Qwen3-based causal language model. Its two-stage training pipeline (distillation first, then supervised fine-tuning) is designed for efficient reasoning transfer and domain specialization, and gives a 50x reduction in parameter count relative to the 30B teacher model.
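The 50x figure, and the quantized size quoted in the capabilities list, can be checked with rough arithmetic. The sketch below assumes the nominal parameter counts from the model name (30B and 0.6B) and counts weight storage only; real checkpoint files add metadata and may keep some tensors at higher precision, so treat the outputs as estimates rather than measured sizes. The function names are illustrative.

```python
# Back-of-the-envelope sizing. Parameter counts are the nominal figures
# from the model name (assumed), not measured values.
TEACHER_PARAMS = 30e9   # 30B teacher
STUDENT_PARAMS = 0.6e9  # 0.6B student


def compression_ratio(teacher_params: float, student_params: float) -> float:
    """Ratio of teacher to student parameter counts."""
    return teacher_params / student_params


def weight_megabytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in MB (10^6 bytes), ignoring file overhead."""
    return params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    ratio = compression_ratio(TEACHER_PARAMS, STUDENT_PARAMS)
    print(f"compression: {ratio:.0f}x")
    for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_megabytes(STUDENT_PARAMS, bits):.0f} MB")
```

Under these assumptions only the roughly 4-bit variant falls below 500 MB; the BF16 weights alone come to about 1.2 GB.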

Key Capabilities

  • Structured Reasoning Backbone: Distilled from a 30B-parameter 'Thinking' teacher model that generates extended internal reasoning traces, which exposes the 0.6B student to a wider range of derivation strategies than answer-only supervision would.
  • Domain Specialization: Supervised fine-tuning on legal instruction data, leveraging the structural isomorphism between legal and mathematical reasoning.
  • Proof-Weighted Distillation: Uses a combined loss (55% Proof-Weighted Cross-Entropy, 45% KL Divergence) so that distillation prioritizes reasoning steps over answer formatting.
  • Ultra-Lightweight Deployment: Quantized versions are under 500MB, enabling deployment on mobile, edge, and IoT devices.
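
The 55/45 loss mix described above could look roughly like the following. This is a minimal pure-Python sketch, not the authors' implementation: the page gives only the two percentages, so the per-token `proof_weights`, the KL direction (teacher relative to student), the averaging scheme, and the function names are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def distill_loss(student_logits, teacher_logits, targets, proof_weights,
                 ce_share=0.55, kl_share=0.45):
    """Combined loss for one sequence.

    ce_share * (proof-weighted cross-entropy) + kl_share * mean KL(teacher || student).
    proof_weights is a hypothetical per-token weight, larger on reasoning steps.
    """
    ce_sum = 0.0
    kl_sum = 0.0
    for s, t, y, w in zip(student_logits, teacher_logits, targets, proof_weights):
        log_ps = log_softmax(s)
        log_pt = log_softmax(t)
        # Cross-entropy against the reference token, scaled by its proof weight.
        ce_sum += w * -log_ps[y]
        # KL(teacher || student) over the vocabulary for this position.
        kl_sum += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    weighted_ce = ce_sum / sum(proof_weights)
    mean_kl = kl_sum / len(targets)
    return ce_share * weighted_ce + kl_share * mean_kl


if __name__ == "__main__":
    student = [[0.0, 0.0], [5.0, 0.0]]   # toy logits, vocabulary of size 2
    teacher = [[2.0, 0.0], [5.0, 0.0]]
    print(distill_loss(student, teacher, targets=[0, 0], proof_weights=[2.0, 1.0]))
```

Raising the weight on a token where the student is uncertain increases the cross-entropy term, which is one way to make reasoning steps count for more than formatting tokens.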

Good For

  • Ultra-lightweight reasoning on mobile/edge/IoT devices.
  • Legal and STEM instruction-following tasks.
  • Educational tutoring and embedded inference.
  • Component in multi-model pipelines where compact reasoning is required.