reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT

Hugging Face | Text generation | Model size: 0.8B | Quant: BF16 | Context length: 32k | Tool calling: Supported | Published: Mar 22, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT is a 0.6-billion-parameter, Qwen3-based causal language model developed by Convergent Intelligence LLC. It was trained in two stages: knowledge distillation from a 30B-parameter "Thinking" teacher model to instill structured reasoning, followed by supervised fine-tuning on legal instruction data. The model targets lightweight reasoning in legal and STEM domains and is intended for deployment on mobile, edge, or IoT devices, where its quantized weights fit in under 500MB.
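The size and compression figures can be checked with back-of-envelope arithmetic. The sketch below assumes the nominal 0.6B (student) and 30B (teacher) parameter counts from the card and counts weight storage only (1 MB = 10^6 bytes); it ignores the KV cache, activations, and quantization metadata such as per-block scales, so real files and runtime memory will be somewhat larger. The helper names are illustrative, not part of any library.

```python
def weight_footprint_mb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in megabytes (1 MB = 1e6 bytes).

    Ignores KV cache, activations, and quantization metadata (scales,
    zero-points), so actual files and runtime memory will be larger.
    """
    return num_params * bits_per_param / 8 / 1e6


def compression_ratio(teacher_params: float, student_params: float) -> float:
    """How many times fewer parameters the student has than the teacher."""
    return teacher_params / student_params


STUDENT = 0.6e9  # nominal student size stated in the card
TEACHER = 30e9   # nominal teacher size stated in the card

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_footprint_mb(STUDENT, bits):,.0f} MB")
print(f"compression ratio: {compression_ratio(TEACHER, STUDENT):.0f}x")
```

At 16-bit precision the weights alone need about 1,200 MB, and 8-bit weights (about 600 MB) still exceed 500MB; staying under that limit takes fewer than roughly 6.7 bits per parameter, for example 4-bit quantization (about 300 MB). Using the 0.8B figure from the listing metadata instead, 4-bit weights come to about 400 MB, still under the limit.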


Model Overview

This model, developed by Convergent Intelligence LLC, is a 0.6-billion-parameter Qwen3-based causal language model. It uses a two-stage training pipeline: first, knowledge distillation from a 30B-parameter "Thinking" teacher model establishes a structured reasoning backbone; second, supervised fine-tuning on legal instruction data specializes that backbone. The approach aims to transfer the teacher's reasoning structure to a model with 50x fewer parameters (30B to 0.6B) while preserving its reasoning capabilities.
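The card does not say whether the distillation stage matched the teacher's token distributions (logit distillation) or trained the student on teacher-generated reasoning traces. As a sketch of the first option only, the standard temperature-scaled objective for a single next-token position is KL(teacher || student) multiplied by T^2. The function names and the default temperature of 2.0 are illustrative assumptions, and the sketch assumes teacher and student share a vocabulary.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled log-softmax, using the log-sum-exp trick for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,  # illustrative default, not taken from the card
) -> float:
    """KL(teacher || student) for one next-token distribution, scaled by T^2.

    Both logit vectors must index the same vocabulary. The T^2 factor keeps
    gradient magnitudes roughly comparable when the temperature changes.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary")
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical 3-token vocabulary for one position.
teacher = [4.0, 1.0, -2.0]
student = [2.0, 1.5, 0.0]
print(f"distillation loss = {distillation_loss(teacher, student):.4f}")
```

In a full training loop this quantity would be averaged over every token position and minimized with respect to the student's parameters only; identical teacher and student logits give a loss of zero.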

Key Capabilities

  • Structured Reasoning: Distilled from a "Thinking" teacher that generates extended internal reasoning traces, enabling the 0.6B student to learn complex derivation strategies.
  • Domain Specialization: Fine-tuned on legal instruction data, on the premise that legal reasoning is structurally isomorphic to mathematical reasoning.
  • Ultra-Lightweight: At 0.6B parameters, with quantized weights under 500MB, the model is designed for resource-constrained environments.
  • Two-Stage Training: STEM distillation teaches the model how to reason; legal SFT then teaches it what to reason about.
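The card gives no details of the stage-two objective. One common choice for instruction SFT is next-token cross-entropy computed only over response tokens, with prompt positions masked out so the model is not trained to reproduce the question. The plain-Python sketch below illustrates that objective; whether this model's legal SFT masked prompts this way is an assumption, and `IGNORE_INDEX = -100` simply mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # label meaning "skip this position"; an assumed convention


def masked_sft_loss(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Mean next-token cross-entropy over positions whose label is not IGNORE_INDEX.

    logits[i] holds unnormalized vocabulary scores at position i; labels[i] is
    the (already shifted) target token id for that position. Prompt positions
    carry IGNORE_INDEX, so only response tokens contribute to the loss.
    """
    total, count = 0.0, 0
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # equals -log softmax(row)[target]
        count += 1
    if count == 0:
        raise ValueError("every position is masked; nothing to train on")
    return total / count


# Hypothetical 3-token vocabulary: positions 0-1 are prompt, position 2 is response.
logits = [[2.0, 0.1, -1.0], [0.5, 0.5, 0.5], [0.2, 3.0, -0.5]]
labels = [IGNORE_INDEX, IGNORE_INDEX, 1]
print(f"SFT loss on response tokens: {masked_sft_loss(logits, labels):.4f}")
```

Masking keeps the gradient focused on the answer text; changing the prompt-position logits has no effect on the loss.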

Good For

  • Ultra-lightweight reasoning on mobile, edge, or IoT devices.
  • Legal and STEM instruction-following tasks.
  • Educational tutoring and embedded inference applications.
  • Component in multi-model pipelines where a small, reasoning-capable model is needed.