Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT: Ultra-Lightweight Reasoning
This model, developed by Convergent Intelligence LLC, is a 0.6 billion parameter Qwen3-based language model engineered for efficient reasoning in specialized domains. It achieves a 50x compression from its 30B-parameter teacher, resulting in a model under 500MB (quantized) that can run on mobile devices.
Its core innovation is a two-stage training pipeline: knowledge distillation from a reasoning teacher, followed by supervised fine-tuning for the legal domain.
Key Capabilities
- Structured Reasoning Backbone: Stage 1 distilled knowledge from a 30B-parameter "Thinking" teacher model (Qwen/Qwen3-30B-A3B-Thinking-2507) using 6,122 STEM chain-of-thought samples. Training combined a proof-weighted cross-entropy loss with KL divergence at temperature T=2.0, transferring the teacher's reasoning structure by emphasizing derivation steps over final answers.
- Legal Domain Specialization: Stage 2 applied supervised fine-tuning on the Alignment-Lab-AI/Lawyer-Instruct dataset. This builds on the STEM-derived reasoning backbone, whose step-by-step derivation structure is treated as analogous to legal analysis, enabling efficient adaptation to legal instruction-following.
- Extreme Compression: The 50x reduction from the teacher preserves structured reasoning capability while fitting a footprint suitable for resource-constrained environments.
- Dual Prompt Formats: Supports both a "Proof:" format for STEM derivations and an "### Instruction: / ### Response:" format for general instruction-following.
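The Stage 1 objective described above (proof-weighted cross-entropy plus KL divergence at T=2.0) can be sketched per token as follows. This is a minimal illustration, not the released training code: the mixing coefficient `alpha`, the `proof_weight` value, and the T² scaling convention are assumptions; only the loss components and the T=2.0 temperature come from the description above.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, target_idx,
                 is_proof_step, temperature=2.0, alpha=0.5, proof_weight=2.0):
    """Per-token distillation loss: KL(teacher || student) at temperature T,
    plus hard-target cross-entropy up-weighted on derivation steps.
    alpha and proof_weight are illustrative values, not from the model card."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence between temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s)) * temperature ** 2
    # Hard-target cross-entropy at T=1; derivation ("Proof:") tokens
    # are weighted more heavily than final-answer tokens.
    ce = -math.log(softmax(student_logits, 1.0)[target_idx])
    w = proof_weight if is_proof_step else 1.0
    return alpha * kl + (1 - alpha) * w * ce
```

Up-weighting derivation tokens pushes the student to reproduce the teacher's intermediate reasoning rather than just its answers.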
Good For
- Ultra-lightweight reasoning on mobile, edge, and IoT devices.
- Legal and STEM instruction-following tasks requiring structured derivation.
- Educational tutoring and embedded inference applications.
- Serving as a component in multi-model pipelines where a compact, reasoning-capable model is needed.
- Use cases requiring under 500MB model footprint.
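For the instruction-following and derivation use cases above, prompts would be assembled in one of the two supported formats. A minimal sketch; the exact whitespace and newline conventions are assumptions, not taken from the released tokenizer configuration:

```python
def build_prompt(text, mode="instruct"):
    # Two formats per the model card: "Proof:" for STEM derivations and
    # "### Instruction: / ### Response:" for general instruction-following.
    # Exact spacing/newlines here are assumed, not confirmed.
    if mode == "proof":
        return f"Proof: {text}"
    return f"### Instruction:\n{text}\n\n### Response:\n"
```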
Limitations
Due to its 0.6B parameter size, the model has capacity constraints and may make reasoning errors that larger models would not. It is not intended for formal proof verification, actual legal counsel, safety-critical analysis, or complex multi-step proofs beyond ~8 steps. Its context length is limited to 1024 tokens.