reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B

Text generation · Concurrency cost: 1 · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 22, 2026 · License: apache-2.0 · Architecture: Transformer

The reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B model is a 1.7 billion parameter causal language model, part of the Qwen3 architecture, developed by Convergent Intelligence LLC: Research Division. It is specifically distilled from a Qwen3-30B-A3B teacher model using a novel discrepancy-informed knowledge distillation method. This model excels at generating rigorous STEM derivations, mathematical proofs, and physics/engineering problem-solving by emphasizing reasoning structure over surface-level patterns.


Model Overview

This model, developed by Convergent Intelligence LLC: Research Division, is a 1.7 billion parameter Qwen3-based causal language model. It was distilled from a Qwen3-30B-A3B teacher using a unique discrepancy-informed knowledge distillation (DISC v3) methodology, specifically designed to enhance reasoning capabilities in STEM contexts.

Key Differentiators

Unlike standard distillation, this model employs three core discrepancy-informed operators:

  • Discrepancy-Weighted KD: Uses token-level KL divergence to identify "reasoning pivot tokens" — positions where the derivation changes technique or introduces a key concept — and amplifies learning on them.
  • DG-Limit Smoothing: Stabilizes training by smoothing high-entropy (unstable) student tokens, replacing logits with a neighborhood average before KD computation.
  • Gap Energy Monitoring: Tracks structural divergence independent of average loss, regularizing the model to prevent degradation of reasoning transitions even if overall loss improves.
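To make the first operator concrete, here is a minimal, dependency-free sketch of discrepancy-weighted KD: token-level KL divergence between teacher and student distributions is computed, and tokens whose discrepancy exceeds the batch mean (candidate "pivot tokens") are up-weighted in the KD objective. The function names, the mean-threshold heuristic, and the `alpha` knob are illustrative assumptions, not the authors' DISC v3 implementation.

```python
import math

def softmax(logits):
    # numerically stable softmax over one token's logit vector
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q, eps=1e-12):
    # KL(p || q) for two probability vectors
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def discrepancy_weighted_kd(teacher_logits, student_logits, alpha=1.0):
    """Per-token KD loss, re-weighted so high-discrepancy tokens dominate.

    Illustrative sketch only: pivot tokens are approximated as tokens whose
    teacher-student KL exceeds the sequence mean, and `alpha` scales how
    strongly they are amplified.
    """
    per_token = [kl_div(softmax(t), softmax(s))
                 for t, s in zip(teacher_logits, student_logits)]
    mean_kl = sum(per_token) / len(per_token)
    # amplify tokens whose discrepancy exceeds the mean
    weights = [1.0 + alpha * max(0.0, kl - mean_kl) for kl in per_token]
    return sum(w * kl for w, kl in zip(weights, per_token)) / sum(weights)
```

Because the weights grow with per-token KL, a sequence with one sharply divergent token yields a higher loss than the plain per-token average, which is the amplification effect the bullet above describes.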

Additionally, it uses proof-weighted cross-entropy, giving higher importance to tokens within the derivation span (from `Proof:` to `Final Answer:`), with the emphasis decaying from 2.5x to 1.5x over the course of training. The model was trained on 6,122 STEM chain-of-thought samples drawn from 10 domain-specific datasets.

Intended Uses

  • Mathematical derivations and worked solutions
  • Proof-style explanations
  • Physics and engineering problem-solving
  • Educational tutoring and STEM walkthroughs
  • Lightweight reasoning deployment where larger models are too expensive
  • Generator components in verifier-generator or retrieval-augmented reasoning systems