reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B is a 1.7 billion parameter causal language model from Convergent Intelligence LLC, distilled from a Qwen3-30B-A3B teacher. It is specifically optimized for STEM chain-of-thought reasoning and proof generation, utilizing discrepancy-informed knowledge distillation to emphasize structural transitions and reasoning pivots. This model excels at mathematical derivations, physics problem-solving, and educational tutoring by focusing on the internal structure of reasoning.
Loading preview...
Model Overview
This model, developed by Convergent Intelligence LLC, is a 1.7 billion parameter Qwen3-based causal language model. It was distilled from a larger Qwen3-30B-A3B teacher model using a novel discrepancy-informed knowledge distillation (DISC v3) methodology. Unlike standard distillation, this approach specifically targets and amplifies learning on critical reasoning pivots within STEM proofs.
Key Distillation Innovations
- Discrepancy-Weighted KD: Identifies and amplifies distillation weight for "reasoning pivot" tokens where the teacher-student divergence changes sharply, ensuring the student learns structural transitions.
- DG-Limit Smoothing: Stabilizes training by smoothing high-entropy (unstable) student tokens, preventing noisy gradients.
- Gap Energy Monitoring: Tracks structural divergence independently of average loss, regularizing the model to prevent degradation of reasoning transitions even if overall loss improves.
- Proof-Weighted Cross-Entropy: Emphasizes derivation quality by giving higher weight to tokens within the proof span, decaying from 2.5x to 1.5x during training.
Training Details
The model was trained on 6,122 STEM chain-of-thought samples from 10 domain-specific datasets (e.g., Physics, Linear Algebra, Differential Equations). It uses a 1024-token training context and a higher distillation temperature (2.0) to capture more of the teacher's uncertainty structure, which is beneficial for STEM reasoning where multiple valid derivation paths may exist.
Intended Uses
- Mathematical derivations and worked solutions
- Proof-style explanations in STEM fields
- Physics and engineering problem-solving
- Educational tutoring and STEM walkthroughs
- Lightweight reasoning deployment where larger models are too expensive
- Generator components in verifier-generator or retrieval-augmented reasoning systems
Limitations
The model may still produce invalid derivations, omit assumptions, or overgeneralize proof templates. Its domain balance is uneven, with stronger performance in physics and mathematics than in biology. The 1024-token context limits its ability to handle very long derivations.