Model Overview

This model, developed by Convergent Intelligence LLC, is a 1.7-billion-parameter causal language model based on Qwen3. It was distilled from the larger Qwen3-30B-A3B teacher model using a novel discrepancy-informed knowledge distillation methodology (DISC v3). Unlike standard distillation, this approach targets and amplifies learning at critical reasoning pivots within STEM proofs.

Key Distillation Innovations

Discrepancy-Weighted KD: Identifies and amplifies distillation weight for "reasoning pivot" tokens where the teacher-student divergence changes sharply, ensuring the student learns structural transitions.
DG-Limit Smoothing: Stabilizes training by smoothing high-entropy (unstable) student tokens, preventing noisy gradients.
Gap Energy Monitoring: Tracks structural divergence independently of average loss, regularizing the model to prevent degradation of reasoning transitions even if overall loss improves.
Proof-Weighted Cross-Entropy: Emphasizes derivation quality by giving higher weight to tokens within the proof span, decaying from 2.5x to 1.5x during training.
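
As a rough illustration of how the first and last items above could turn into per-token loss weights, here is a minimal sketch. It is not the DISC v3 implementation: the linear decay schedule, the jump threshold, the boost factor, and every function name are assumptions made for this example. The source states only that pivot tokens receive amplified weight and that the proof-span weight decays from 2.5x to 1.5x during training.

```python
def proof_weight(step, total_steps, start=2.5, end=1.5):
    """Proof-span weight at a given step.

    Assumption: linear interpolation from `start` to `end`. The source gives
    only the two endpoints, not the shape of the decay.
    """
    if total_steps <= 1:
        return start
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac


def pivot_weights(divergences, threshold=0.5, pivot_boost=2.0):
    """Per-token weights, boosted where the divergence changes sharply.

    `divergences` holds one teacher-student divergence value per token.
    A token counts as a "pivot" here if the divergence differs from the
    previous token's by more than `threshold` (a hypothetical rule).
    """
    weights = []
    prev = None
    for d in divergences:
        jump = 0.0 if prev is None else abs(d - prev)
        weights.append(pivot_boost if jump > threshold else 1.0)
        prev = d
    return weights


def weighted_mean_loss(token_losses, weights):
    """Weighted average of per-token losses."""
    return sum(l * w for l, w in zip(token_losses, weights)) / sum(weights)


# Made-up per-token divergences: the third token shows a sharp jump.
example_weights = pivot_weights([0.1, 0.2, 1.0, 1.1])
example_loss = weighted_mean_loss([1.0, 1.0, 3.0, 1.0], example_weights)
```

In this sketch the pivot token contributes twice as much to the averaged loss as its neighbors, and `proof_weight` would multiply the cross-entropy of tokens inside the proof span by a factor that shrinks as training progresses.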

Training Details

The model was trained on 6,122 STEM chain-of-thought samples drawn from 10 domain-specific datasets (e.g., Physics, Linear Algebra, Differential Equations). Training uses a 1024-token context and a relatively high distillation temperature (2.0) to capture more of the teacher's uncertainty structure, which helps in STEM reasoning, where multiple valid derivation paths may exist.
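
To make the effect of the distillation temperature concrete, the sketch below applies a temperature-scaled softmax to a set of logits at T = 1.0 and T = 2.0. The logit values and function names are made up for illustration and are not taken from the actual training code. It shows only the general mechanism: dividing logits by a larger temperature flattens the distribution, so lower-ranked alternatives keep more probability mass.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by `temperature` (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical teacher logits for four candidate next tokens.
teacher_logits = [4.0, 2.0, 1.0, 0.0]

sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=2.0)

print("T=1.0:", [round(p, 3) for p in sharp], "entropy:", round(entropy(sharp), 3))
print("T=2.0:", [round(p, 3) for p in soft], "entropy:", round(entropy(soft), 3))
```

At T = 2.0 the top token's probability drops and the entropy rises, which is the sense in which a higher temperature exposes more of the teacher's uncertainty to the student.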

Intended Uses

Mathematical derivations and worked solutions
Proof-style explanations in STEM fields
Physics and engineering problem-solving
Educational tutoring and STEM walkthroughs
Lightweight reasoning deployment where larger models are too expensive
Generator components in verifier-generator or retrieval-augmented reasoning systems

Limitations

The model may still produce invalid derivations, omit assumptions, or overgeneralize proof templates. Its domain balance is uneven, with stronger performance in physics and mathematics than in biology. The 1024-token training context limits its ability to handle very long derivations.

Full Model Card (README)