youngzhong/SOD-1.7B
youngzhong/SOD-1.7B is a 1.7 billion parameter student model developed by YoungZhong, distilled from a 4 billion parameter teacher model using Step-wise On-policy Distillation (SOD). The method targets cascading error propagation in agentic reasoning, so that small language models can integrate tools effectively. The model performs strongly on challenging math, science, and code benchmarks, recovering 69.8% of its teacher's performance with less than half the parameters.
SOD-1.7B: A Distilled Agentic Language Model
SOD-1.7B is a 1.7 billion parameter student model developed by YoungZhong, distilled from a 4 billion parameter teacher model using a novel technique called Step-wise On-policy Distillation (SOD). This method is specifically designed to train small language model agents with robust tool-integrated reasoning capabilities.
Key Differentiators & Capabilities
- Addresses Cascading Error Propagation: SOD uses an adaptive step-level weighting mechanism that suppresses the distillation loss on steps where the student has drifted and restores supervision once the student recovers alignment, at negligible additional computational cost.
- High Performance in Agentic Reasoning: The model demonstrates strong performance on challenging math, science, and code benchmarks (AIME, GPQA-Diamond, LiveCodeBench-v6).
- Efficient Knowledge Transfer: With only 1.7B parameters, the model recovers 69.8% of its 4B teacher's performance and averages 42.98% across benchmarks, +18.5% over the second-best baseline (OPD).
- Minimal Computational Overhead: The divergence metric used in SOD reuses log-probabilities already computed in the forward pass, making the distillation process efficient.
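The step-level weighting described above can be illustrated with a small sketch. This is not the authors' implementation: the model card does not give the divergence metric or the weighting function, so the per-step mean log-probability gap, the `exp(-divergence / tau)` weight, and the names `step_divergences`, `step_weights`, `weighted_distillation_loss`, and `tau` are all assumptions chosen for illustration. The sketch only shows the general idea of reusing per-token log-probabilities to down-weight drifted steps and to restore supervision on later steps that realign.

```python
import math
from typing import List, Sequence


def step_divergences(
    student_logp: Sequence[float],
    teacher_logp: Sequence[float],
    step_ends: Sequence[int],
) -> List[float]:
    """Mean per-token (student - teacher) log-prob gap for each step.

    Both log-prob sequences are scored on the tokens the student sampled,
    so no extra forward pass is needed. `step_ends` holds the exclusive
    end index of each step. (Hypothetical metric, assumed for this sketch.)
    """
    divergences = []
    start = 0
    for end in step_ends:
        if end <= start:
            raise ValueError("each step must contain at least one token")
        gaps = [s - t for s, t in zip(student_logp[start:end], teacher_logp[start:end])]
        divergences.append(sum(gaps) / len(gaps))
        start = end
    return divergences


def step_weights(divergences: Sequence[float], tau: float = 1.0) -> List[float]:
    """Map each step divergence to a weight in (0, 1].

    Assumed form: exp(-max(d, 0) / tau). Large divergence (a drifted step)
    gives a weight near 0; small divergence gives a weight near 1.
    """
    return [math.exp(-max(d, 0.0) / tau) for d in divergences]


def weighted_distillation_loss(
    student_logp: Sequence[float],
    teacher_logp: Sequence[float],
    step_ends: Sequence[int],
    tau: float = 1.0,
) -> float:
    """Average of weight * divergence over steps.

    In real training the weights would be treated as constants
    (no gradient flows through them); here everything is a plain float.
    """
    divergences = step_divergences(student_logp, teacher_logp, step_ends)
    weights = step_weights(divergences, tau)
    return sum(w * d for w, d in zip(weights, divergences)) / len(divergences)


# Toy trajectory: three steps of two tokens each; the student drifts on step 2.
student = [-0.1, -0.2, -0.1, -0.1, -0.3, -0.2]
teacher = [-0.2, -0.3, -3.1, -4.1, -0.4, -0.3]
ends = [2, 4, 6]

weights = step_weights(step_divergences(student, teacher, ends))
loss = weighted_distillation_loss(student, teacher, ends)
print(weights, loss)
```

In this toy trajectory the second step receives a weight close to zero, while the first and third steps keep weights close to one, mirroring the "suppress on drift, restore on recovery" behavior the list describes.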
Ideal Use Cases
- Small Language Model Agents: Well suited to applications that need agentic reasoning and tool integration where model size is a constraint.
- Complex Problem Solving: Suited for tasks involving multi-step reasoning in domains like mathematics, science, and code generation.
- Resource-Constrained Environments: Provides strong performance in a compact form factor, making it suitable for deployment on devices with limited computational resources.