youngzhong/SOD-0.6B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kPublished:May 12, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

youngzhong/SOD-0.6B is a 0.6 billion parameter student language model developed by youngzhong, distilled from a 4B teacher model using Step-wise On-policy Distillation (SOD). This method is specifically designed to train small language model agents with enhanced tool-integrated reasoning capabilities, addressing cascading error propagation in on-policy distillation. Built upon Qwen3-0.6B, it excels in challenging math, science, and code benchmarks, demonstrating strong performance for agentic tasks.

Loading preview...

Overview

youngzhong/SOD-0.6B is a 0.6 billion parameter student model that has been distilled from a 4 billion parameter teacher model using a novel technique called Step-wise On-policy Distillation (SOD). This method is specifically engineered to train small language model agents, focusing on improving their tool-integrated reasoning abilities. SOD tackles the common issue of cascading error propagation in on-policy distillation by employing an adaptive step-level weighting mechanism. This mechanism effectively suppresses distillation loss on drifted steps and restores supervision when the student model realigns, all with minimal additional computational overhead.

Key Capabilities & Features

  • Agentic Reasoning: Optimized for tasks requiring tool-integrated reasoning, making it suitable for agent-based applications.
  • Efficient Distillation: Utilizes SOD to create a highly capable small model (0.6B parameters) from a larger teacher (4B parameters) without significant computational cost.
  • Error Mitigation: The SOD method specifically addresses and reduces cascading error propagation during on-policy distillation.
  • Strong Performance: Achieves notable results on challenging benchmarks, including AIME, GPQA-Diamond, and LiveCodeBench-v6.

Performance Highlights

This model demonstrates significant performance gains over other 0.6B baselines, particularly in complex reasoning tasks:

  • Achieves 26.13% on AIME 2025 (average@32).
  • Shows an average improvement of +20.86% over the second-best baseline (OPD) across evaluated benchmarks.

Good For

  • Developing small, efficient language model agents.
  • Applications requiring tool-integrated reasoning where model size is a constraint.
  • Tasks involving complex math, science, and code problem-solving.