wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 20, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD model is a 1.5-billion-parameter Qwen2.5-based student model, developed by wh-zhu and distilled from the larger Qwen2.5-7B-Thinking teacher model. It uses Hybrid Policy Distillation (HPD) to improve the stability and efficiency of distillation for reasoning-oriented tasks, compressing the teacher's knowledge into a smaller footprint while preserving reasoning capability. With a context length of 32,768 tokens, it aims to deliver efficient reasoning performance.


Overview of Qwen2.5-1.5B-longcot-reasoning-HPD

This model is a 1.5 billion parameter student model based on the Qwen2.5 architecture, distilled from the larger Qwen2.5-7B-Thinking teacher model. It leverages Hybrid Policy Distillation (HPD), a framework developed by wh-zhu, to improve the stability and efficiency of policy distillation for models focused on reasoning tasks. HPD integrates forward and reverse KL divergence to balance mode coverage and mode-seeking, and combines off-policy data with approximate on-policy sampling.
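As a rough illustration of the balancing idea, the PyTorch sketch below combines a forward KL term (mode-covering) with a reverse KL term (mode-seeking) over teacher and student logits. The weighting alpha, the temperature, and the reduction are illustrative assumptions; the actual HPD objective, its schedules, and its off-/on-policy data mixing are not specified in this card.

```python
import torch
import torch.nn.functional as F

def hybrid_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   alpha: float = 0.5,
                   temperature: float = 1.0) -> torch.Tensor:
    """Blend forward KL (mode-covering) and reverse KL (mode-seeking).

    student_logits, teacher_logits: [batch, seq_len, vocab_size]
    alpha: weight on the forward-KL term (hypothetical; HPD's real
           weighting/schedule is not described in this card).
    """
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)

    # Forward KL: KL(teacher || student) -- pushes the student to cover
    # all modes of the teacher distribution.
    forward_kl = F.kl_div(s_logp, t_logp, log_target=True,
                          reduction="batchmean")

    # Reverse KL: KL(student || teacher) -- pushes the student to
    # concentrate on the teacher's dominant modes.
    reverse_kl = F.kl_div(t_logp, s_logp, log_target=True,
                          reduction="batchmean")

    return alpha * forward_kl + (1.0 - alpha) * reverse_kl
```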

Key Capabilities and Distillation Method

  • Knowledge Distillation: Compresses a larger 7B parameter teacher model into a more efficient 1.5B parameter student model.
  • Reasoning Optimization: Specifically designed to retain and optimize reasoning capabilities through the HPD framework.
  • Hybrid Policy Distillation (HPD): Stabilizes and improves policy distillation by balancing forward and reverse KL divergence and mixing off-policy data with approximate on-policy sampling.

Benchmark Performance

Although it is a distilled student model, it demonstrates reasoning capability across benchmarks including AIME24, AIME25, AMC, MATH, OlympiadMath, and GPQA. For instance, it scores 63.40 on MATH and 28.09 on GPQA, showing that much of the teacher's reasoning ability survives compression to 1.5B parameters.

Good for

  • Efficient Reasoning Applications: Ideal for scenarios requiring strong reasoning capabilities within a smaller, more resource-efficient model footprint.
  • Research in Knowledge Distillation: Useful for researchers exploring advanced distillation techniques, particularly HPD, for LLMs.
  • Deployment in Resource-Constrained Environments: Suitable for applications where a smaller model is critical for faster inference or reduced computational cost, without entirely sacrificing reasoning performance; see the loading sketch after this list.
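
For reference, below is a minimal loading-and-generation sketch with Hugging Face transformers, assuming the checkpoint is published under the repo ID wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD and ships the standard Qwen2.5 chat template; the prompt, dtype choice, and generation settings are illustrative, not prescribed by the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD"  # assumed Hub repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "If 3x + 7 = 22, what is x? Think step by step."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long-CoT models tend to emit extended reasoning before the final answer,
# so leave generous headroom for new tokens (illustrative value).
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```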