wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 20, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD model is a 1.5-billion-parameter Qwen2.5-based student model, developed by wh-zhu and distilled from the larger Qwen2.5-7B-Thinking teacher model. It uses Hybrid Policy Distillation (HPD) to improve the stability and efficiency of distillation for reasoning-oriented tasks, compressing the teacher's knowledge into a smaller footprint while preserving reasoning capability. With a context length of 32,768 tokens, it aims to deliver efficient reasoning performance.


Overview of Qwen2.5-1.5B-longcot-reasoning-HPD

This model is a 1.5 billion parameter student model based on the Qwen2.5 architecture, distilled from the larger Qwen2.5-7B-Thinking teacher model. It leverages Hybrid Policy Distillation (HPD), a framework developed by wh-zhu, to improve the stability and efficiency of policy distillation for models focused on reasoning tasks. HPD integrates forward and reverse KL divergence to balance mode coverage and mode-seeking, and combines off-policy data with approximate on-policy sampling.
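As a rough illustration of the balancing idea, the PyTorch sketch below combines a forward KL term (mode-covering) with a reverse KL term (mode-seeking) over teacher and student logits. The weighting alpha, the temperature, and the reduction are illustrative assumptions; the actual HPD objective, its schedules, and its off-/on-policy data mixing are not specified in this card.

```python
import torch
import torch.nn.functional as F

def hybrid_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   alpha: float = 0.5,
                   temperature: float = 1.0) -> torch.Tensor:
    """Blend forward KL (mode-covering) and reverse KL (mode-seeking).

    student_logits, teacher_logits: [batch, seq_len, vocab_size]
    alpha: weight on the forward-KL term (hypothetical; HPD's real
           weighting/schedule is not described in this card).
    """
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)

    # Forward KL: KL(teacher || student) -- pushes the student to cover
    # all modes of the teacher distribution.
    forward_kl = F.kl_div(s_logp, t_logp, log_target=True,
                          reduction="batchmean")

    # Reverse KL: KL(student || teacher) -- pushes the student to
    # concentrate on the teacher's dominant modes.
    reverse_kl = F.kl_div(t_logp, s_logp, log_target=True,
                          reduction="batchmean")

    return alpha * forward_kl + (1.0 - alpha) * reverse_kl
```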

Key Capabilities and Distillation Method

  • Knowledge Distillation: Compresses a larger 7B parameter teacher model into a more efficient 1.5B parameter student model.
  • Reasoning Optimization: Specifically designed to retain and optimize reasoning capabilities through the HPD framework.
  • Hybrid Policy Distillation (HPD): Stabilizes and improves policy distillation by balancing forward and reverse KL divergence and mixing off-policy data with approximate on-policy sampling.

Benchmark Performance

Although it is a distilled student model, it demonstrates reasoning capability across benchmarks including AIME24, AIME25, AMC, MATH, OlympiadMath, and GPQA. For instance, it scores 63.40 on MATH and 28.09 on GPQA, showing that much of the teacher's reasoning ability survives compression to 1.5B parameters.

Good for

  • Efficient Reasoning Applications: Ideal for scenarios requiring strong reasoning capabilities within a smaller, more resource-efficient model footprint.
  • Research in Knowledge Distillation: Useful for researchers exploring advanced distillation techniques, particularly HPD, for LLMs.
  • Deployment in Resource-Constrained Environments: Suitable for applications where a smaller model is critical for faster inference or reduced computational cost, without entirely sacrificing reasoning performance; see the loading sketch after this list.
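
For reference, below is a minimal loading-and-generation sketch with Hugging Face transformers, assuming the checkpoint is published under the repo ID wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD and ships the standard Qwen2.5 chat template; the prompt, dtype choice, and generation settings are illustrative, not prescribed by the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wh-zhu/qwen2.5-1.5B-longcot-reasoning-HPD"  # assumed Hub repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "If 3x + 7 = 22, what is x? Think step by step."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long-CoT models tend to emit extended reasoning before the final answer,
# so leave generous headroom for new tokens (illustrative value).
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```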