wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90
wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90 is a 7.6-billion-parameter language model developed by Zhiwei Hong and others, based on the Qwen2.5 architecture. It uses Hybrid Policy Distillation (HPD), a framework that reformulates knowledge distillation to balance mode coverage and mode seeking. The model targets efficient compression of large language models, demonstrating improved computational efficiency and final performance across model families and scales.
Model Overview
wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90 is a 7.6-billion-parameter model built on Qwen2.5 that implements Hybrid Policy Distillation (HPD), a framework for compressing large language models (LLMs) developed by Zhiwei Hong and colleagues.
Key Capabilities
- Efficient LLM Compression: HPD reformulates knowledge distillation (KD) as a reweighted log-likelihood objective at the token level.
- Balanced Mode Coverage and Mode-Seeking: Combines the complementary strengths of forward KL divergence (mode-covering) and reverse KL divergence (mode-seeking) to strike a better trade-off between the two behaviors during distillation.
- Improved Computational Efficiency: Designed to enhance the efficiency of model training and inference.
- Enhanced Performance: Demonstrates improved final performance across diverse model families and scales compared to traditional distillation methods.
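To make the objective above concrete, the following is a minimal sketch of a per-token loss that mixes forward and reverse KL over the vocabulary distribution. The mixing weight `alpha` and the simple convex combination are illustrative assumptions, not the exact reweighting used by HPD:

```python
import math

def hybrid_kd_loss(teacher_probs, student_probs, alpha=0.5):
    """Illustrative per-token hybrid distillation loss.

    Mixes forward KL (teacher->student, mode-covering) with reverse KL
    (student->teacher, mode-seeking). `alpha` is a hypothetical mixing
    weight for demonstration; it is not taken from the HPD paper.
    """
    # Forward KL: sum over vocabulary of p * log(p / q)
    fwd = sum(p * math.log(p / q) for p, q in zip(teacher_probs, student_probs))
    # Reverse KL: sum over vocabulary of q * log(q / p)
    rev = sum(q * math.log(q / p) for p, q in zip(teacher_probs, student_probs))
    return alpha * fwd + (1 - alpha) * rev

# Example: teacher and student distributions over a toy 3-token vocabulary
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
loss = hybrid_kd_loss(teacher, student)
```

In practice this loss would be computed per token position from the two models' softmax outputs and averaged over the sequence; the sketch only shows the single-position case.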
Use Cases
This model is particularly suited for scenarios requiring:
- Deployment of Smaller, High-Performing LLMs: Ideal for environments where computational resources are constrained but high performance is still critical.
- Research in Knowledge Distillation: Provides a practical implementation of the HPD framework for further study and development in model compression techniques.
For more technical details, refer to the Hybrid Policy Distillation for LLMs paper and the associated GitHub Repository.