wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90
wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90 is a 7.6-billion-parameter language model developed by Zhiwei Hong and others, based on the Qwen2.5 architecture. It uses Hybrid Policy Distillation (HPD), a framework that reformulates knowledge distillation to balance mode coverage and mode seeking. The model targets efficient compression of large language models, demonstrating improved computational efficiency and final performance across model families and scales.
Model Overview
wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90 is a 7.6-billion-parameter model built on Qwen2.5 that implements Hybrid Policy Distillation (HPD), a framework for compressing large language models (LLMs) developed by Zhiwei Hong and colleagues.
Key Capabilities
- Efficient LLM Compression: HPD reformulates knowledge distillation (KD) as a reweighted log-likelihood objective at the token level.
- Balanced Mode Coverage and Mode-Seeking: Combines the complementary strengths of forward KL divergence (mode-covering) and reverse KL divergence (mode-seeking) to strike a better trade-off between the two behaviors during distillation.
- Improved Computational Efficiency: Designed to enhance the efficiency of model training and inference.
- Enhanced Performance: Demonstrates improved final performance across diverse model families and scales compared to traditional distillation methods.
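To make the objective above concrete, the following is a minimal sketch of a per-token loss that mixes forward and reverse KL over the vocabulary distribution. The mixing weight `alpha` and the simple convex combination are illustrative assumptions, not the exact reweighting used by HPD:

```python
import math

def hybrid_kd_loss(teacher_probs, student_probs, alpha=0.5):
    """Illustrative per-token hybrid distillation loss.

    Mixes forward KL (teacher->student, mode-covering) with reverse KL
    (student->teacher, mode-seeking). `alpha` is a hypothetical mixing
    weight for demonstration; it is not taken from the HPD paper.
    """
    # Forward KL: sum over vocabulary of p * log(p / q)
    fwd = sum(p * math.log(p / q) for p, q in zip(teacher_probs, student_probs))
    # Reverse KL: sum over vocabulary of q * log(q / p)
    rev = sum(q * math.log(q / p) for p, q in zip(teacher_probs, student_probs))
    return alpha * fwd + (1 - alpha) * rev

# Example: teacher and student distributions over a toy 3-token vocabulary
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
loss = hybrid_kd_loss(teacher, student)
```

In practice this loss would be computed per token position from the two models' softmax outputs and averaged over the sequence; the sketch only shows the single-position case.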
Use Cases
This model is particularly suited for scenarios requiring:
- Deployment of Smaller, High-Performing LLMs: Ideal for environments where computational resources are constrained but high performance is still critical.
- Research in Knowledge Distillation: Provides a practical implementation of the HPD framework for further study and development in model compression techniques.
For more technical details, refer to the Hybrid Policy Distillation for LLMs paper and the associated GitHub Repository.