alibaba-pai/DistilQwen2.5-0.5B-Instruct

0.5B parameters · BF16 · 131072-token context · Public · Hugging Face

Overview

DistilQwen2.5-0.5B-Instruct is a compact 0.5-billion-parameter language model from alibaba-pai, created by distilling the capabilities of larger, more powerful LLMs into a smaller, more efficient model. It is based on the Qwen2.5-0.5B-Instruct architecture and supports an extended context length of 131072 tokens.

Distillation Methodology

The model is trained with a two-stage distillation process:

  • Data Curation: Uses a diverse range of open-source datasets (e.g., Magpie, OpenHermes, MAmmoTH2) together with proprietary synthetic data. A difficulty scoring system based on an LLM-as-a-Judge paradigm rates instruction-response pairs so that challenging, informative examples are prioritized for training (a minimal sketch of this idea follows the list).
  • Knowledge Distillation: Combines black-box distillation (training on the teacher's generated output tokens) with white-box distillation (matching the teacher's logit distribution). White-box distillation is emphasized because mimicking the teacher's full output distribution transfers richer information and further improves the student (a loss-function sketch also follows the list).
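
The exact judging prompt and scoring scale are not given here, so the following is only an illustrative sketch of the LLM-as-a-Judge difficulty scoring step: the judge callable, the prompt wording, and the 1-10 scale are assumptions for illustration, not the released pipeline.

```python
import re
from typing import Callable, Iterable

# Hypothetical judging prompt; the actual prompt used for DistilQwen2.5
# data curation is not published in this card.
JUDGE_PROMPT = (
    "Rate how difficult the following instruction is for a small language model "
    "to answer well, on a scale from 1 (trivial) to 10 (very hard). "
    "Reply with a single integer.\n\nInstruction:\n{instruction}"
)

def score_difficulty(instruction: str, judge: Callable[[str], str]) -> int:
    """Ask a judge LLM (any text-in/text-out callable) for a difficulty score."""
    reply = judge(JUDGE_PROMPT.format(instruction=instruction))
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else 0

def select_hard_examples(dataset: Iterable[dict],
                         judge: Callable[[str], str],
                         threshold: int = 6) -> list[dict]:
    """Keep only examples judged challenging enough to be informative for training."""
    return [ex for ex in dataset
            if score_difficulty(ex["instruction"], judge) >= threshold]
```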

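White-box distillation is commonly implemented as a divergence between the teacher's and the student's token-level output distributions. The PyTorch function below is a minimal sketch of one standard formulation (temperature-scaled forward KL blended with the usual cross-entropy loss); the temperature, mixing weight, and KL direction are illustrative assumptions, not the exact recipe used to train DistilQwen2.5.

```python
import torch
import torch.nn.functional as F

def white_box_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend next-token cross-entropy with a KL term that pulls the student's
    distribution toward the teacher's (logit shapes: [batch, seq, vocab])."""
    # Standard language-modeling loss on the gold labels (-100 marks ignored positions).
    ce = F.cross_entropy(student_logits.flatten(0, 1), labels.flatten(), ignore_index=-100)

    # KL divergence between temperature-softened teacher and student distributions.
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature ** 2

    return alpha * ce + (1.0 - alpha) * kd
```
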
Key Characteristics

  • Efficient Performance: Retains much of the capability of its larger teacher models within a 0.5B-parameter footprint, thanks to the distillation pipeline described above.
  • Bilingual Support: Primarily trained on instruction data in Chinese and English.
  • Extended Context: Features a substantial context window of 131072 tokens.

Use Cases

DistilQwen2.5-0.5B-Instruct is suitable for applications requiring a lightweight yet capable instruction-following model, particularly in scenarios where computational resources are limited or low-latency inference is critical. Its bilingual training makes it versatile for both Chinese and English language tasks.
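
Since this is a standard Hugging Face causal-LM checkpoint, it can be loaded with the transformers library. The snippet below is a minimal inference sketch assuming the usual Qwen2.5 chat template; generation settings such as max_new_tokens are illustrative defaults, not recommendations from the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alibaba-pai/DistilQwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt with the tokenizer's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize knowledge distillation in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```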

For more technical details, refer to the paper: DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models.