SeongryongJung/qwen2.5-0.5b-ifeval-pure-kd
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 16, 2026 · Architecture: Transformer
SeongryongJung/qwen2.5-0.5b-ifeval-pure-kd is a 0.5-billion-parameter instruction-tuned language model distilled from a Qwen2.5-1.5B-Instruct teacher. It is optimized specifically for instruction following through knowledge distillation, targeting applications that need efficient instruction adherence, and achieves an observed local IFEval accuracy of 0.405.
Qwen2.5-0.5B Instruct IFEval Pure KD Overview
This model, developed by SeongryongJung, is a compact 0.5 billion parameter language model distilled from a larger Qwen2.5-1.5B-Instruct teacher. Its primary focus is on enhancing instruction following capabilities through a knowledge distillation process, specifically utilizing the IFEvalSFTDataset.
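The model card does not state the exact distillation objective, but "pure KD" conventionally means training the student on the teacher's soft targets alone, via a temperature-softened KL divergence (Hinton-style). A minimal sketch of that loss, using only the standard library and assuming this conventional formulation:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pure_kd_loss(teacher_logits, student_logits, temperature=2.0):
    # Forward KL(teacher || student) on softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )

# A student that matches the teacher exactly incurs zero loss;
# a mismatched student incurs a positive loss.
print(pure_kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # 0.0
print(pure_kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

In practice the loss is averaged over every token position of the IFEvalSFTDataset responses, with the 1.5B teacher's logits precomputed or produced in the same forward pass.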
Key Characteristics
- Distilled Architecture: Built upon the Qwen2.5 framework, it leverages knowledge transfer from a more capable teacher model.
- Instruction Following Optimization: The distillation process was tuned specifically for instruction following, using IFEvalSFTDataset for training.
- Efficiency: With 0.5 billion parameters, it offers an efficient alternative for deployments where instruction adherence is critical.
- Observed Performance: Achieved an observed local IFEval accuracy of 0.405 (0.4050308008).
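IFEval scores a model by checking responses against programmatically verifiable instructions (e.g., casing, length, or punctuation constraints), so accuracy is simply the fraction of constraints satisfied. The constraint names below are illustrative stand-ins, not the benchmark's own identifiers:

```python
def check_instruction(response, constraint):
    # Toy verifiable-constraint checks in the spirit of IFEval.
    if constraint == "all_lowercase":
        return response == response.lower()
    if constraint == "max_50_words":
        return len(response.split()) <= 50
    if constraint == "ends_with_period":
        return response.rstrip().endswith(".")
    raise ValueError(f"unknown constraint: {constraint}")

# Score a batch of (response, constraint) pairs.
checks = [
    ("hello world.", "all_lowercase"),
    ("hello world.", "ends_with_period"),
    ("Hello World.", "all_lowercase"),
]
results = [check_instruction(r, c) for r, c in checks]
accuracy = sum(results) / len(results)
print(accuracy)  # 2 of 3 constraints pass -> ~0.667
```

The reported 0.405 figure is this kind of pass rate computed locally over the full IFEval suite.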
Good For
- Applications requiring a lightweight model with strong instruction following.
- Scenarios with limited computational resources that still require accurate, instruction-driven response generation.
- Research into knowledge distillation techniques for improving specific model capabilities like instruction adherence.