SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Apr 16, 2026 | Architecture: Transformer
SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05 is a 0.5 billion parameter language model distilled from a Qwen2.5-1.5B-Instruct teacher. This model was specifically trained using knowledge distillation on the IFEvalSFTDataset, focusing on improving its performance on instruction following evaluations. It achieves an observed local IFEval accuracy of 0.4137577002, making it suitable for tasks requiring precise instruction adherence.
Model Overview
SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05 is a compact 0.5 billion parameter language model. It was created through a knowledge distillation process, where a smaller student model (Qwen2.5-0.5B-Instruct) learned from a larger teacher model (Qwen2.5-1.5B-Instruct).
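A minimal usage sketch with the Hugging Face `transformers` library, assuming the checkpoint is available on the Hub under the repo id above and follows the standard Qwen2.5 chat template (the prompt content is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the quantization listed in the model metadata above.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Qwen2.5 instruct-style models expect a chat-template-formatted prompt.
messages = [{"role": "user", "content": "List three fruits, one per line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```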
Key Capabilities
- Instruction Following: The model's training specifically targeted instruction following, utilizing the `IFEvalSFTDataset` for distillation. This makes it adept at interpreting and executing given instructions.
- Knowledge Distillation: The distillation setup used a `distill_alpha` of 0.5 and a `distill_temperature` of 2.0, giving an effective loss mix of 50% Cross-Entropy and 50% Knowledge Distillation loss (see the sketch after this list). This method transfers knowledge efficiently from a more capable teacher model.
- Performance on IFEval: It demonstrates an observed local IFEval accuracy of 0.4137577002, indicating its proficiency in instruction evaluation tasks.
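The loss mix described above corresponds to standard logit distillation. A minimal PyTorch sketch, assuming the teacher and student share a vocabulary; the function and variable names are illustrative and not taken from the actual training code:

```python
import torch.nn.functional as F

def mixed_kd_loss(student_logits, teacher_logits, labels,
                  distill_alpha=0.5, distill_temperature=2.0):
    """Mix of hard-label CE and soft-label KD per the config above (illustrative)."""
    vocab = student_logits.size(-1)
    s = student_logits.view(-1, vocab)
    t = teacher_logits.view(-1, vocab)
    # Hard-label term: ordinary cross-entropy against ground-truth token ids.
    ce = F.cross_entropy(s, labels.view(-1))
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by T^2 (Hinton et al.) so gradient magnitudes stay comparable.
    T = distill_temperature
    kd = F.kl_div(F.log_softmax(s / T, dim=-1),
                  F.softmax(t / T, dim=-1),
                  reduction="batchmean") * (T * T)
    # distill_alpha = 0.5 yields the 50% CE / 50% KD mix reported above.
    return (1.0 - distill_alpha) * ce + distill_alpha * kd
```

With `distill_alpha` at 0.5 the two terms contribute equally; moving it toward 1.0 would weight the teacher's soft targets more heavily, and toward 0.0 would reduce training to plain supervised fine-tuning.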
Good For
- Applications requiring a smaller, efficient model with a focus on instruction adherence.
- Scenarios where a balance between model size and instruction-following capability is crucial.
- Research into knowledge distillation techniques for improving specific task performance in smaller LLMs.