SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05

Text Generation · Model Size: 0.5B · Quant: BF16 · Context Length: 32k · Published: Apr 16, 2026 · Architecture: Transformer · Concurrency Cost: 1

SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05 is a 0.5-billion-parameter language model distilled from a Qwen2.5-1.5B-Instruct teacher. It was trained with knowledge distillation on the IFEvalSFTDataset, targeting performance on instruction-following evaluations. In a local evaluation it reached an IFEval accuracy of 0.4138, making it a compact option for tasks that call for precise instruction adherence.


Model Overview

SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05 is a compact 0.5 billion parameter language model. It was created through a knowledge distillation process, where a smaller student model (Qwen2.5-0.5B-Instruct) learned from a larger teacher model (Qwen2.5-1.5B-Instruct).
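The snippet below is a minimal usage sketch, assuming the checkpoint loads with the standard transformers causal-LM classes used for Qwen2.5 models; the prompt is illustrative and not taken from the training data.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SeongryongJung/qwen2.5-0.5b-ifeval-mixed-kd-alpha05"

# Assumes standard Qwen2.5 loading; BF16 matches the card's quant field.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Chat-style prompt via the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "List three prime numbers, one per line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```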

Key Capabilities

  • Instruction Following: Training specifically targeted instruction following, using the IFEvalSFTDataset for distillation, so the model is oriented toward interpreting and executing explicit instructions.
  • Knowledge Distillation: The distillation setup used a distill_alpha of 0.5 and a distill_temperature of 2.0, giving an effective loss mix of 50% cross-entropy and 50% knowledge-distillation loss (see the sketch after this list). This transfers knowledge efficiently from the more capable teacher model.
  • Performance on IFEval: It reaches an observed local IFEval accuracy of 0.4138, a useful reference point for instruction-following ability at this parameter count.
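
The exact training code is not published on this card, so the following is a minimal sketch of a standard mixed distillation objective using the stated hyperparameters (alpha = 0.5, temperature = 2.0); the function name mixed_kd_loss and the tensor shapes are assumptions, not the author's code.

```python
import torch.nn.functional as F

def mixed_kd_loss(student_logits, teacher_logits, labels,
                  alpha=0.5, temperature=2.0):
    """Mixed objective: (1 - alpha) * CE + alpha * KD.

    With alpha = 0.5 this matches the 50/50 cross-entropy /
    knowledge-distillation mix described above. Padding-token
    masking of the KD term is omitted for brevity.
    """
    vocab = student_logits.size(-1)

    # Hard-label cross-entropy against the ground-truth tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, vocab),
        labels.view(-1),
        ignore_index=-100,
    )

    # Soft-label KL divergence between temperature-scaled distributions.
    # The T^2 factor keeps gradient magnitudes comparable across
    # temperatures (Hinton et al., 2015).
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits.view(-1, vocab) / t, dim=-1),
        F.softmax(teacher_logits.view(-1, vocab) / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    return (1.0 - alpha) * ce + alpha * kd
```

At temperature 2.0 the teacher's distribution is softened, exposing the relative probabilities it assigns to near-miss tokens, which is the signal the student cannot get from one-hot labels alone.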

Good For

  • Applications requiring a smaller, efficient model with a focus on instruction adherence.
  • Scenarios where a balance between model size and instruction-following capability is crucial.
  • Research into knowledge distillation techniques for improving specific task performance in smaller LLMs.