QpiEImitation/gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5 billion parameter instruction-tuned language model fine-tuned from Qwen/Qwen2-1.5B-Instruct. It was trained using the GKD (On-Policy Distillation) method, which involves learning from self-generated mistakes. This model is optimized for improved performance through distillation, making it suitable for general instruction-following tasks.
Loading preview...
Model Overview
This model, gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter language model based on the Qwen2-1.5B-Instruct architecture. It has been fine-tuned using the GKD (On-Policy Distillation) method, a technique where the model learns from its own self-generated mistakes, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024).
Key Capabilities
- Instruction Following: Designed to respond to user instructions effectively.
- Distillation-Enhanced Performance: Benefits from the GKD training procedure, which aims to improve model capabilities through a unique distillation process.
Training Details
The model was trained using the TRL (Transformers Reinforcement Learning) library. The GKD method, which is central to its training, focuses on on-policy distillation. This approach allows the model to refine its understanding and generation by iteratively learning from its own outputs.
Good For
- Developers looking for a Qwen2-1.5B-Instruct variant with enhanced instruction-following capabilities through distillation.
- Experimenting with models trained using advanced distillation techniques like GKD.