QpiEImitation/gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct is a 0.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2-0.5B-Instruct. This model was trained using the GKD (On-Policy Distillation of Language Models) method, which focuses on learning from self-generated mistakes. It is designed for general text generation tasks, leveraging its unique training approach to enhance performance.
Loading preview...
Overview
This model, gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned version of the base model Qwen/Qwen2-0.5B-Instruct and was developed by QpiEImitation.
Key Training Methodology
The model's distinctiveness stems from its training procedure, which utilizes GKD (On-Policy Distillation of Language Models). This method, detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024), focuses on improving the model by learning from its own generated errors. The training was implemented using the TRL (Transformers Reinforcement Learning) framework.
Capabilities
- Instruction Following: Designed to respond to user instructions effectively due to its instruction-tuned nature.
- Text Generation: Capable of generating coherent and contextually relevant text based on prompts.
Good For
- Developers looking for a compact, instruction-tuned model.
- Experimentation with models trained using advanced distillation techniques like GKD.
- General natural language processing tasks where a 0.5B parameter model is suitable.