QpiEImitation/gkd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/gkd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a 3.09 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. This model was trained using the GKD (On-Policy Distillation of Language Models) method, which focuses on learning from self-generated mistakes. It is designed for general text generation tasks, leveraging its specialized training procedure to enhance performance.
Loading preview...
Overview
This model, gkd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.09 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Qwen/Qwen2.5-3B-Instruct base model, developed by Qwen. The fine-tuning process utilized the TRL library and incorporated a specific training methodology known as GKD.
Key Capabilities
- Instruction Following: As an instruction-tuned model, it is designed to generate responses based on given prompts and instructions.
- GKD Training: The model's unique characteristic is its training with GKD (On-Policy Distillation of Language Models), a method that enables the model to learn effectively from its own generated errors. This approach aims to improve the model's robustness and performance by iteratively refining its understanding.
Good For
- General Text Generation: Suitable for a wide range of text generation tasks where instruction following is important.
- Research into Distillation Methods: Provides a practical example of a model trained with the GKD distillation technique, which could be valuable for researchers exploring advanced training methodologies.
Training Details
The model was trained using TRL version 1.0.0.dev0, Transformers 5.3.0, Pytorch 2.6.0+cu124, Datasets 4.8.2, and Tokenizers 0.22.2. Further details on the training run can be visualized via Weights & Biases.