QpiEImitation/gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2-1.5B-Instruct. This model was trained using the GKD (On-Policy Distillation of Language Models) method, which focuses on learning from self-generated mistakes. It is designed to enhance performance through a distillation process, making it suitable for tasks requiring refined instruction following.
Loading preview...
Model Overview
This model, gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter instruction-tuned variant of the Qwen2-1.5B-Instruct base model. It has been specifically fine-tuned using the GKD (On-Policy Distillation of Language Models) method, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). This training approach allows the model to learn and improve by analyzing its own generated errors.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2-1.5B-Instruct.
- Training Method: Utilizes GKD (Generative Knowledge Distillation) for enhanced learning from self-generated mistakes.
- Framework: Trained with Hugging Face's TRL library.
- Context Length: Supports a context length of 32768 tokens.
Potential Use Cases
- Instruction Following: Optimized for tasks where precise adherence to instructions is crucial.
- Research in Distillation: Useful for researchers exploring on-policy distillation techniques and their impact on model performance.
- Resource-Efficient Deployment: As a 1.5B parameter model, it offers a balance between capability and computational efficiency for various NLP tasks.