QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5-billion-parameter instruction-tuned language model fine-tuned from Qwen/Qwen2-1.5B-Instruct. As the name indicates, it was trained with On-Policy Distillation (GKD) on GSM8K, with Qwen2-1.5B-Instruct as the student (S) and Qwen2-7B-Instruct as the teacher (T), so the student learns from its own generated mistakes rather than only from fixed teacher outputs. The result is a compact instruction-following model with a focus on mathematical reasoning.
Model Overview
This model, opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5-billion-parameter language model built on the Qwen2-1.5B-Instruct architecture. It was fine-tuned with the TRL framework using On-Policy Distillation (GKD), distilling from the larger Qwen2-7B-Instruct teacher on GSM8K math word problems.
Key Capabilities
- Enhanced Learning through Self-Correction: GKD trains the student on sequences it samples itself, with the teacher providing token-level feedback. This reduces the mismatch between training and inference distributions seen in standard distillation and can improve reasoning and instruction following.
- Instruction-Tuned: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
- Based on Qwen2 Architecture: Leverages the foundational strengths of the Qwen2-1.5B-Instruct model.
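A minimal inference sketch using the standard transformers chat API (the question, generation settings, and dtype/device options are illustrative; adjust to your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 friends in April, and half as "
                   "many in May. How many clips did she sell altogether?",
    },
]
# Build the prompt with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps math answers deterministic; enable sampling for variety.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```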
Training Details
The model was trained with GKD, the method introduced in "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (Agarwal et al., ICLR 2024). Rather than distilling on a fixed dataset of teacher outputs, GKD samples sequences from the student during training and minimizes a divergence (such as the generalized Jensen-Shannon divergence) between the student's and teacher's token distributions on those samples, so the student is corrected on the very outputs it produces at inference time.
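The divergence at the core of GKD can be illustrated with a small pure-Python sketch of the generalized Jensen-Shannon divergence between teacher and student next-token distributions (an illustration only; an actual trainer such as TRL's GKD implementation computes this over logits in batch):

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(p_teacher, q_student, beta=0.5):
    """Generalized Jensen-Shannon divergence, the GKD training objective.

    M is the beta-weighted mixture of the two distributions; beta
    interpolates between forward-KL-like (small beta) and
    reverse-KL-like (large beta) behavior, and beta = 0.5 recovers
    the standard symmetric JSD.
    """
    m = [beta * pi + (1 - beta) * qi
         for pi, qi in zip(p_teacher, q_student)]
    return beta * kl(p_teacher, m) + (1 - beta) * kl(q_student, m)
```

At beta = 0.5 the divergence is symmetric in teacher and student, and it is zero exactly when the two distributions match.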
Good For
- Applications requiring a compact yet capable instruction-following model.
- Math word problems and step-by-step reasoning in the style of GSM8K, the distillation training data.
- Developers interested in models trained with advanced distillation techniques like GKD.