# QpiEImitation/opd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/opd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct is a 0.5-billion-parameter instruction-tuned language model fine-tuned from Qwen2-0.5B-Instruct. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student learns from its own generated mistakes, with Qwen2-7B-Instruct serving as the teacher (as indicated by the model name). With a context length of 32,768 tokens, it targets tasks that benefit from improved reasoning through distillation.
## Model Overview
This model, opd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5-billion-parameter instruction-tuned language model derived from the Qwen2-0.5B-Instruct architecture. It has been fine-tuned with GKD (Generalized Knowledge Distillation), an on-policy method in which the student refines its capabilities by learning from its self-generated mistakes under the guidance of a larger teacher model.
## Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2-0.5B-Instruct.
- Training Method: GKD (Generalized Knowledge Distillation), an on-policy distillation technique in which the student learns by analyzing and correcting its self-generated errors, as introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024).
- Framework: Trained with TRL (Transformer Reinforcement Learning) version 1.0.0.dev0.
- Context Length: Supports a substantial context window of 32768 tokens.
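At the core of GKD is a generalized Jensen-Shannon divergence between the teacher's and student's next-token distributions, interpolated by a coefficient β (β = 0.5 recovers the symmetric JSD). The following is a minimal pure-Python sketch of that objective on toy distributions; it is an illustration of the loss formula, not the TRL implementation.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generalized_jsd(p_teacher, q_student, beta):
    """Generalized JSD used by GKD:
    D_JSD(beta)(P || Q) = beta * KL(P || M) + (1 - beta) * KL(Q || M),
    where M = beta * P + (1 - beta) * Q is the mixture distribution."""
    m = [beta * pi + (1 - beta) * qi for pi, qi in zip(p_teacher, q_student)]
    return beta * kl(p_teacher, m) + (1 - beta) * kl(q_student, m)

# Toy next-token distributions over a 3-token vocabulary (hypothetical values).
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]

loss = generalized_jsd(teacher, student, beta=0.5)
```

With β = 0.5 the divergence is symmetric in teacher and student, and it vanishes exactly when the two distributions agree.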
## Use Cases
This model is particularly suitable for applications where enhanced reasoning and robustness through distillation are beneficial. As the model name indicates, it was distilled on GSM8K, so its training methodology suggests improved performance on grade-school math word problems and, more broadly, on tasks where a model benefits from generating its own outputs and learning from the teacher's corrections of them.
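The "on-policy" aspect of this training recipe means the student is scored on sequences it samples itself, with the teacher's distribution providing the per-token target. The sketch below illustrates that data-collection step with a hypothetical toy vocabulary and stand-in model functions (using forward KL per token for simplicity); it is not the actual training code.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "<eos>"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    """Draw one index from a categorical distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def student_logits(prefix):
    # Hypothetical stand-in for the student model's forward pass.
    return [0.5 * len(prefix), 1.0, 0.2, 0.1 * len(prefix)]

def teacher_probs(prefix):
    # Hypothetical stand-in for the teacher's next-token distribution.
    return softmax([1.0, 2.0, 0.5, 0.3 * len(prefix)])

def rollout_and_score(max_len=5):
    """On-policy step: the STUDENT samples its own continuation, and the
    per-token loss compares teacher vs. student distributions on those
    self-generated prefixes."""
    prefix, total_loss = [], 0.0
    for _ in range(max_len):
        p_student = softmax(student_logits(prefix))
        tok = sample(p_student)  # token sampled from the student itself
        p_teacher = teacher_probs(prefix)
        # Forward KL(teacher || student) on the student's own prefix.
        total_loss += sum(
            t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0
        )
        prefix.append(tok)
        if VOCAB[tok] == "<eos>":
            break
    return prefix, total_loss

tokens, loss = rollout_and_score()
```

Because the prefixes are drawn from the student's own policy rather than a fixed dataset, the loss is evaluated exactly where the student actually makes mistakes, which is the key difference from standard offline distillation.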