QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student learns from its own generated mistakes under teacher supervision. As the repository name indicates, the student (S) is Qwen2.5-3B-Instruct and the teacher (T) is Qwen2-7B-Instruct; the math500 tag suggests the distillation targeted MATH-500-style problems.
Model Overview
This model, opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.1 billion parameter language model fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model using GKD (Generalized Knowledge Distillation), an on-policy distillation method.
Key Training Methodology
The core differentiator of this model is its training procedure, GKD (Generalized Knowledge Distillation), detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). Instead of training only on fixed teacher outputs, GKD samples sequences from the student during training and uses the teacher's token-level distributions as feedback on those samples, so the student learns to correct its own errors rather than merely imitate teacher text. The training was implemented using the TRL library, which provides a GKDTrainer for this method.
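As a rough illustration, a GKD run with TRL's `GKDTrainer` might look like the sketch below. The student and teacher IDs follow the repository name; the dataset, hyperparameters, and output path are illustrative assumptions, not the actual training configuration of this checkpoint.

```python
# Hypothetical GKD training sketch; hyperparameters and dataset are
# illustrative assumptions, not this model's actual training setup.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen2.5-3B-Instruct"  # student (S) per the model name
teacher_id = "Qwen/Qwen2-7B-Instruct"    # teacher (T) per the model name

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# GKDTrainer expects conversational examples under a "messages" key.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What is 12 * 17?"},
        {"role": "assistant", "content": "12 * 17 = 204."},
    ]},
])

args = GKDConfig(
    output_dir="opd_math500",  # assumed path
    lmbda=0.5,   # fraction of batches trained on student-generated (on-policy) samples
    beta=0.5,    # interpolation weight of the generalized Jensen-Shannon divergence
    max_new_tokens=256,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

With `lmbda=0.5`, half of the training batches replace the ground-truth completions with sequences sampled from the student itself, which is the "learning from self-generated mistakes" aspect of the method.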
Technical Specifications
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Parameters: 3.1 Billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 1.0.0.dev0), Transformers (version 5.3.0), PyTorch (version 2.6.0+cu124)
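
The advertised context length and parameter count can be sanity-checked directly from the checkpoint config, assuming the repository ID above:

```python
# Sanity-check the advertised specs from the checkpoint itself.
from transformers import AutoConfig, AutoModelForCausalLM

repo_id = "QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct"

config = AutoConfig.from_pretrained(repo_id)
print(config.max_position_embeddings)  # expected: 32768

model = AutoModelForCausalLM.from_pretrained(repo_id)
print(f"{model.num_parameters() / 1e9:.2f}B parameters")  # expected: ~3.1B
```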
Use Cases
This model is suited to applications where the GKD training paradigm, which emphasizes learning from self-generated mistakes, may offer advantages over standard supervised fine-tuning; given the MATH-500 distillation data, math problem solving is the most likely target domain. Developers exploring on-policy distillation techniques may also find it a useful reference checkpoint. A minimal inference example follows.
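The snippet below uses the standard Transformers chat workflow; the prompt and generation settings are placeholders, not recommended values.

```python
# Minimal inference sketch using the standard Transformers chat workflow.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [
    {"role": "user",
     "content": "Solve step by step: what is the sum of the first 50 positive integers?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```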