QpiEImitation/opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct
Text generation · Concurrency cost: 1 · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Apr 20, 2026 · Architecture: Transformer
QpiEImitation/opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student learns from its own generated mistakes. As the model name indicates, the student (Qwen2.5-3B-Instruct) was distilled from a larger teacher (Qwen2-7B-Instruct) on the GSM8K dataset, and the resulting model is suitable for general instruction-following tasks.
Overview
This model, opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.1 billion parameter instruction-tuned language model. It is a fine-tuned version of the base model Qwen/Qwen2.5-3B-Instruct.
Key Capabilities
- Instruction Following: The model has been specifically fine-tuned to follow instructions effectively, building upon the capabilities of its base Qwen2.5-3B-Instruct architecture.
- GKD Training Method: It was trained with GKD (Generalized Knowledge Distillation), the method introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes". Rather than distilling only on a fixed dataset, GKD trains the student on sequences sampled from the student itself, so it learns to correct the kinds of errors it actually makes at inference time.
- TRL Framework: The training process used the TRL (Transformer Reinforcement Learning) library, which provides a dedicated GKDTrainer for this teacher-student distillation setup.
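At each sampled token, GKD matches the student's next-token distribution to the teacher's by minimizing a generalized Jensen-Shannon divergence, where a mixing coefficient beta interpolates between forward KL (beta near 1) and reverse KL (beta near 0). Below is a minimal pure-Python sketch of that divergence over a single token's probability vector; it is illustrative only and is not the actual TRL training code:

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(p_teacher, q_student, beta=0.5):
    """Generalized JSD used as the GKD token-level objective.

    Computes beta * KL(P || M) + (1 - beta) * KL(Q || M),
    where M = beta * P + (1 - beta) * Q is the mixture distribution.
    """
    m = [beta * pt + (1 - beta) * qs for pt, qs in zip(p_teacher, q_student)]
    return beta * kl(p_teacher, m) + (1 - beta) * kl(q_student, m)
```

When the student already matches the teacher the divergence is zero, and at beta = 0.5 it is symmetric in the two distributions; during training this quantity is averaged over the tokens of sequences sampled from the student.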
Good For
- General Instruction-Following: Ideal for applications requiring a model to accurately interpret and respond to user instructions.
- Research in Distillation: Useful for researchers exploring on-policy distillation methods and their impact on language model performance.
- Efficient Deployment: At 3.1B parameters, it balances capability and computational cost, making it suitable where larger models would be too resource-intensive.