QpiEImitation/opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 20, 2026 · Architecture: Transformer

QpiEImitation/opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct with Qwen2-7B-Instruct as the teacher (per the S-/T- tags in the name) on the GSM8K dataset. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student is trained on its own generated sequences against the teacher's token-level distributions, i.e. it learns from its self-generated mistakes. The distillation targets stronger math word-problem performance while keeping the model suitable for general instruction-following tasks.
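A minimal inference sketch with the Transformers library is shown below. It assumes the checkpoint loads as a standard Qwen2.5-style chat model; the GSM8K-style question is only an illustration.

```python
# Minimal inference sketch (assumes the checkpoint loads like any
# Qwen2.5-Instruct model via transformers; not an official example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "QpiEImitation/opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# GSM8K-style math word problem, matching the distillation data.
messages = [
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, "
     "and then she sold half as many clips in May. How many clips did she sell in total?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```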


Overview

This model, opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.1 billion parameter instruction-tuned language model. It is a fine-tuned version of the base model Qwen/Qwen2.5-3B-Instruct, distilled against a Qwen2-7B-Instruct teacher on the GSM8K math word-problem dataset.

Key Capabilities

  • Instruction Following: The model has been fine-tuned to follow instructions effectively, building on the capabilities of its Qwen2.5-3B-Instruct base model.
  • GKD Training Method: It was trained with GKD (Generalized Knowledge Distillation), introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes". Rather than distilling on a fixed dataset, the student is trained on sequences it generates itself, matching the teacher's token-level distributions on those sequences; this closes the gap between the data seen during training and during inference.
  • TRL Framework: Training used the TRL (Transformer Reinforcement Learning) library, whose GKDTrainer implements this method; a hedged configuration sketch follows this list.
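For reference, here is a minimal sketch of how such a run might be set up with TRL's GKDTrainer. The actual training recipe for this checkpoint is not published, so the dataset mapping and every hyperparameter below are illustrative assumptions.

```python
# Hypothetical GKD training sketch using TRL's GKDTrainer; all values
# below (lmbda, beta, temperature, batch size, dataset mapping) are
# assumptions, not the recipe used for this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen2.5-3B-Instruct"  # student (per the S- tag in the name)
teacher_id = "Qwen/Qwen2-7B-Instruct"    # teacher (per the T- tag in the name)

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# GKDTrainer expects conversational data, i.e. a "messages" column of chat turns.
def to_messages(example):
    return {"messages": [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]}

train_dataset = load_dataset("openai/gsm8k", "main", split="train").map(
    to_messages, remove_columns=["question", "answer"]
)

args = GKDConfig(
    output_dir="opd_gsm8k",
    lmbda=1.0,        # fraction of on-policy steps (student-generated sequences)
    beta=0.5,         # generalized JSD interpolation between student and teacher
    temperature=0.9,  # sampling temperature for the student's generations
    max_new_tokens=256,
    per_device_train_batch_size=4,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

Setting lmbda=1.0 would make the run fully on-policy, in line with the "opd" (on-policy distillation) tag in the model name; lower values mix in supervised distillation on the fixed dataset.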

Good For

  • Math Word Problems and Instruction Following: Distilled on GSM8K, the model is geared toward grade-school math reasoning while retaining the base model's ability to interpret and respond to general user instructions.
  • Research in Distillation: Useful for researchers studying on-policy distillation methods and their effect on student model performance.
  • Efficient Deployment: At 3.1B parameters in BF16, it balances quality against computational cost, fitting scenarios where a 7B-class model such as its teacher would be too resource-intensive.