QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct

Text generation · 3.1B parameters · BF16 · 32k context length · Transformer · Published: Apr 20, 2026

QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a 3.1-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student learns from its own generated mistakes; the model name suggests Qwen2-7B-Instruct served as the teacher. It is aimed at tasks that benefit from this distillation approach, which differs from standard fine-tuning in that training runs on student-sampled outputs rather than only on a fixed dataset.


Model Overview

This model, opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.1-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method.

Key Training Methodology

The core differentiator of this model is its training procedure, which uses GKD (Generalized Knowledge Distillation). This on-policy method, introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024), samples sequences from the student itself and trains on the teacher's feedback over those self-generated outputs, so the student is corrected on the mistakes it actually makes at inference time. The training was implemented using the TRL library.
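The distillation objective in the GKD paper is commonly written as a generalized Jensen-Shannon divergence between the student's and teacher's next-token distributions, evaluated on student-sampled sequences. A minimal pure-Python sketch of that divergence (the function names and toy distributions below are illustrative, not taken from this model's training code):

```python
import math

def kl_div(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def generalized_jsd(p_student, p_teacher, beta=0.5):
    """Generalized Jensen-Shannon divergence JSD(beta), as used in GKD.

    m is the beta-weighted mixture of the two distributions; beta trades off
    mode-covering (teacher-leaning) against mode-seeking (student-leaning)
    behavior. beta=0.5 recovers the standard symmetric JSD.
    """
    m = [beta * ps + (1.0 - beta) * pt for ps, pt in zip(p_student, p_teacher)]
    return beta * kl_div(p_student, m) + (1.0 - beta) * kl_div(p_teacher, m)

# Toy next-token distributions over a 3-token vocabulary.
student = [0.7, 0.2, 0.1]
teacher = [0.4, 0.4, 0.2]
loss = generalized_jsd(student, teacher, beta=0.5)
print(f"JSD(0.5) = {loss:.4f}")
```

In on-policy training this per-token divergence is averaged over positions of sequences the student generated, so gradient signal concentrates on the student's own errors.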

Technical Specifications

  • Base Model: Qwen/Qwen2.5-3B-Instruct
  • Parameters: 3.1 Billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (version 1.0.0.dev0), Transformers (version 5.3.0), PyTorch (version 2.6.0+cu124)
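TRL ships a `GKDTrainer` for this training method. The configuration sketch below shows how a run like this one might be set up; the dataset path, hyperparameter values, and the student/teacher pairing (inferred from the model name) are assumptions, not this model's actual training configuration:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

# Student/teacher pairing inferred from the model name (assumption).
student_id = "Qwen/Qwen2.5-3B-Instruct"
teacher_id = "Qwen/Qwen2-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# Hypothetical prompt set; the "math500" tag in the model name suggests
# math-reasoning prompts, but the actual training data is not documented.
train_dataset = load_dataset("json", data_files="math_prompts.jsonl", split="train")

args = GKDConfig(
    output_dir="opd-qwen2.5-3b",
    lmbda=0.5,        # fraction of batches using on-policy (student-sampled) data
    beta=0.5,         # generalized JSD interpolation coefficient
    temperature=0.9,  # sampling temperature for student generations
    max_new_tokens=512,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```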

Use Cases

This model is suited to applications where on-policy distillation, with its emphasis on learning from self-generated mistakes, can offer advantages over traditional fine-tuning; the "math500" tag in the model name suggests the distillation targeted mathematical reasoning. Developers exploring models trained with advanced distillation techniques may find it a useful reference point.
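Since the model is fine-tuned from Qwen2.5-3B-Instruct, it should load with the standard Transformers chat-template workflow. A hedged usage sketch (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the sum of the first 10 positive integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```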