QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct

Text Generation | Model Size: 1.5B | Quantization: BF16 | Context Length: 32k | Published: Apr 20, 2026 | Architecture: Transformer

QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5 billion parameter instruction-tuned language model fine-tuned from Qwen/Qwen2-1.5B-Instruct. As the name indicates, it was trained with On-Policy Distillation (GKD) on GSM8K, with Qwen2-1.5B-Instruct as the student (S) and Qwen2-7B-Instruct as the teacher (T), so that the student learns from its own generated mistakes rather than from teacher outputs alone. This makes it suited to math word problems and general instruction following in a compact footprint.
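The following is a minimal inference sketch using the standard transformers chat workflow; the example prompt and generation settings are illustrative, not part of the original card.

```python
# A minimal inference sketch using the standard transformers chat
# workflow; the prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# A GSM8K-style math word problem as an example prompt.
messages = [{"role": "user", "content": (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```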


Model Overview

This model, opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter language model fine-tuned from the Qwen2-1.5B-Instruct checkpoint. It was trained with the TRL framework using Generalized Knowledge Distillation (GKD), an on-policy distillation method.

Key Capabilities

  • Enhanced Learning through Self-Correction: The model's training procedure, GKD, allows it to learn from its own generated mistakes, which can lead to improved reasoning and instruction-following capabilities.
  • Instruction-Tuned: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
  • Based on Qwen2 Architecture: Starts from the Qwen2-1.5B-Instruct checkpoint and inherits its tokenizer, 32k context window, and base instruction-following ability.

Training Details

The model was trained with GKD (Generalized Knowledge Distillation), the method introduced in "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (Agarwal et al., ICLR 2024). Instead of imitating only a fixed set of teacher outputs, GKD samples sequences from the student itself during training and supervises them with the teacher's token-level distributions, which reduces the mismatch between the sequences the student is trained on and those it actually generates at inference time.
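As a rough illustration, the sketch below shows how such a run can be set up with TRL's GKDTrainer. The GSM8K conversion, hyperparameter values, and output directory are assumptions for illustration, not the exact recipe behind this checkpoint, and the exact trainer arguments may vary with your TRL version.

```python
# A rough sketch of on-policy distillation with TRL's GKDTrainer; the
# dataset conversion and hyperparameters are illustrative assumptions,
# not the exact recipe behind this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen2-1.5B-Instruct"
teacher_id = "Qwen/Qwen2-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# Convert GSM8K into the conversational format the trainer expects.
raw = load_dataset("openai/gsm8k", "main", split="train")
train_dataset = raw.map(
    lambda ex: {"messages": [
        {"role": "user", "content": ex["question"]},
        {"role": "assistant", "content": ex["answer"]},
    ]},
    remove_columns=raw.column_names,
)

args = GKDConfig(
    output_dir="opd_gsm8k",  # hypothetical output path
    lmbda=0.5,  # fraction of steps trained on student-generated (on-policy) sequences
    beta=0.5,   # interpolation coefficient of the generalized Jensen-Shannon divergence
    max_new_tokens=256,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```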

Good For

  • Applications requiring a compact yet capable instruction-following model.
  • Math word problems in the style of GSM8K, the distillation dataset named in the model identifier, and other tasks where distillation-improved reasoning helps.
  • Developers interested in models trained with advanced distillation techniques like GKD.