QpiEImitation/gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 20, 2026Architecture:Transformer Cold

QpiEImitation/gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5 billion parameter instruction-tuned language model fine-tuned from Qwen/Qwen2-1.5B-Instruct. It was trained using the GKD (On-Policy Distillation) method, which involves learning from self-generated mistakes. This model is optimized for improved performance through distillation, making it suitable for general instruction-following tasks.

Loading preview...

Model Overview

This model, gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter language model based on the Qwen2-1.5B-Instruct architecture. It has been fine-tuned using the GKD (On-Policy Distillation) method, a technique where the model learns from its own self-generated mistakes, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024).

Key Capabilities

  • Instruction Following: Designed to respond to user instructions effectively.
  • Distillation-Enhanced Performance: Benefits from the GKD training procedure, which aims to improve model capabilities through a unique distillation process.

Training Details

The model was trained using the TRL (Transformers Reinforcement Learning) library. The GKD method, which is central to its training, focuses on on-policy distillation. This approach allows the model to refine its understanding and generation by iteratively learning from its own outputs.

Good For

  • Developers looking for a Qwen2-1.5B-Instruct variant with enhanced instruction-following capabilities through distillation.
  • Experimenting with models trained using advanced distillation techniques like GKD.