QpiEImitation/gkd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct

TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:Apr 20, 2026Architecture:Transformer Cold

The QpiEImitation/gkd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct model is a 0.5 billion parameter instruction-tuned language model based on the Qwen2 architecture. It has been fine-tuned using the GKD (On-Policy Distillation of Language Models) method, which involves learning from self-generated mistakes. This model is specifically optimized for tasks where distillation from a larger teacher model (Qwen2-7B-Instruct) is beneficial, aiming for improved performance in a smaller footprint. It supports a context length of 32768 tokens.

Loading preview...

Model Overview

This model, gkd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5 billion parameter instruction-tuned variant of the Qwen2-0.5B-Instruct architecture. It has been fine-tuned using the GKD (On-Policy Distillation of Language Models) method, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). This approach involves distilling knowledge from a larger teacher model, specifically Qwen2-7B-Instruct, by learning from self-generated errors.

Key Characteristics

  • Base Model: Qwen/Qwen2-0.5B-Instruct.
  • Training Method: Utilizes GKD for on-policy distillation, a technique designed to improve model performance by learning from its own generated mistakes.
  • Framework: Trained with the TRL (Transformers Reinforcement Learning) library.
  • Context Length: Supports a substantial context window of 32768 tokens.

Use Cases

This model is suitable for applications requiring a compact yet capable instruction-following language model, particularly where the benefits of distillation from a larger model are desired. Its training methodology suggests potential for robust performance in tasks that benefit from iterative self-correction and refinement, making it a candidate for scenarios where efficiency and distilled intelligence are key.