QpiEImitation/gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct

TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:Apr 20, 2026Architecture:Transformer Cold

QpiEImitation/gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct is a 0.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2-0.5B-Instruct. This model was trained using the GKD (On-Policy Distillation of Language Models) method, which focuses on learning from self-generated mistakes. It is designed for general text generation tasks, leveraging its unique training approach to enhance performance.

Loading preview...

Overview

This model, gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned version of the base model Qwen/Qwen2-0.5B-Instruct and was developed by QpiEImitation.

Key Training Methodology

The model's distinctiveness stems from its training procedure, which utilizes GKD (On-Policy Distillation of Language Models). This method, detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024), focuses on improving the model by learning from its own generated errors. The training was implemented using the TRL (Transformers Reinforcement Learning) framework.

Capabilities

  • Instruction Following: Designed to respond to user instructions effectively due to its instruction-tuned nature.
  • Text Generation: Capable of generating coherent and contextually relevant text based on prompts.

Good For

  • Developers looking for a compact, instruction-tuned model.
  • Experimentation with models trained using advanced distillation techniques like GKD.
  • General natural language processing tasks where a 0.5B parameter model is suitable.