QpiEImitation/gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 20, 2026Architecture:Transformer Cold

QpiEImitation/gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2-1.5B-Instruct. This model was trained using the GKD (On-Policy Distillation of Language Models) method, which focuses on learning from self-generated mistakes. It is designed to enhance performance through a distillation process, making it suitable for tasks requiring refined instruction following.

Loading preview...

Model Overview

This model, gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter instruction-tuned variant of the Qwen2-1.5B-Instruct base model. It has been specifically fine-tuned using the GKD (On-Policy Distillation of Language Models) method, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). This training approach allows the model to learn and improve by analyzing its own generated errors.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2-1.5B-Instruct.
  • Training Method: Utilizes GKD (Generative Knowledge Distillation) for enhanced learning from self-generated mistakes.
  • Framework: Trained with Hugging Face's TRL library.
  • Context Length: Supports a context length of 32768 tokens.

Potential Use Cases

  • Instruction Following: Optimized for tasks where precise adherence to instructions is crucial.
  • Research in Distillation: Useful for researchers exploring on-policy distillation techniques and their impact on model performance.
  • Resource-Efficient Deployment: As a 1.5B parameter model, it offers a balance between capability and computational efficiency for various NLP tasks.