Model Overview

This model, gkd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5-billion-parameter instruction-tuned model built on Qwen2-0.5B-Instruct. It was fine-tuned with GKD (Generalized Knowledge Distillation), the on-policy distillation method introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). In this approach, the student model is trained on sequences it generates itself, while a larger teacher model, Qwen2-7B-Instruct, supervises those sequences with its own token-level probability distributions. The student therefore learns from its self-generated mistakes.
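One divergence the GKD paper uses to compare teacher and student next-token distributions is a generalized Jensen-Shannon divergence (JSD). The following is a minimal pure-Python sketch, assuming both distributions are given as probability lists over a shared vocabulary and using the mixture M = beta * teacher + (1 - beta) * student. The function names and the `beta` parameter are illustrative and are not taken from the TRL API.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def generalized_jsd(teacher: Sequence[float], student: Sequence[float],
                    beta: float = 0.5) -> float:
    """Generalized JSD with mixture M = beta * teacher + (1 - beta) * student.

    beta = 0.5 recovers the standard, symmetric Jensen-Shannon divergence.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    mixture = [beta * t + (1.0 - beta) * s for t, s in zip(teacher, student)]
    # M is nonzero wherever either input is nonzero, so both KL terms are finite.
    return (beta * kl_divergence(teacher, mixture)
            + (1.0 - beta) * kl_divergence(student, mixture))


def sequence_gjsd(teacher_steps: Sequence[Sequence[float]],
                  student_steps: Sequence[Sequence[float]],
                  beta: float = 0.5) -> float:
    """Average the per-token divergence over the positions of one sequence."""
    per_token = [generalized_jsd(t, s, beta)
                 for t, s in zip(teacher_steps, student_steps)]
    return sum(per_token) / len(per_token)


# Usage: two positions over a hypothetical 3-token vocabulary.
teacher_steps = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student_steps = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3]]
print(sequence_gjsd(teacher_steps, student_steps, beta=0.5))
```

In real training these distributions would come from the softmax outputs of the teacher and student at every position of a sequence, and the averaged divergence would be minimized with respect to the student's parameters.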

Key Characteristics

Base Model: Qwen/Qwen2-0.5B-Instruct.
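Training Method: GKD on-policy distillation from the teacher Qwen2-7B-Instruct. The student is trained on its own generated outputs, so it learns to correct the mistakes it actually makes at generation time rather than only imitating fixed reference outputs.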
Framework: Trained with the TRL (Transformer Reinforcement Learning) library.
Context Length: Supports a context window of 32,768 tokens.
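The on-policy part of the training method can be illustrated with a toy loop. This is a hedged sketch under simplifying assumptions: the "models" are hypothetical Python functions mapping a token prefix to a next-token distribution, no gradients are computed, and forward KL(teacher || student) stands in for the divergence (the GKD paper also considers others, such as reverse KL and generalized JSD). The `lmbda` parameter, named after the student-data fraction in the GKD paper, sets how often the loss is computed on a student-generated sequence instead of a fixed dataset output.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "model": maps a prefix of token ids to a next-token distribution.
Model = Callable[[Tuple[int, ...]], List[float]]


def sample_sequence(model: Model, length: int, rng: random.Random) -> List[int]:
    """Autoregressively sample `length` token ids from `model`."""
    tokens: List[int] = []
    for _ in range(length):
        probs = model(tuple(tokens))
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def forward_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); assumes q is nonzero wherever p is (true for softmax outputs)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(teacher: Model, student: Model, tokens: Sequence[int]) -> float:
    """Mean per-token KL(teacher || student), evaluated at every prefix of `tokens`."""
    prefixes = [tuple(tokens[:i]) for i in range(len(tokens))]
    losses = [forward_kl(teacher(pre), student(pre)) for pre in prefixes]
    return sum(losses) / len(losses)


def gkd_step(teacher: Model, student: Model, fixed_output: Sequence[int],
             lmbda: float, rng: random.Random) -> float:
    """Loss on an on-policy sample (with probability lmbda) or on a fixed output."""
    if rng.random() < lmbda:
        # On-policy: the student generates the sequence it will be corrected on.
        tokens = sample_sequence(student, len(fixed_output), rng)
    else:
        # Off-policy: use the fixed dataset output unchanged.
        tokens = list(fixed_output)
    return distillation_loss(teacher, student, tokens)


# Usage with two prefix-independent toy models over a 3-token vocabulary.
teacher = lambda prefix: [0.6, 0.3, 0.1]
student = lambda prefix: [0.4, 0.4, 0.2]
rng = random.Random(0)
print(gkd_step(teacher, student, fixed_output=[0, 1, 2], lmbda=1.0, rng=rng))
```

With `lmbda=1.0` every step is fully on-policy; with `lmbda=0.0` the loop reduces to supervised distillation on fixed outputs, where the student never sees its own mistakes.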

Use Cases

This model suits applications that need a compact instruction-following language model, particularly where some of a larger model's capability is wanted at the inference cost of a 0.5B model. Because distillation was performed on the student's own generations, the model may cope with its own errors better than a student trained only on fixed reference outputs, which makes it a candidate for settings where inference efficiency matters and distilled capability from a larger teacher is desired.
