Model Overview

This model, gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter language model based on the Qwen2-1.5B-Instruct architecture. It has been fine-tuned using the GKD (On-Policy Distillation) method, a technique where the model learns from its own self-generated mistakes, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024).

Key Capabilities

Instruction Following: Designed to respond to user instructions effectively.
Distillation-Enhanced Performance: Benefits from the GKD training procedure, which aims to improve model capabilities through a unique distillation process.

Training Details

The model was trained using the TRL (Transformers Reinforcement Learning) library. The GKD method, which is central to its training, focuses on on-policy distillation. This approach allows the model to refine its understanding and generation by iteratively learning from its own outputs.

Good For

Developers looking for a Qwen2-1.5B-Instruct variant with enhanced instruction-following capabilities through distillation.
Experimenting with models trained using advanced distillation techniques like GKD.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)