Overview

This model, gkd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.09 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Qwen/Qwen2.5-3B-Instruct base model, developed by Qwen. The fine-tuning process utilized the TRL library and incorporated a specific training methodology known as GKD.

Key Capabilities

Instruction Following: As an instruction-tuned model, it is designed to generate responses based on given prompts and instructions.
GKD Training: The model's unique characteristic is its training with GKD (On-Policy Distillation of Language Models), a method that enables the model to learn effectively from its own generated errors. This approach aims to improve the model's robustness and performance by iteratively refining its understanding.

Good For

General Text Generation: Suitable for a wide range of text generation tasks where instruction following is important.
Research into Distillation Methods: Provides a practical example of a model trained with the GKD distillation technique, which could be valuable for researchers exploring advanced training methodologies.

Training Details

The model was trained using TRL version 1.0.0.dev0, Transformers 5.3.0, Pytorch 2.6.0+cu124, Datasets 4.8.2, and Tokenizers 0.22.2. Further details on the training run can be visualized via Weights & Biases.

Overview

Overview

Key Capabilities

Good For

Training Details

Full Model Card (README)