QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student learns from its own generated mistakes under teacher supervision. As the repository name indicates, the student (S) is Qwen2.5-3B-Instruct and the teacher (T) is Qwen2-7B-Instruct; the math500 tag suggests the distillation targeted MATH-500-style problems.
Model Overview
This model, opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct, is a 3.1 billion parameter language model fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model using GKD (Generalized Knowledge Distillation), an on-policy distillation method.
Key Training Methodology
The core differentiator of this model is its training procedure, GKD (Generalized Knowledge Distillation), detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). Instead of training only on fixed teacher outputs, GKD samples sequences from the student during training and uses the teacher's token-level distributions as feedback on those samples, so the student learns to correct its own errors rather than merely imitate teacher text. The training was implemented using the TRL library, which provides a GKDTrainer for this method.
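As a rough illustration, a GKD run with TRL's `GKDTrainer` might look like the sketch below. The student and teacher IDs follow the repository name; the dataset, hyperparameters, and output path are illustrative assumptions, not the actual training configuration of this checkpoint.

```python
# Hypothetical GKD training sketch; hyperparameters and dataset are
# illustrative assumptions, not this model's actual training setup.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen2.5-3B-Instruct"  # student (S) per the model name
teacher_id = "Qwen/Qwen2-7B-Instruct"    # teacher (T) per the model name

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# GKDTrainer expects conversational examples under a "messages" key.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What is 12 * 17?"},
        {"role": "assistant", "content": "12 * 17 = 204."},
    ]},
])

args = GKDConfig(
    output_dir="opd_math500",  # assumed path
    lmbda=0.5,   # fraction of batches trained on student-generated (on-policy) samples
    beta=0.5,    # interpolation weight of the generalized Jensen-Shannon divergence
    max_new_tokens=256,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

With `lmbda=0.5`, half of the training batches replace the ground-truth completions with sequences sampled from the student itself, which is the "learning from self-generated mistakes" aspect of the method.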
Technical Specifications
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Parameters: 3.1 Billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 1.0.0.dev0), Transformers (version 5.3.0), PyTorch (version 2.6.0+cu124)
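
The advertised context length and parameter count can be sanity-checked directly from the checkpoint config, assuming the repository ID above:

```python
# Sanity-check the advertised specs from the checkpoint itself.
from transformers import AutoConfig, AutoModelForCausalLM

repo_id = "QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct"

config = AutoConfig.from_pretrained(repo_id)
print(config.max_position_embeddings)  # expected: 32768

model = AutoModelForCausalLM.from_pretrained(repo_id)
print(f"{model.num_parameters() / 1e9:.2f}B parameters")  # expected: ~3.1B
```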
Use Cases
This model is suited to applications where the GKD training paradigm, which emphasizes learning from self-generated mistakes, may offer advantages over standard supervised fine-tuning; given the MATH-500 distillation data, math problem solving is the most likely target domain. Developers exploring on-policy distillation techniques may also find it a useful reference checkpoint. A minimal inference example follows.
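The snippet below uses the standard Transformers chat workflow; the prompt and generation settings are placeholders, not recommended values.

```python
# Minimal inference sketch using the standard Transformers chat workflow.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QpiEImitation/opd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [
    {"role": "user",
     "content": "Solve step by step: what is the sum of the first 50 positive integers?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```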