QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct is a 1.5-billion-parameter instruction-tuned language model fine-tuned from Qwen/Qwen2-1.5B-Instruct. As the name indicates, it was trained with On-Policy Distillation (GKD) on GSM8K, with Qwen2-1.5B-Instruct as the student (S) and Qwen2-7B-Instruct as the teacher (T), so the student learns from its own generated mistakes rather than only from fixed teacher outputs. The result is a compact instruction-following model with a focus on mathematical reasoning.
Model Overview
This model, opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5-billion-parameter language model built on the Qwen2-1.5B-Instruct architecture. It was fine-tuned with the TRL framework using On-Policy Distillation (GKD), distilling from the larger Qwen2-7B-Instruct teacher on GSM8K math word problems.
Key Capabilities
- Enhanced Learning through Self-Correction: GKD trains the student on sequences it samples itself, with the teacher providing token-level feedback. This reduces the mismatch between training and inference distributions seen in standard distillation and can improve reasoning and instruction following.
- Instruction-Tuned: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
- Based on Qwen2 Architecture: Leverages the foundational strengths of the Qwen2-1.5B-Instruct model.
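A minimal inference sketch using the standard transformers chat API (the question, generation settings, and dtype/device options are illustrative; adjust to your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "QpiEImitation/opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 friends in April, and half as "
                   "many in May. How many clips did she sell altogether?",
    },
]
# Build the prompt with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps math answers deterministic; enable sampling for variety.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```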
Training Details
The model was trained with GKD, the method introduced in "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (Agarwal et al., ICLR 2024). Rather than distilling on a fixed dataset of teacher outputs, GKD samples sequences from the student during training and minimizes a divergence (such as the generalized Jensen-Shannon divergence) between the student's and teacher's token distributions on those samples, so the student is corrected on the very outputs it produces at inference time.
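The divergence at the core of GKD can be illustrated with a small pure-Python sketch of the generalized Jensen-Shannon divergence between teacher and student next-token distributions (an illustration only; an actual trainer such as TRL's GKD implementation computes this over logits in batch):

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(p_teacher, q_student, beta=0.5):
    """Generalized Jensen-Shannon divergence, the GKD training objective.

    M is the beta-weighted mixture of the two distributions; beta
    interpolates between forward-KL-like (small beta) and
    reverse-KL-like (large beta) behavior, and beta = 0.5 recovers
    the standard symmetric JSD.
    """
    m = [beta * pi + (1 - beta) * qi
         for pi, qi in zip(p_teacher, q_student)]
    return beta * kl(p_teacher, m) + (1 - beta) * kl(q_student, m)
```

At beta = 0.5 the divergence is symmetric in teacher and student, and it is zero exactly when the two distributions match.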
Good For
- Applications requiring a compact yet capable instruction-following model.
- Math word problems and step-by-step reasoning in the style of GSM8K, the distillation training data.
- Developers interested in models trained with advanced distillation techniques like GKD.