# QpiEImitation/opd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct
QpiEImitation/opd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct is a 0.5-billion-parameter instruction-tuned language model fine-tuned from Qwen2-0.5B-Instruct. It was trained with GKD (Generalized Knowledge Distillation), an on-policy distillation method in which the student learns from its own generated mistakes, with Qwen2-7B-Instruct serving as the teacher (as indicated by the model name). With a context length of 32,768 tokens, it targets tasks that benefit from improved reasoning through distillation.
## Model Overview
This model, opd_gsm8k_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5-billion-parameter instruction-tuned language model derived from the Qwen2-0.5B-Instruct architecture. It has been fine-tuned with GKD (Generalized Knowledge Distillation), an on-policy method in which the student refines its capabilities by learning from its self-generated mistakes under the guidance of a larger teacher model.
## Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2-0.5B-Instruct.
- Training Method: GKD (Generalized Knowledge Distillation), an on-policy distillation technique in which the student learns by analyzing and correcting its self-generated errors, as introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024).
- Framework: Trained with TRL (Transformer Reinforcement Learning) version 1.0.0.dev0.
- Context Length: Supports a substantial context window of 32768 tokens.
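At the core of GKD is a generalized Jensen-Shannon divergence between the teacher's and student's next-token distributions, interpolated by a coefficient β (β = 0.5 recovers the symmetric JSD). The following is a minimal pure-Python sketch of that objective on toy distributions; it is an illustration of the loss formula, not the TRL implementation.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generalized_jsd(p_teacher, q_student, beta):
    """Generalized JSD used by GKD:
    D_JSD(beta)(P || Q) = beta * KL(P || M) + (1 - beta) * KL(Q || M),
    where M = beta * P + (1 - beta) * Q is the mixture distribution."""
    m = [beta * pi + (1 - beta) * qi for pi, qi in zip(p_teacher, q_student)]
    return beta * kl(p_teacher, m) + (1 - beta) * kl(q_student, m)

# Toy next-token distributions over a 3-token vocabulary (hypothetical values).
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]

loss = generalized_jsd(teacher, student, beta=0.5)
```

With β = 0.5 the divergence is symmetric in teacher and student, and it vanishes exactly when the two distributions agree.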
## Use Cases
This model is particularly suitable for applications where enhanced reasoning and robustness through distillation are beneficial. As the model name indicates, it was distilled on GSM8K, so its training methodology suggests improved performance on grade-school math word problems and, more broadly, on tasks where a model benefits from generating its own outputs and learning from the teacher's corrections of them.
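The "on-policy" aspect of this training recipe means the student is scored on sequences it samples itself, with the teacher's distribution providing the per-token target. The sketch below illustrates that data-collection step with a hypothetical toy vocabulary and stand-in model functions (using forward KL per token for simplicity); it is not the actual training code.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "<eos>"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    """Draw one index from a categorical distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def student_logits(prefix):
    # Hypothetical stand-in for the student model's forward pass.
    return [0.5 * len(prefix), 1.0, 0.2, 0.1 * len(prefix)]

def teacher_probs(prefix):
    # Hypothetical stand-in for the teacher's next-token distribution.
    return softmax([1.0, 2.0, 0.5, 0.3 * len(prefix)])

def rollout_and_score(max_len=5):
    """On-policy step: the STUDENT samples its own continuation, and the
    per-token loss compares teacher vs. student distributions on those
    self-generated prefixes."""
    prefix, total_loss = [], 0.0
    for _ in range(max_len):
        p_student = softmax(student_logits(prefix))
        tok = sample(p_student)  # token sampled from the student itself
        p_teacher = teacher_probs(prefix)
        # Forward KL(teacher || student) on the student's own prefix.
        total_loss += sum(
            t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0
        )
        prefix.append(tok)
        if VOCAB[tok] == "<eos>":
            break
    return prefix, total_loss

tokens, loss = rollout_and_score()
```

Because the prefixes are drawn from the student's own policy rather than a fixed dataset, the loss is evaluated exactly where the student actually makes mistakes, which is the key difference from standard offline distillation.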