Model Overview

This model, gkd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct, is a 1.5 billion parameter instruction-tuned variant of the Qwen2-1.5B-Instruct base model. It has been specifically fine-tuned using the GKD (On-Policy Distillation of Language Models) method, as detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). This training approach allows the model to learn and improve by analyzing its own generated errors.

Key Characteristics

Base Model: Fine-tuned from Qwen/Qwen2-1.5B-Instruct.
Training Method: Utilizes GKD (Generative Knowledge Distillation) for enhanced learning from self-generated mistakes.
Framework: Trained with Hugging Face's TRL library.
Context Length: Supports a context length of 32768 tokens.

Potential Use Cases

Instruction Following: Optimized for tasks where precise adherence to instructions is crucial.
Research in Distillation: Useful for researchers exploring on-policy distillation techniques and their impact on model performance.
Resource-Efficient Deployment: As a 1.5B parameter model, it offers a balance between capability and computational efficiency for various NLP tasks.

Overview

Model Overview

Key Characteristics

Potential Use Cases

Full Model Card (README)