Name: QpiEImitation/gkd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: QpiEImitation

Overview

gkd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct is a fine-tuned version of Qwen2.5-3B-Instruct, developed by QpiEImitation. It keeps the Qwen2.5-3B-Instruct architecture, with 3.1 billion parameters and a context length of 32768 tokens. Fine-tuning used the TRL library and GKD (Generalized Knowledge Distillation), an on-policy distillation method. The S- and T- prefixes in the model name suggest that Qwen2.5-3B-Instruct served as the student and Qwen2-7B-Instruct as the teacher.

Key Capabilities

Enhanced Reasoning: The GKD training method, detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes," trains the model on its own generated outputs so that it can learn from its own errors, which may lead to more robust reasoning. This is particularly relevant for mathematical problem-solving, as the gsm8k in the model name indicates.
Instruction Following: As an instruction-tuned model, it is designed to understand and execute user prompts effectively.

Training Methodology

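The model was trained using GKD, a distillation technique in which a smaller student model generates its own outputs and a larger teacher model provides token-level feedback on them. Because the student trains on sequences it produces itself, including its mistakes, the training data matches what the student actually generates at inference time, rather than only fixed reference outputs.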
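The teacher's feedback on a student-generated sequence can be pictured as a per-token divergence between the two models' next-token distributions. The following is a minimal sketch in plain Python, assuming the generalized Jensen-Shannon divergence described in the GKD paper. The function names, the toy logits, and the `beta` value are illustrative assumptions and are not taken from this model's actual training configuration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(teacher_probs, student_probs, beta=0.5):
    """Generalized JSD with mixing weight beta, assumed to lie in (0, 1).

    beta = 0.5 gives the standard, symmetric Jensen-Shannon divergence.
    """
    mix = [beta * t + (1 - beta) * s
           for t, s in zip(teacher_probs, student_probs)]
    return beta * kl(teacher_probs, mix) + (1 - beta) * kl(student_probs, mix)


def sequence_loss(teacher_logits, student_logits, beta=0.5):
    """Average the per-token divergence over one student-generated sequence.

    Each argument is a list of per-position logit vectors (hypothetical toy
    inputs here; in real training they come from the two models).
    """
    per_token = [
        generalized_jsd(softmax(t), softmax(s), beta)
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy example: 2 token positions, vocabulary of 3 tokens.
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.0, 1.0, -0.5], [0.0, 0.0, 1.0]]
    print(sequence_loss(teacher, student, beta=0.5))
```

In an actual GKD run this quantity would be minimized with respect to the student's parameters, with the sequences sampled from the student itself. The sketch only shows how the loss value is formed.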

Good For

Applications requiring a compact yet capable model for reasoning tasks.
Research and experimentation with on-policy distillation methods like GKD.
Tasks that benefit from improved instruction following and problem-solving, potentially including mathematical word problems (GSM8K).

Limitations

As a 3.1 billion parameter model, it may not match the performance of much larger models on highly complex or open-ended generative tasks.
