Name: QpiEImitation/gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: QpiEImitation

Overview

This model, gkd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned version of the base model Qwen/Qwen2-0.5B-Instruct and was developed by QpiEImitation.

Key Training Methodology

The model's distinctiveness stems from its training procedure, which utilizes GKD (On-Policy Distillation of Language Models). This method, detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024), focuses on improving the model by learning from its own generated errors. The training was implemented using the TRL (Transformers Reinforcement Learning) framework.

Capabilities

Instruction Following: Designed to respond to user instructions effectively due to its instruction-tuned nature.
Text Generation: Capable of generating coherent and contextually relevant text based on prompts.

Good For

Developers looking for a compact, instruction-tuned model.
Experimentation with models trained using advanced distillation techniques like GKD.
General natural language processing tasks where a 0.5B parameter model is suitable.

Overview

Overview

Key Training Methodology

Capabilities

Good For

Full Model Card (README)