burtenshaw/Qwen3-4B-GKD-Tulu
The burtenshaw/Qwen3-4B-GKD-Tulu model is a 4-billion-parameter language model, fine-tuned from burtenshaw/Qwen3-4B-SFT-Codeforces using GKD (Generalized Knowledge Distillation), the on-policy distillation method from the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes". By training on its own sampled outputs, the model learns from its self-generated mistakes, which is particularly useful for tasks where iterative refinement and error correction can enhance language generation quality.
Model Overview
burtenshaw/Qwen3-4B-GKD-Tulu is a 4-billion-parameter language model built on the burtenshaw/Qwen3-4B-SFT-Codeforces base model. Its key differentiator is its training methodology: GKD (Generalized Knowledge Distillation), detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). Rather than distilling only on a fixed dataset, GKD trains the student on sequences it samples itself, so it learns to correct the kinds of errors it actually makes at generation time.
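At the heart of GKD is a divergence between the teacher's and student's next-token distributions, measured on sequences the student samples itself. The following is a minimal, self-contained sketch of the generalized Jensen-Shannon divergence that GKD uses as its distillation loss; the distributions and the β value are toy illustrations, not values from this model's training configuration:

```python
# Toy sketch of the generalized Jensen-Shannon divergence used as the
# GKD distillation objective. The coefficient beta interpolates between
# forward-KL-like and reverse-KL-like behavior; beta=0.5 gives the
# standard symmetric JSD. All numbers below are illustrative.
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generalized_jsd(teacher, student, beta=0.5):
    """beta * KL(teacher || m) + (1 - beta) * KL(student || m),
    where m = beta * teacher + (1 - beta) * student."""
    m = [beta * t + (1 - beta) * s for t, s in zip(teacher, student)]
    return beta * kl(teacher, m) + (1 - beta) * kl(student, m)

teacher = [0.7, 0.2, 0.1]  # teacher next-token distribution (toy)
student = [0.4, 0.4, 0.2]  # student next-token distribution (toy)

loss = generalized_jsd(teacher, student, beta=0.5)
print(f"generalized JSD loss: {loss:.4f}")
```

In on-policy training, this divergence is minimized over the student's own sampled tokens, which is what lets the model "learn from its mistakes" rather than only from a static reference corpus.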
Key Capabilities & Training
- Fine-tuned Base: Derived from burtenshaw/Qwen3-4B-SFT-Codeforces, suggesting a foundation in supervised fine-tuning, potentially for code-related (Codeforces-style) tasks.
- GKD Training: Employs an on-policy distillation approach in which the model learns from its self-generated mistakes, aiming for improved performance and robustness.
- Frameworks: Trained using TRL (Transformer Reinforcement Learning), Transformers, PyTorch, Datasets, and Tokenizers.
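Assuming TRL is installed, on-policy distillation along these lines can be set up with TRL's `GKDTrainer`. This is a hedged sketch, not the author's actual training script: the teacher checkpoint, dataset, and hyperparameter values below are illustrative placeholders.

```python
# Illustrative sketch of GKD training with TRL's GKDTrainer.
# The teacher model, dataset, and hyperparameters are placeholders,
# NOT the settings actually used to train this checkpoint.

# Toy hyperparameters mirroring TRL's GKDConfig fields:
#   lmbda - fraction of training done on on-policy (student-sampled) data
#   beta  - interpolation coefficient of the generalized JSD loss
GKD_ARGS = dict(lmbda=0.5, beta=0.5, max_new_tokens=128)

def train():
    # Imports are deferred so the sketch can be read without
    # TRL/Transformers installed; call train() on a GPU machine.
    from datasets import load_dataset
    from trl import GKDConfig, GKDTrainer

    config = GKDConfig(output_dir="qwen3-4b-gkd", **GKD_ARGS)
    trainer = GKDTrainer(
        model="burtenshaw/Qwen3-4B-SFT-Codeforces",  # student (this model's base)
        teacher_model="Qwen/Qwen3-8B",               # hypothetical larger teacher
        args=config,
        train_dataset=load_dataset("allenai/tulu-3-sft-mixture", split="train"),
    )
    trainer.train()
```

With `lmbda=0.5`, half the updates use student-sampled completions (the on-policy, learn-from-your-own-mistakes path) and half use the dataset's reference completions.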
Potential Use Cases
This model is particularly interesting for applications requiring:
- Refined Language Generation: Where the ability to learn from and correct its own outputs is beneficial.
- Iterative Improvement: Scenarios where a model benefits from a self-correction signal built into its training process.
- Research into Distillation: As an example of on-policy distillation, it serves as a valuable resource for researchers exploring advanced training techniques for LLMs.