burtenshaw/Qwen3-4B-GKD-Tulu

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Architecture: Transformer

The burtenshaw/Qwen3-4B-GKD-Tulu model is a 4 billion parameter language model, fine-tuned from burtenshaw/Qwen3-4B-SFT-Codeforces using GKD (Generalized Knowledge Distillation), an on-policy distillation method. The model is trained to learn from its own generated mistakes, with the aim of improving performance over purely off-policy distillation. It is particularly suited to tasks where iterative refinement and learning from errors can strengthen language generation.


Model Overview

burtenshaw/Qwen3-4B-GKD-Tulu is a 4 billion parameter language model built on the burtenshaw/Qwen3-4B-SFT-Codeforces base model. Its key differentiator is its training methodology: GKD (Generalized Knowledge Distillation), detailed in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes" (ICLR 2024). Rather than distilling only from teacher-generated text, GKD trains the student on sequences it generates itself, with the teacher providing feedback, so the model learns to correct its own errors.
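The model can be loaded with the Transformers library like any other causal language model. The following is a minimal inference sketch; the prompt and generation settings are illustrative assumptions, not values documented for this checkpoint.

```python
# Minimal inference sketch using the Transformers library.
# The prompt and max_new_tokens below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "burtenshaw/Qwen3-4B-GKD-Tulu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")  # BF16, per the card

# Qwen3-style chat models expect the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Explain on-policy distillation in one sentence."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```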

Key Capabilities & Training

  • Fine-tuned Base: Derived from burtenshaw/Qwen3-4B-SFT-Codeforces, suggesting a foundation in supervised fine-tuning, potentially for code-related tasks.
  • GKD Training: Employs an on-policy distillation approach where the model learns from its self-generated mistakes, aiming for improved performance and robustness.
  • Frameworks: Trained using TRL (Transformer Reinforcement Learning), Transformers, PyTorch, Datasets, and Tokenizers.

Potential Use Cases

This model is particularly interesting for applications requiring:

  • Refined Language Generation: Where the ability to learn from and correct its own outputs is beneficial.
  • Iterative Improvement: Scenarios where a model can benefit from a self-correction mechanism during its training or inference process.
  • Research into Distillation: As an example of on-policy distillation, it serves as a valuable resource for researchers exploring advanced training techniques for LLMs.