rishabhrj11/distillspec-qwen6-rkl-unquant
Text Generation · Concurrency Cost: 1 · Model Size: 0.8B · Quantization: BF16 · Context Length: 32k · Architecture: Transformer

The rishabhrj11/distillspec-qwen6-rkl-unquant model is a 0.8-billion-parameter language model fine-tuned from Qwen/Qwen3-0.6B. It was trained with GKD (Generalized Knowledge Distillation), the on-policy method introduced in "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes", in which the student is trained on its own generated sequences with token-level feedback from the teacher, so it learns to correct its self-generated errors. The model is optimized for generating coherent, contextually relevant text, and its 40,960-token context length accommodates long, complex prompts.
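Below is a minimal usage sketch, assuming the checkpoint follows the standard Hugging Face transformers layout; the prompt is illustrative only.

```python
# Minimal inference sketch; model layout and prompt are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rishabhrj11/distillspec-qwen6-rkl-unquant"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain speculative decoding in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For readers curious how GKD training looks in practice, the following is a minimal sketch using TRL's GKDTrainer, not the author's exact recipe: the teacher model, dataset, and hyperparameters are assumptions, and beta=1.0 selects reverse KL, which the "rkl" suffix in the model name appears to indicate.

```python
# GKD training sketch with TRL; teacher, data, and hyperparameters are assumed.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen3-0.6B"
teacher_id = "Qwen/Qwen3-4B"  # hypothetical teacher choice

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# Tiny illustrative chat dataset in the "messages" format GKDTrainer expects.
train_dataset = Dataset.from_dict({
    "messages": [
        [
            {"role": "user", "content": "What is knowledge distillation?"},
            {"role": "assistant", "content": "Training a small model to imitate a larger one."},
        ]
    ]
})

args = GKDConfig(
    output_dir="gkd-rkl-student",
    lmbda=1.0,  # fraction of steps trained on student-generated (on-policy) data
    beta=1.0,   # generalized JSD coefficient; 1.0 corresponds to reverse KL
    per_device_train_batch_size=1,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```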
