CL-From-Nothing/Qwen3-4B-SSD-RLVE-Eval20-N20-global-step-500
CL-From-Nothing/Qwen3-4B-SSD-RLVE-Eval20-N20-global-step-500 is a 4 billion parameter Qwen3-based causal language model developed by CL-From-Nothing. This model utilizes Simple Self-Distillation (SSD) on self-generated responses from the frozen base model, then fine-tuned on these samples. It is specifically optimized for tasks related to the RLVE-Eval20 dataset, making it suitable for applications requiring refined response generation based on self-distillation techniques.
Loading preview...
Model Overview
CL-From-Nothing/Qwen3-4B-SSD-RLVE-Eval20-N20-global-step-500 is a 4 billion parameter language model built upon the Qwen3 architecture. This model was developed by CL-From-Nothing and incorporates a unique training methodology known as Simple Self-Distillation (SSD). The SSD process involves generating 20 self-generated responses from the frozen base model, followed by supervised fine-tuning (SFT) on these collected samples.
Key Characteristics
- Architecture: Based on the Qwen3-4B model.
- Training Method: Employs Simple Self-Distillation (SSD) by fine-tuning on N=20 self-generated responses from the base model.
- Training Data: Fine-tuned on a specific Parquet SFT corpus (16k rows) from CL-From-Nothing/RLVE-Eval20-Qwen3-4B-SSD-N20-SFT-Train.
- Checkpoint: Represents
global_step_500from the VERL FSDP SFT checkpoint, indicating 500 optimizer steps and a 1-epoch schedule. - Companion Model: A smaller 1.7B parameter version is also available: CL-From-Nothing/Qwen3-1-7B-SSD-RLVE-Eval20-N20-global-step-500.
Use Cases
This model is particularly well-suited for applications that benefit from models trained with self-distillation techniques, especially those requiring refined response generation within the context of the RLVE-Eval20 evaluation framework. Its training methodology suggests potential for improved coherence and quality in generated text by learning from its own outputs.