Model Overview

sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4 is a fine-tuned version of Qwen's Qwen3-0.6B base model. It has approximately 0.8 billion parameters and was produced by supervised fine-tuning (SFT). Training reached a final validation loss of 1.9505; the loss function being optimized is not specified in the available documentation.

Training Details

The model was trained for 4 epochs at a learning rate of 3e-05, using a cosine learning rate scheduler with 5 warmup steps. The train_batch_size was 4 with gradient_accumulation_steps of 2, giving an effective total batch size of 8. The optimizer was adamw_torch.
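
The batch-size arithmetic and the shape of the learning rate schedule can be sketched in plain Python. This is an illustrative sketch, not the training code used for this model: the function names, the single-device assumption, and the total_steps value in the usage lines are hypothetical, since the number of optimizer steps is not documented. The schedule shown is the common linear-warmup, half-cosine-decay form; whether the original run used exactly this variant is an assumption.

```python
import math

# Values reported in the training details above.
LEARNING_RATE = 3e-5
WARMUP_STEPS = 5
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2


def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer update.

    num_devices=1 is an assumption; it is consistent with the reported total of 8.
    """
    return train_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Linear warmup to base_lr, then half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    # total_steps=1000 is a hypothetical value chosen only for illustration.
    for step in (0, 5, 500, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

With only 5 warmup steps, the learning rate reaches its peak of 3e-05 almost immediately, so nearly the whole run is spent on the cosine decay.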

Limitations

The available documentation does not describe the training dataset, intended uses, or known limitations. Because the model's capabilities beyond those of the base Qwen3-0.6B model are not defined, users should evaluate it themselves before relying on it for a particular application.
