TarhanE/sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4

Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: Jun 9, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

TarhanE/sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4 is a 0.8 billion parameter language model fine-tuned from Qwen/Qwen3-0.6B. It was trained on an unspecified dataset and reached a validation loss of 1.9505. The available information does not state what distinguishes it from the base model or what it is intended for.


Model Overview

This model, sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4, is a fine-tuned version of the Qwen3-0.6B base model by Qwen. It has approximately 0.8 billion parameters and was produced by supervised fine-tuning (SFT). Training optimized a specific loss function that the documentation does not name, and the final validation loss was 1.9505.
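For readers who want to try the model, the following is a minimal sketch of loading it and generating text. It assumes the checkpoint is published on the Hugging Face Hub under the identifier in the title and that the `transformers` and `torch` packages are installed; neither is confirmed by this page. The function name `generate_text` and its default arguments are illustrative choices, not part of the model's documentation.

```python
# Sketch only: assumes the checkpoint is available on the Hugging Face Hub
# under this identifier (taken from the page title, not verified here).
MODEL_ID = "TarhanE/sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4"


def generate_text(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model in BF16 and return a completion for `prompt`.

    `generate_text` is a hypothetical helper written for this example.
    The heavy imports live inside the function so that importing this
    snippet does not require the libraries or trigger a download.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the page metadata.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the full sequence (prompt plus continuation).
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Hello")` downloads the weights on first use. Because the intended use is undocumented, outputs should be compared against the base Qwen3-0.6B model before drawing conclusions.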

Training Details

The model was trained for 4 epochs with a learning rate of 3e-05 and a cosine learning rate scheduler with 5 warmup steps. The train_batch_size was 4 and gradient_accumulation_steps was 2, giving an effective total batch size of 8. The optimizer was adamw_torch.
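The hyperparameters above can be illustrated with a short sketch that reproduces the effective batch size and the shape of a cosine schedule with linear warmup. It uses the common formulation (linear ramp during warmup, then half-cosine decay to zero), which may differ in detail from what the training run actually used. The total step count is not given in the source, so `TOTAL_STEPS` is a hypothetical placeholder, and a single training device is assumed.

```python
import math

# Values taken from the training details above.
LEARNING_RATE = 3e-05
WARMUP_STEPS = 5
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2

# Hypothetical: the source does not report the number of optimizer steps.
TOTAL_STEPS = 1000


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (assumes `num_devices` devices)."""
    return per_device * accumulation * num_devices


def cosine_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for s in (0, 1, 5, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {s:5d}: lr = {cosine_lr(s):.3e}")
```

With only 5 warmup steps, the learning rate reaches its peak of 3e-05 almost immediately, so nearly all of training happens on the decaying part of the curve.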

Limitations

The available documentation does not describe the training dataset, intended uses, or known limitations. Users should exercise caution and evaluate the model on their own tasks before relying on it, since any capabilities beyond those of the base Qwen3-0.6B model are not documented.