yulya-11/qwen3-finetuned
Text generation · Concurrency cost: 1 · Model size: 0.8B · Quantization: BF16 · Context length: 32k · Published: Apr 17, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The yulya-11/qwen3-finetuned model is a 0.8-billion-parameter language model, fine-tuned from the Qwen/Qwen3-0.6B base model. It was trained with a learning rate of 2e-05 over 3 epochs, reaching a final validation loss of 2.0435. The specific fine-tuning dataset and intended uses are not documented, but the training setup suggests adaptation for general language tasks.


Model Overview

The yulya-11/qwen3-finetuned model is a fine-tuned variant of the Qwen3-0.6B architecture with approximately 0.8 billion parameters. It was developed by yulya-11, who fine-tuned the base Qwen model with the hyperparameters described under Training Details below.
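
Since the card itself provides no usage instructions, the following is a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the id shown in the title and follows the standard Qwen3 causal-LM layout. The prompt and generation settings are illustrative, and keyword spellings may vary across Transformers releases.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id, taken from the card title.
model_id = "yulya-11/qwen3-finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
)

# Illustrative prompt; the card does not specify intended use cases.
prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```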

Training Details

The model was trained for 3 epochs with a learning rate of 2e-05, a train_batch_size of 2, and gradient_accumulation_steps of 8, giving an effective total_train_batch_size of 16. The optimizer was ADAMW_TORCH_FUSED. Validation loss decreased steadily during training, reaching 2.0435 by the final epoch. Training used Transformers 5.0.0, PyTorch 2.10.0+cu128, Datasets 4.0.0, and Tokenizers 0.22.2.
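
For reference, these hyperparameters map onto the Hugging Face TrainingArguments API roughly as follows. This is a hedged reconstruction, not the author's actual training script; the output path and evaluation schedule are assumptions not stated in the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-finetuned",    # hypothetical output path
    num_train_epochs=3,              # 3 epochs, as reported
    learning_rate=2e-5,              # as reported
    per_device_train_batch_size=2,   # train_batch_size of 2
    gradient_accumulation_steps=8,   # 2 * 8 = effective batch size of 16
    optim="adamw_torch_fused",       # the ADAMW_TORCH_FUSED optimizer
    eval_strategy="epoch",           # assumed: validation loss is reported per epoch
)
```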

Current Status and Limitations

The specific dataset used for fine-tuning and the model's intended applications are not documented. Without that information, the model's performance on particular tasks, and how it differs from other fine-tunes of the same base model, cannot be fully evaluated; users should validate it on their own workloads before relying on it.