huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-14B
The huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-14B is a 14 billion parameter language model, fine-tuned from the Qwen3-14B architecture. This model has been specifically fine-tuned on the appworld_distillation_sft_v2 dataset, indicating a specialization in tasks related to that dataset's domain. It demonstrates a validation loss of 0.6408, suggesting its performance within the scope of its training data. This model is suitable for applications requiring capabilities aligned with the appworld_distillation_sft_v2 dataset.
Loading preview...
Model Overview
This model, huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-14B, is a 14 billion parameter language model built upon the Qwen3-14B architecture. It has undergone supervised fine-tuning (SFT) using the appworld_distillation_sft_v2 dataset.
Key Characteristics
- Base Model: Qwen3-14B, a large language model developed by Qwen.
- Fine-tuning Dataset: Specifically trained on the
appworld_distillation_sft_v2dataset, implying a focus on tasks or data distributions present within this dataset. - Performance: Achieved a final validation loss of 0.6408 during training, indicating its learned performance on the evaluation set.
Training Details
The model was trained for 25 epochs using a learning rate of 5e-06, a total batch size of 32, and the AdamW optimizer. The training utilized 8 GPUs with a cosine learning rate scheduler and a warmup ratio of 0.1.
Intended Use Cases
Given its fine-tuning on the appworld_distillation_sft_v2 dataset, this model is best suited for applications and tasks that align with the nature and content of that specific dataset. Users should evaluate its performance on their particular use case, especially if it falls within the domain of the training data.