The huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507 model is a 4 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the appworld_distillation_sft_v2 dataset. It supports a 40,960-token context length and was trained with a learning rate of 5e-06 over 30 epochs. The model is adapted through supervised fine-tuning for the tasks covered by the appworld_distillation_sft_v2 dataset, reaching a final validation loss of 0.7486.
Model Overview
This model, huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507, is a 4 billion parameter instruction-tuned language model. It is a supervised fine-tuned (SFT) version of the base model Qwen/Qwen3-4B-Instruct-2507, specifically adapted using the appworld_distillation_sft_v2 dataset.
Key Characteristics
- Base Model: Qwen3-4B-Instruct-2507
- Parameter Count: 4 billion
- Context Length: 40960 tokens
- Fine-tuning Dataset: appworld_distillation_sft_v2
- Training Epochs: 30
- Final Validation Loss: 0.7486
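Below is a minimal sketch of loading the model and running a single chat-style generation with the Hugging Face Transformers library. It assumes the checkpoint follows the standard Qwen3 causal-LM layout and chat template inherited from the base model; the prompt is only illustrative.

```python
# Minimal loading/generation sketch; assumes a standard Qwen3 causal-LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # place weights on available GPUs/CPU (requires accelerate)
)

# Build a chat-formatted prompt using the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what this model was fine-tuned for."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```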
Training Details
The model was trained with a learning rate of 5e-06, a per-device train batch size of 4, and a per-device eval batch size of 1 across 8 GPUs, giving a total train batch size of 32. Optimization used adamw_torch with a cosine learning rate scheduler and a warmup ratio of 0.1. Training ran for 60 optimization steps over 30 epochs, with training loss decreasing consistently and a final validation loss of 0.7486.
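The reported hyperparameters can be expressed as a Hugging Face TrainingArguments configuration. This is a hedged sketch, not the exact training script: the output_dir is hypothetical, and dataset preparation and Trainer/SFT wiring are omitted.

```python
# Sketch of the reported hyperparameters as TrainingArguments; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    num_train_epochs=30,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    # With 8 GPUs and a per-device batch size of 4, the effective train batch size is 32.
)
```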
Good for
- Applications requiring a compact, instruction-tuned model based on the Qwen3-4B architecture.
- Tasks that benefit from specific knowledge or patterns learned from the appworld_distillation_sft_v2 dataset.
- Scenarios where a balance between model size and performance on fine-tuned tasks is critical.