Gueule-d-ange/qwen1.5b-sft-1k is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B. It was trained with a learning rate of 2e-05 for one epoch at a total batch size of 128. Specific details regarding its primary differentiators, intended uses, and training data are currently unspecified.
Model Overview
Gueule-d-ange/qwen1.5b-sft-1k is a fine-tuned variant of the Qwen/Qwen2.5-1.5B base model. This 1.5-billion-parameter model underwent a single epoch of supervised fine-tuning.
Training Details
Training used the AdamW optimizer with a learning rate of 2e-05 and mixed precision. The total training batch size of 128 was achieved by combining a per-device train_batch_size of 1, gradient_accumulation_steps of 16, and 8 devices (1 × 16 × 8 = 128). The learning rate followed a cosine schedule with a warmup ratio of 0.03.
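As a minimal sketch of how these hyperparameters combine, the snippet below computes the effective batch size and a linear-warmup-plus-cosine-decay learning rate curve from the values stated above. The total step count (total_steps) is a hypothetical value chosen for illustration; the model card does not state it.

```python
import math

# Values stated in the model card
peak_lr = 2e-05
warmup_ratio = 0.03

# Hypothetical step count for illustration only (not stated in the card)
total_steps = 1000
warmup_steps = int(total_steps * warmup_ratio)  # 30 steps of warmup

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch * gradient accumulation * devices
effective_batch = 1 * 16 * 8  # = 128, matching the reported total batch size
```

This is why a per-device batch size of 1 can still yield a large effective batch: gradients are accumulated over 16 micro-batches on each of the 8 devices before every optimizer step.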
Current Status
As of now, specific information about the fine-tuning dataset, the model's intended uses, limitations, and evaluation results is not publicly available. Users should exercise caution and conduct their own evaluations to determine suitability for specific applications.