sathiiiii/polyalign-qwen2.5-1.5b-en-sft
The sathiiiii/polyalign-qwen2.5-1.5b-en-sft model is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B on the polyalign_train dataset. It keeps the Qwen2.5 architecture and is designed for general language generation and understanding, with its behavior shaped by that dataset.
Model Overview
The sathiiiii/polyalign-qwen2.5-1.5b-en-sft model is a fine-tuned variant of Qwen/Qwen2.5-1.5B with 1.5 billion parameters. It was produced by supervised fine-tuning (SFT) on the polyalign_train dataset, so its outputs are expected to reflect the tasks and style present in that training data.
Training Details
The model was trained with a learning rate of 1e-05 and a total batch size of 64 (train_batch_size: 2 per device, gradient_accumulation_steps: 4, across 8 devices). The learning rate followed a cosine scheduler with a 0.1 warmup ratio. Training ran for 1.0 epoch using the AdamW_TORCH_FUSED optimizer and Native AMP for mixed-precision training. The reported loss on the evaluation set is 1.4072.
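The batch-size arithmetic and the shape of the learning-rate schedule can be checked with a short sketch. It assumes the common form of a cosine schedule (linear warmup from zero to the peak rate, then a half-cosine decay to zero); the card does not state the exact variant or the total number of optimizer steps, so total_steps below is a hypothetical value and the function names are illustrative.

```python
import math

# Values taken from the training details above.
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 8


def effective_batch_size(per_device, grad_accum, devices):
    """Samples that contribute to one optimizer step."""
    return per_device * grad_accum * devices


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate under linear warmup followed by half-cosine decay.

    Assumption: this is the common form of the schedule, not a
    confirmed reproduction of the exact training run.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 64

total_steps = 1000  # hypothetical; the real step count is not given
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

With these settings the rate climbs during the first tenth of training, peaks at 1e-05, and then decays smoothly toward zero by the final step.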
Intended Use
Specific intended uses and limitations are not detailed in the provided information. Because the model was fine-tuned on the polyalign_train dataset, it is most likely suited to tasks that resemble that data. Developers should weigh both the base Qwen2.5-1.5B capabilities and the effects of this fine-tuning when considering potential applications.
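For trying the model on a candidate task, a minimal loading sketch with the Hugging Face transformers library might look like the following. It assumes the checkpoint is published in a format the Auto classes can load and that a plain-text prompt is acceptable; the card does not specify a prompt format, and the generate helper is an illustrative name rather than part of the model's documentation.

```python
MODEL_ID = "sathiiiii/polyalign-qwen2.5-1.5b-en-sft"


def generate(prompt, max_new_tokens=64):
    """Generate a continuation for `prompt` (illustrative helper).

    Assumption: the repository loads with the standard Auto classes.
    """
    # Imported lazily so the heavy dependencies are only needed on call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Explain supervised fine-tuning in one sentence."))
```

Comparing outputs from this checkpoint and from the base Qwen/Qwen2.5-1.5B on the same prompts is a simple way to see what the fine-tuning changed.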