sathiiiii/polyalign-qwen2.5-3b-en-dist-sft
The sathiiiii/polyalign-qwen2.5-3b-en-dist-sft model is a 3.1-billion-parameter causal language model fine-tuned from Qwen/Qwen2.5-3B on the polyalign_dist_sft_train dataset. Built on the Qwen2.5 architecture with a 32,768-token context length, it targets general language tasks and suits applications that call for a compact yet capable model.
Model Overview
The sathiiiii/polyalign-qwen2.5-3b-en-dist-sft model is a fine-tuned variant of the Qwen/Qwen2.5-3B base model, developed by sathiiiii. It features approximately 3.1 billion parameters and supports a context length of 32,768 tokens. The model was specifically fine-tuned on the polyalign_dist_sft_train dataset.
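A fine-tuned checkpoint like this can typically be loaded with the Hugging Face Transformers auto classes. The sketch below is a minimal loading example, assuming the model id above is published on the Hub and that `transformers`, `torch`, and `accelerate` are installed; downloading the weights requires network access.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sathiiiii/polyalign-qwen2.5-3b-en-dist-sft"
MAX_CONTEXT = 32_768  # context length stated in this card


def load(device_map: str = "auto"):
    """Download the tokenizer and weights (requires network access)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",     # use the checkpoint's dtype (e.g. bf16) if present
        device_map=device_map,  # automatic device placement via accelerate
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load()
    print(model.config.max_position_embeddings)
```

`torch_dtype="auto"` avoids silently upcasting a bf16 checkpoint to fp32, which roughly halves memory use on a 3B model.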
Training Details
Fine-tuning used a learning rate of 1e-05 with a per-device batch size of 4 across 8 devices and gradient accumulation over 2 steps, for an effective batch size of 4 × 8 × 2 = 64. The optimizer was the fused AdamW implementation (`adamw_torch_fused`) with a cosine learning-rate scheduler and a warmup ratio of 0.1, trained for 1 epoch. Validation loss reached 1.1548 at 3,000 steps.
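The hyperparameters above can be collected into a `TrainingArguments`-style configuration. This is a reconstruction from the numbers stated in the card, not the author's actual training script; the effective batch size is the product of per-device batch size, device count, and accumulation steps.

```python
# Hyperparameters as stated in the card.
PER_DEVICE_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2

# Effective batch size = 4 * 8 * 2 = 64.
effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
print(effective_batch)  # 64

# Hypothetical TrainingArguments-style config mirroring the card;
# field names follow the transformers.TrainingArguments API.
training_config = dict(
    learning_rate=1e-5,
    per_device_train_batch_size=PER_DEVICE_BATCH,
    gradient_accumulation_steps=GRAD_ACCUM_STEPS,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```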
Key Characteristics
- Base Model: Qwen/Qwen2.5-3B
- Parameter Count: 3.1 Billion
- Context Length: 32,768 tokens
- Fine-tuning Dataset: polyalign_dist_sft_train
Intended Use Cases
This model is suitable for general natural language processing tasks where a smaller, efficient language model is preferred. Its fine-tuning on the polyalign_dist_sft_train dataset suggests potential applications in areas related to the dataset's characteristics, though specific details on the dataset's content are not provided. Developers looking for a compact Qwen2.5-based model for inference or further specialization may find this model useful.
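For quick inference, the Transformers `text-generation` pipeline is the simplest path. The sketch below is illustrative: the prompt is an arbitrary placeholder, and running it downloads the full model weights.

```python
from transformers import pipeline

MODEL_ID = "sathiiiii/polyalign-qwen2.5-3b-en-dist-sft"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy generation with the text-generation pipeline (downloads weights)."""
    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="auto",
        device_map="auto",
    )
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return out[0]["generated_text"]


if __name__ == "__main__":
    # Example prompt; replace with your own task.
    print(generate("Summarize the benefits of compact language models."))
```

For repeated calls, construct the pipeline once outside the function rather than per invocation.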