sathiiiii/polyalign-qwen2.5-1.5b-en-sft

Task: Text generation | Model size: 1.5B parameters | Precision: BF16 | Context length: 32k tokens | Published: May 10, 2026 | License: other | Architecture: Transformer

The sathiiiii/polyalign-qwen2.5-1.5b-en-sft model is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B on the polyalign_train dataset. It is intended for general text generation and understanding and keeps the Qwen2.5 architecture; because the card does not describe the contents of polyalign_train, the specific tasks the fine-tuning targets are not documented.


Model Overview

The model is a supervised fine-tuned (SFT) version of the Qwen/Qwen2.5-1.5B base model and has the same 1.5 billion parameters. Fine-tuning used the polyalign_train dataset, so its behavior should be expected to reflect that data more closely than the base model's general pretraining distribution.

Training Details

The model was trained for one epoch with a peak learning rate of 1e-05, a cosine learning-rate schedule with a warmup ratio of 0.1, and the adamw_torch_fused optimizer, using native AMP for mixed-precision training. The effective batch size was 64: a per-device batch size of 2 with 4 gradient-accumulation steps across 8 devices (2 × 4 × 8 = 64). On the evaluation set, the fine-tuned model reached a loss of 1.4072.
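The arithmetic behind these settings can be sketched in plain Python. The block below computes the effective batch size, the learning rate at a given optimizer step under linear warmup followed by half-cosine decay (the same shape as `get_cosine_schedule_with_warmup` in transformers), and the perplexity implied by the reported evaluation loss. Several parts are assumptions, not facts from the card: `TOTAL_STEPS` is hypothetical because the size of polyalign_train is not reported, warmup steps are assumed to be rounded up from `warmup_ratio * total_steps`, and the evaluation loss is assumed to be a mean per-token cross-entropy in nats.

```python
import math

# Hyperparameters as reported in the Training Details paragraph.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 8
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1
EVAL_LOSS = 1.4072


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences that contribute to each optimizer step."""
    return per_device * accum * devices


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float,
                          warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0.

    Assumption: warmup steps = ceil(warmup_ratio * total_steps).
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def perplexity(mean_token_nll: float) -> float:
    """Perplexity from a mean per-token cross-entropy loss in nats (assumed)."""
    return math.exp(mean_token_nll)


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
print("effective batch size:", batch)  # 64

# Hypothetical step count: for one epoch the real value would be roughly
# ceil(num_training_examples / 64), which the card does not report.
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    lr = cosine_lr_with_warmup(step, TOTAL_STEPS, PEAK_LR, WARMUP_RATIO)
    print(f"step {step:4d}: lr = {lr:.2e}")

print(f"eval perplexity ~ {perplexity(EVAL_LOSS):.2f}")
```

Under the mean-per-token-loss assumption, an evaluation loss of 1.4072 corresponds to a perplexity of about 4.08.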

Intended Use

The model card does not document intended uses or limitations. Because the model was fine-tuned on polyalign_train, it is most likely suited to tasks resembling that data; for other applications, developers should start from the known capabilities of the base Qwen2.5-1.5B model and evaluate how the fine-tuning has changed its behavior.
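One way to start that evaluation is a quick completion test. The sketch below uses the standard transformers `AutoTokenizer` / `AutoModelForCausalLM` loading path with greedy decoding. It assumes the repository ships tokenizer files alongside the weights (if it does not, the tokenizer from Qwen/Qwen2.5-1.5B is the natural fallback) and feeds a raw prompt, because the card does not say which prompt or chat format was used during SFT. The function name `generate_text` is hypothetical, and imports are deferred so the module can be loaded without transformers or torch installed.

```python
def generate_text(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy completion from the fine-tuned checkpoint (a sketch).

    Calling this requires transformers and torch, and downloads the
    checkpoint from the Hugging Face Hub on first use.
    """
    # Deferred imports: only needed when the function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "sathiiiii/polyalign-qwen2.5-1.5b-en-sft"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # "auto" keeps the dtype stored in the checkpoint (BF16 per the card).
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")

    # Assumption: plain-text prompting; the SFT prompt format is undocumented.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Strip the prompt tokens so only newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The function reloads the model on every call to stay short; for repeated queries, load the tokenizer and model once and reuse them.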