sathiiiii/polyalign-qwen2.5-3b-en-sft

Text generation · Concurrency cost: 1 · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Apr 20, 2026 · License: other · Architecture: Transformer

The sathiiiii/polyalign-qwen2.5-3b-en-sft model is a 3.1-billion-parameter language model, fine-tuned from Qwen/Qwen2.5-3B on the polyalign_train dataset. Built on the Qwen2.5 architecture, it targets general language tasks; the fine-tuning adapts the base model to the domain of the polyalign_train dataset, making it suitable for applications that need a compact yet capable language model.


Model Overview

The sathiiiii/polyalign-qwen2.5-3b-en-sft model is a 3.1-billion-parameter language model derived from the Qwen/Qwen2.5-3B base model via supervised fine-tuning (SFT) on the polyalign_train dataset. Training used a learning rate of 1e-05, a total batch size of 64, and a cosine learning-rate schedule with a warmup ratio of 0.1 for 1 epoch.
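
These hyperparameters map onto a standard supervised fine-tuning run. The sketch below, using TRL's SFTTrainer, shows one plausible reconstruction; the dataset path, data format, and the per-device/accumulation split of the total batch size of 64 are assumptions, since the card reports only the aggregate values.

```python
# A minimal SFT sketch mirroring the reported hyperparameters. The dataset
# path and batch-size split are placeholders, not values from the model card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical local path; the card names the dataset but not its location.
train_dataset = load_dataset("json", data_files="polyalign_train.jsonl", split="train")

config = SFTConfig(
    output_dir="polyalign-qwen2.5-3b-en-sft",
    learning_rate=1e-5,             # reported learning rate
    per_device_train_batch_size=8,  # assumption: 8 devices/steps x 8 = total batch 64
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",     # reported scheduler
    warmup_ratio=0.1,               # reported warmup ratio
    num_train_epochs=1,             # reported epoch count
    bf16=True,                      # matches the BF16 precision listed above
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B",        # reported base model
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```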

Key Characteristics

  • Base Model: Qwen/Qwen2.5-3B, a robust foundation for various language tasks.
  • Parameter Count: 3.1 billion parameters, offering a balance between performance and computational efficiency.
  • Fine-tuning: Specifically fine-tuned on the polyalign_train dataset, indicating potential specialization for tasks related to this dataset's domain.
  • Context Length: Supports a context length of 32,768 tokens, allowing it to process long input sequences (see the loading sketch below).
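
For inference, the checkpoint can be loaded through the standard transformers text-generation path. The snippet below is a minimal sketch: it loads the weights in BF16 to match the listed precision, and it assumes the tokenizer ships a chat template, which is typical for SFT checkpoints but not confirmed by the card.

```python
# Minimal inference sketch, assuming standard Qwen2.5 / transformers
# conventions; the prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sathiiiii/polyalign-qwen2.5-3b-en-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Qwen2.5 architecture in two sentences."}]
# Assumes a chat template is present, as is typical for SFT checkpoints.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```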

Potential Use Cases

Given its fine-tuning on the polyalign_train dataset, this model is likely best suited for:

  • Text generation within the domain covered by the polyalign_train dataset.
  • Language understanding tasks where the fine-tuning data provides relevant context.
  • Applications requiring a smaller, efficient LLM that benefits from specific domain adaptation.