sathiiiii/polyalign-qwen2.5-3b-en-dist-sft
The sathiiiii/polyalign-qwen2.5-3b-en-dist-sft model is a 3.1-billion-parameter causal language model fine-tuned from Qwen/Qwen2.5-3B on the polyalign_dist_sft_train dataset. Built on the Qwen2.5 architecture with a 32,768-token context length, it targets general language tasks and suits applications that call for a compact yet capable model.
Model Overview
The sathiiiii/polyalign-qwen2.5-3b-en-dist-sft model is a fine-tuned variant of the Qwen/Qwen2.5-3B base model, developed by sathiiiii. It features approximately 3.1 billion parameters and supports a context length of 32,768 tokens. The model was specifically fine-tuned on the polyalign_dist_sft_train dataset.
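A fine-tuned checkpoint like this can typically be loaded with the Hugging Face Transformers auto classes. The sketch below is a minimal loading example, assuming the model id above is published on the Hub and that `transformers`, `torch`, and `accelerate` are installed; downloading the weights requires network access.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sathiiiii/polyalign-qwen2.5-3b-en-dist-sft"
MAX_CONTEXT = 32_768  # context length stated in this card


def load(device_map: str = "auto"):
    """Download the tokenizer and weights (requires network access)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",     # use the checkpoint's dtype (e.g. bf16) if present
        device_map=device_map,  # automatic device placement via accelerate
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load()
    print(model.config.max_position_embeddings)
```

`torch_dtype="auto"` avoids silently upcasting a bf16 checkpoint to fp32, which roughly halves memory use on a 3B model.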
Training Details
Fine-tuning used a learning rate of 1e-05 with a per-device batch size of 4 across 8 devices and gradient accumulation over 2 steps, for an effective batch size of 4 × 8 × 2 = 64. The optimizer was the fused AdamW implementation (`adamw_torch_fused`) with a cosine learning-rate scheduler and a warmup ratio of 0.1, trained for 1 epoch. Validation loss reached 1.1548 at 3,000 steps.
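The hyperparameters above can be collected into a `TrainingArguments`-style configuration. This is a reconstruction from the numbers stated in the card, not the author's actual training script; the effective batch size is the product of per-device batch size, device count, and accumulation steps.

```python
# Hyperparameters as stated in the card.
PER_DEVICE_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2

# Effective batch size = 4 * 8 * 2 = 64.
effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
print(effective_batch)  # 64

# Hypothetical TrainingArguments-style config mirroring the card;
# field names follow the transformers.TrainingArguments API.
training_config = dict(
    learning_rate=1e-5,
    per_device_train_batch_size=PER_DEVICE_BATCH,
    gradient_accumulation_steps=GRAD_ACCUM_STEPS,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```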
Key Characteristics
- Base Model: Qwen/Qwen2.5-3B
- Parameter Count: 3.1 Billion
- Context Length: 32,768 tokens
- Fine-tuning Dataset: polyalign_dist_sft_train
Intended Use Cases
This model is suitable for general natural language processing tasks where a smaller, efficient language model is preferred. Its fine-tuning on the polyalign_dist_sft_train dataset suggests potential applications in areas related to the dataset's characteristics, though specific details on the dataset's content are not provided. Developers looking for a compact Qwen2.5-based model for inference or further specialization may find this model useful.
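For quick inference, the Transformers `text-generation` pipeline is the simplest path. The sketch below is illustrative: the prompt is an arbitrary placeholder, and running it downloads the full model weights.

```python
from transformers import pipeline

MODEL_ID = "sathiiiii/polyalign-qwen2.5-3b-en-dist-sft"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy generation with the text-generation pipeline (downloads weights)."""
    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="auto",
        device_map="auto",
    )
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return out[0]["generated_text"]


if __name__ == "__main__":
    # Example prompt; replace with your own task.
    print(generate("Summarize the benefits of compact language models."))
```

For repeated calls, construct the pipeline once outside the function rather than per invocation.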