Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0
Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0 is a 3.1-billion-parameter language model fine-tuned from Qwen/Qwen2.5-3B on the qwen25_qwen3_rank_only_cluster_0 dataset. It supports a 32768-token context length and was trained with a learning rate of 1e-05 for a single epoch. Its primary differentiation is its specialized fine-tuning for ranking tasks within the Qwen2.5 and Qwen3 model families.
Model Overview
Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0 is a specialized 3.1-billion-parameter language model derived from the Qwen/Qwen2.5-3B base architecture. This model has undergone specific fine-tuning on the qwen25_qwen3_rank_only_cluster_0 dataset, indicating an optimization for tasks involving ranking or preference modeling within the Qwen2.5 and Qwen3 ecosystems.
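Since the checkpoint is published on the Hugging Face Hub and inherits the causal-LM layout of Qwen/Qwen2.5-3B, it should be loadable with the standard transformers API. A minimal sketch, assuming that layout (the prompt below is purely illustrative, not a documented input format):

```python
# Minimal loading/generation sketch; assumes a standard causal-LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places weights automatically
)

prompt = "Rank the following responses from best to worst:"  # illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```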
Key Training Details
The model was trained using the following hyperparameters (a hedged configuration sketch follows the list):
- Base Model: Qwen/Qwen2.5-3B
- Dataset: qwen25_qwen3_rank_only_cluster_0
- Learning Rate: 1e-05
- Epochs: 1.0
- Batch Size: 4 (train), 8 (eval), with 8 gradient accumulation steps, for a total effective train batch size of 128 (4 per device × 8 accumulation × an implied 4 devices).
- Optimizer: adamw_torch_fused (fused AdamW) with a cosine learning-rate scheduler and a warmup ratio of 0.1.
- Context Length: 32768 tokens
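For reference, here is a hedged sketch of how these values would map onto Hugging Face `TrainingArguments`. The actual training script is not published, so the field names, the output path, and the `bf16` flag are assumptions based on the values reported above:

```python
# Sketch only: maps the reported hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen25_qwen3_rank_only_cluster_0",  # illustrative path
    learning_rate=1e-5,
    num_train_epochs=1.0,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,  # assumption: mixed precision is typical but not stated in the card
)
# Note: the 32768-token context length is a property of the model/tokenizer
# configuration, not a TrainingArguments field.
```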
Potential Use Cases
Given its fine-tuning on a ranking-specific dataset, this model is likely best suited for applications requiring (see the scoring sketch after this list):
- Preference Modeling: Understanding and predicting user preferences or rankings.
- Comparative Analysis: Tasks where comparing and ordering different outputs or options is crucial.
- Specialized Evaluation: Potentially useful in evaluating or scoring outputs from other Qwen2.5 or Qwen3 models based on learned ranking criteria.
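One hedged way to apply such a model to comparative scoring is to rank candidate responses by their average token log-likelihood under the model. This is a generic technique, not a documented evaluation procedure for this checkpoint, and it assumes the prompt tokenizes identically as a prefix of prompt + response:

```python
# Ranking sketch: score candidates by mean response-token log-probability
# and sort best-first. Generic technique, not a documented API of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
model.eval()

def avg_log_likelihood(prompt: str, response: str) -> float:
    """Average log-probability of the response tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Shift: the logit at position t predicts the token at position t + 1.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    start = prompt_ids.shape[1] - 1  # index of the first predicted response token
    return token_lp[0, start:].mean().item()

prompt = "Explain gradient accumulation in one sentence. "
candidates = [
    "It sums gradients over several mini-batches before applying a weight update.",
    "It is a kind of database index.",
]
ranked = sorted(candidates, key=lambda c: avg_log_likelihood(prompt, c), reverse=True)
print(ranked)  # better-supported response should come first
```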