Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_4 is a 3.1-billion-parameter language model fine-tuned from Qwen/Qwen2.5-3B. It was fine-tuned on the qwen25_qwen3_rank_only_cluster_4 dataset, which suggests optimization for ranking-related tasks or for a particular data cluster. It is designed for applications requiring a compact yet specialized model with a 32,768-token context length.
## Model Overview
This model, Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_4, is a fine-tuned variant of the Qwen/Qwen2.5-3B base model developed by Qwen. It has approximately 3.1 billion parameters and supports a 32,768-token context length, making it suitable for tasks requiring moderate context understanding.
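The checkpoint can presumably be loaded with the Hugging Face transformers library like any other causal language model. The snippet below is a minimal sketch; the prompt and generation settings are illustrative and not taken from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_4"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative prompt; the card does not document a prompt format.
inputs = tokenizer(
    "Rank the following items by relevance:", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```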
### Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2.5-3B.
- Fine-tuning Dataset: Specifically trained on the qwen25_qwen3_rank_only_cluster_4 dataset, indicating a specialization in ranking-related tasks or performance within particular data clusters.
## Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 1e-05
- Optimizer: adamw_torch_fused (fused AdamW) with betas=(0.9, 0.999) and epsilon=1e-08.
- Batch Size: A total training batch size of 128 (train_batch_size: 4, gradient_accumulation_steps: 8, num_devices: 4).
- Epochs: Trained for 1.0 epoch.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
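For reference, these settings map onto Hugging Face TrainingArguments as shown below. This is a reconstruction from the reported hyperparameters, not the original training script, and the output_dir is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen25_3b_rank_only_cluster_4",  # hypothetical path
    learning_rate=1e-5,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    per_device_train_batch_size=4,   # per-device batch size
    gradient_accumulation_steps=8,   # accumulate to raise effective batch
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
# Effective batch size: 4 per device x 8 accumulation steps x 4 devices = 128.
```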
## Potential Use Cases
Given its fine-tuning on a specific ranking dataset, this model is likely optimized for:
- Tasks involving ranking or preference prediction (see the sketch after this list).
- Applications within the specific data domain of the qwen25_qwen3_rank_only_cluster_4 dataset.
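The card does not specify how the model should be applied to ranking. One common pattern with a causal LM is to score each candidate by its length-normalized log-likelihood and sort; the sketch below illustrates that approach under the assumption that it fits this model, with placeholder candidates:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
model.eval()

def sequence_logprob(text: str) -> float:
    """Average per-token log-probability of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    # Shift so each position's logits predict the next token.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

# Placeholder candidates; the highest-scoring candidate ranks first.
candidates = ["Answer A ...", "Answer B ..."]
ranked = sorted(candidates, key=sequence_logprob, reverse=True)
print(ranked)
```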
Further details on specific intended uses and limitations are not provided in the original model card.