Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_2 is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B. This model is specifically fine-tuned on the 'qwen25_qwen3_rank_only_cluster_2' dataset, indicating a specialization in ranking tasks or specific data clusters. With a context length of 32768 tokens, it is designed for applications requiring processing of moderately long sequences, particularly within its fine-tuned domain.
Model Overview
This model, Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_2, is a fine-tuned variant of the Qwen/Qwen2.5-3B base model. It features 3.1 billion parameters and supports a context length of 32768 tokens.
Key Characteristics
- Base Model: Qwen/Qwen2.5-3B, a causal language model developed by Qwen.
- Fine-tuning Dataset: Trained on the qwen25_qwen3_rank_only_cluster_2 dataset, suggesting optimization for ranking tasks over a specific data cluster within the Qwen 2.5 and Qwen 3 ecosystems.
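Since the card does not include a usage snippet, here is a minimal inference sketch using the standard transformers API. The repo id matches the model name above; the generation settings and the lazy imports are illustrative assumptions, not details from the card.

```python
# Hypothetical inference helper for this checkpoint.
# MODEL_ID and MAX_CONTEXT come from the card; everything else is an assumption.
MODEL_ID = "Adanato/qwen25_3b_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_2"
MAX_CONTEXT = 32768  # context length stated in the card


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint lazily and return a text completion."""
    # Imports are kept inside the function so this sketch can be read
    # (and imported) even where torch/transformers are not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Keep prompt + completion inside the 32768-token window.
    budget = min(max_new_tokens, MAX_CONTEXT - inputs["input_ids"].shape[-1])
    output = model.generate(**inputs, max_new_tokens=budget)
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Note that `device_map="auto"` requires the accelerate package; omit it to load on a single device.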
Training Details
The model was trained with a learning rate of 1e-05 and a per-device batch size of 4 (effective batch size of 128 via gradient accumulation), using the fused AdamW optimizer (adamw_torch_fused). Training ran for 1 epoch with a cosine learning-rate schedule and a warmup ratio of 0.1. The training environment used Transformers 4.57.1 and PyTorch 2.10.0+cu128.
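The hyperparameters above can be expressed as transformers TrainingArguments keyword arguments. This is a hypothetical reconstruction: the values come from the card, but the single-device split implied by `gradient_accumulation_steps=32` is an assumption.

```python
# Hypothetical reconstruction of the training configuration described above,
# as keyword arguments for transformers.TrainingArguments.
training_kwargs = dict(
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,  # assumption: 4 x 32 = 128 effective on one device
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
)
```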
Intended Use
Specific intended uses and limitations are not documented for this model. Given its fine-tuning on a specialized ranking dataset, it is most plausibly suited to ranking or classification tasks over data resembling the qwen25_qwen3_rank_only_cluster_2 cluster; behavior outside that domain is undocumented.