Adanato/qwen25_3b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3
  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 3.1B
  • Quantization: BF16
  • Context Length: 32K
  • Published: Feb 16, 2026
  • License: other
  • Architecture: Transformer

Adanato/qwen25_3b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3 is a 3.1-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct on the qwen25_qwen3_rank_only_cluster_3 dataset, which suggests a specialization in ranking or comparative tasks. It targets applications that need a compact yet capable model with a 32K context length.


Model Overview

This model, Adanato/qwen25_3b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3, is a fine-tuned variant of the Qwen/Qwen2.5-3B-Instruct base model. It features approximately 3.1 billion parameters and supports a 32,768 token context length.
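
The model can be loaded with the standard Transformers API. The sketch below assumes the repository ships the usual Qwen2.5 tokenizer and chat template; the prompt content is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/qwen25_3b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3"

# Load in BF16 to match the precision listed in the card header.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt; the chat template comes from the tokenizer.
messages = [
    {"role": "user", "content": "Rank these fruits from sweetest to least sweet: lemon, grape, lime."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```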

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-3B-Instruct.
  • Specialization: The model has undergone specific fine-tuning on the qwen25_qwen3_rank_only_cluster_3 dataset, indicating a potential focus on tasks involving ranking, comparison, or clustering based on specific criteria.

Training Details

The fine-tuning process used the following hyperparameters; an equivalent TrainingArguments sketch follows the list:

  • Learning Rate: 1e-05
  • Batch Size: 4 (train), 8 (eval)
  • Gradient Accumulation: 8 steps; with the per-device train batch size of 4, the reported total train batch size of 128 implies training across 4 devices.
  • Optimizer: adamw_torch_fused with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 1.0 epoch.
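
For reference, these hyperparameters map onto a Transformers TrainingArguments configuration roughly as below. This is a reconstruction, not the author's published training script: the output directory is hypothetical, and the 4-device count is inferred from the batch-size arithmetic.

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters; output_dir is hypothetical,
# and bf16=True reflects the BF16 precision in the card header.
training_args = TrainingArguments(
    output_dir="qwen25_3b_rank_only_cluster_3",
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # 4 per device x 8 accumulation x 4 devices = 128 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    bf16=True,
)
```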

Potential Use Cases

Given its fine-tuning dataset, this model may suit applications that benefit from specialized ranking capabilities (see the prompt-based sketch after this list), including:

  • Content recommendation systems.
  • Comparative analysis of text.
  • Tasks requiring nuanced understanding of preferences or ordering.
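
As a concrete illustration of prompt-based ranking, the sketch below (reusing the model and tokenizer loaded earlier) asks the model to order candidate passages by relevance. The numbered-candidate prompt schema is an assumption; the card does not document an official ranking input format.

```python
# Hypothetical ranking prompt; reuses `model` and `tokenizer` from the
# loading example above. The numbered-candidate schema is an assumption.
query = "best introduction to transformer models"
candidates = [
    "A blog post explaining attention from first principles.",
    "A recipe for banana bread.",
    "The paper 'Attention Is All You Need'.",
]
prompt = (
    f"Query: {query}\n"
    + "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    + "\nList the candidate indices from most to least relevant to the query."
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```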