Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0
Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0 is a 12-billion-parameter language model fine-tuned from mistralai/Mistral-Nemo-Instruct-2407. It was trained on the qwen25_qwen3_rank_only_cluster_0 dataset with a context length of 32768 tokens.
Overview
This model is derived from the mistralai/Mistral-Nemo-Instruct-2407 base and fine-tuned on the qwen25_qwen3_rank_only_cluster_0 dataset, a specialized training focus rather than general-purpose instruction tuning. It retains the base model's 12B parameters and supports a context length of 32768 tokens.
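The model card does not include a usage snippet; the following is a minimal sketch assuming the checkpoint loads through the standard Hugging Face transformers API. The prompt and generation settings are illustrative, not taken from the card.

```python
# Minimal usage sketch (assumption: standard AutoModel loading works for
# this checkpoint; prompt and generation settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B parameters: bf16 roughly halves memory vs fp32
    device_map="auto",
)

# The base model is instruction-tuned, so apply the chat template rather than raw text.
messages = [{"role": "user", "content": "Give a one-sentence overview of long-context use cases."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```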
Training Details
Fine-tuning used a learning rate of 1e-05 with a per-device train batch size of 4 and a total effective batch size of 128 across 4 GPUs, which implies 8 gradient accumulation steps (4 per device × 4 GPUs × 8 = 128). Training ran for 1 epoch with the fused AdamW optimizer (adamw_torch_fused), a cosine learning rate scheduler, and a warmup ratio of 0.1.
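The training script itself is not published; as a rough sketch, the stated hyperparameters map onto Hugging Face TrainingArguments as follows. The output_dir and the bf16 flag are assumptions, not taken from the card.

```python
# Sketch of the stated hyperparameters as transformers TrainingArguments.
# Assumptions: output_dir is hypothetical; bf16 mixed precision is a guess.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral_nemo_cluster_0_ft",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # 4 per device x 4 GPUs x 8 = 128 effective
    num_train_epochs=1,
    optim="adamw_torch_fused",       # fused AdamW, as stated on the card
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumption: common choice for Mistral-Nemo
)
```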
Key Characteristics
- Base Model: mistralai/Mistral-Nemo-Instruct-2407
- Parameter Count: 12 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset: qwen25_qwen3_rank_only_cluster_0
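These characteristics can be checked without downloading the weights by reading the checkpoint's config; a small sketch, assuming the config follows the standard Mistral layout in transformers:

```python
# Sketch: inspect the checkpoint config to verify the listed characteristics.
# Note: the base Mistral-Nemo config may report a larger max_position_embeddings
# than the 32768-token training context stated above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"
)
print(config.max_position_embeddings)                 # context window per the config
print(config.num_hidden_layers, config.hidden_size)   # architecture dimensions
```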
Good for
- Applications requiring a model fine-tuned on the qwen25_qwen3_rank_only_cluster_0 dataset.
- Use cases benefiting from the Mistral-Nemo architecture with specialized training.
- Scenarios where a 12B parameter model with a large context window is advantageous.