Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0
Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0 is a 12-billion-parameter language model fine-tuned from mistralai/Mistral-Nemo-Instruct-2407. It was trained on the qwen25_qwen3_rank_only_cluster_0 dataset with a context length of 32768 tokens.
Overview
This model is derived from the mistralai/Mistral-Nemo-Instruct-2407 base and fine-tuned on the qwen25_qwen3_rank_only_cluster_0 dataset, a specialized training focus rather than general-purpose instruction tuning. It retains the base model's 12B parameters and supports a context length of 32768 tokens.
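The model card does not include a usage snippet; the following is a minimal sketch assuming the checkpoint loads through the standard Hugging Face transformers API. The prompt and generation settings are illustrative, not taken from the card.

```python
# Minimal usage sketch (assumption: standard AutoModel loading works for
# this checkpoint; prompt and generation settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B parameters: bf16 roughly halves memory vs fp32
    device_map="auto",
)

# The base model is instruction-tuned, so apply the chat template rather than raw text.
messages = [{"role": "user", "content": "Give a one-sentence overview of long-context use cases."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```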
Training Details
Fine-tuning used a learning rate of 1e-05 with a per-device train batch size of 4 and a total effective batch size of 128 across 4 GPUs, which implies 8 gradient accumulation steps (4 per device × 4 GPUs × 8 = 128). Training ran for 1 epoch with the fused AdamW optimizer (adamw_torch_fused), a cosine learning rate scheduler, and a warmup ratio of 0.1.
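The training script itself is not published; as a rough sketch, the stated hyperparameters map onto Hugging Face TrainingArguments as follows. The output_dir and the bf16 flag are assumptions, not taken from the card.

```python
# Sketch of the stated hyperparameters as transformers TrainingArguments.
# Assumptions: output_dir is hypothetical; bf16 mixed precision is a guess.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral_nemo_cluster_0_ft",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # 4 per device x 4 GPUs x 8 = 128 effective
    num_train_epochs=1,
    optim="adamw_torch_fused",       # fused AdamW, as stated on the card
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumption: common choice for Mistral-Nemo
)
```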
Key Characteristics
- Base Model: mistralai/Mistral-Nemo-Instruct-2407
- Parameter Count: 12 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset: qwen25_qwen3_rank_only_cluster_0
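These characteristics can be checked without downloading the weights by reading the checkpoint's config; a small sketch, assuming the config follows the standard Mistral layout in transformers:

```python
# Sketch: inspect the checkpoint config to verify the listed characteristics.
# Note: the base Mistral-Nemo config may report a larger max_position_embeddings
# than the 32768-token training context stated above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "Adanato/mistral_nemo_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"
)
print(config.max_position_embeddings)                 # context window per the config
print(config.num_hidden_layers, config.hidden_size)   # architecture dimensions
```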
Good for
- Applications requiring a model fine-tuned on the qwen25_qwen3_rank_only_cluster_0 dataset.
- Use cases benefiting from the Mistral-Nemo architecture with specialized training.
- Scenarios where a 12B parameter model with a large context window is advantageous.