Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_1
Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_1 is an 8-billion-parameter instruction-tuned language model, fine-tuned from Meta-Llama-3-8B-Instruct. It was adapted using the qwen25_qwen3_rank_only_cluster_1 dataset, suggesting optimization for ranking or comparative tasks related to the Qwen model family. It features an 8192-token context length, making it suitable for applications that process moderately long inputs.
Model Overview
This model, Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_1, is an 8-billion-parameter instruction-tuned language model. It is a fine-tuned variant of the meta-llama/Meta-Llama-3-8B-Instruct base model, adapted using the qwen25_qwen3_rank_only_cluster_1 dataset.
Key Characteristics
- Base Model: Meta-Llama-3-8B-Instruct
- Parameter Count: 8 billion
- Context Length: 8192 tokens
- Fine-tuning Dataset: qwen25_qwen3_rank_only_cluster_1, indicating a specialized focus on tasks related to ranking or comparison within the Qwen model family.
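
The model can be loaded with the standard transformers text-generation path. The sketch below is illustrative rather than confirmed by this card: the chat-template usage follows the Meta-Llama-3-8B-Instruct base model, and the prompt and generation settings are assumptions.

```python
# Minimal usage sketch, assuming the standard transformers API and the
# Llama-3 chat template inherited from the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

# Illustrative prompt; the exact task format used in fine-tuning is not documented.
messages = [
    {"role": "user", "content": "Rank the following two answers by helpfulness: ..."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```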
Training Details
The model was trained with a learning rate of 1e-05, a per-device train_batch_size of 4, and gradient_accumulation_steps of 8 across 4 GPUs, giving a total_train_batch_size of 128 (4 × 8 × 4). Training used the adamw_torch_fused optimizer and a cosine learning-rate scheduler with a warmup ratio of 0.1 over 1 epoch.
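
As a sketch, these hyperparameters map onto a transformers TrainingArguments configuration as shown below. Only the values listed above are grounded in this card; the output directory, precision setting, and everything else are illustrative assumptions, not the original training script.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters reported above; unlisted settings are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_rank_only_cluster_1",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # per GPU
    gradient_accumulation_steps=8,   # 4 batch * 8 accum * 4 GPUs = 128 total
    num_train_epochs=1,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumption; common for Llama-3 fine-tunes
)
```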