Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0
Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0 is an 8-billion-parameter instruction-tuned language model, fine-tuned by Adanato from Meta-Llama-3-8B-Instruct. It was trained on the `qwen25_qwen3_rank_only_cluster_0` dataset with a context length of 8192 tokens, and is specialized for tasks aligned with that dataset's distribution.
Overview
This model, Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0, is an 8-billion-parameter instruction-tuned variant of the Meta-Llama-3-8B-Instruct base model. It was fine-tuned by Adanato on the `qwen25_qwen3_rank_only_cluster_0` dataset, which suggests specialization in tasks represented within that data distribution. The model retains the base model's context length of 8192 tokens.
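As a minimal usage sketch (assuming the standard Hugging Face `transformers` chat API and that this checkpoint inherits the Llama 3 chat template from its base model), the model can be loaded and queried like any other Llama-3-8B-Instruct derivative:

```python
# Minimal inference sketch; assumes the transformers library and that this
# checkpoint inherits the Llama 3 chat template from Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights, typical for Llama 3 fine-tunes
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Llama 3 architecture in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The model supports contexts up to 8192 tokens; keep prompt + output within that budget.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```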
Key Capabilities
- Instruction Following: Inherits and refines instruction-following capabilities from its Llama-3-8B-Instruct base.
- Specialized Performance: Optimized for tasks and data patterns present in the `qwen25_qwen3_rank_only_cluster_0` dataset.
Training Details
The fine-tuning process used a learning rate of 1e-05 and an effective training batch size of 128 (per-device batch size of 4, with 8 gradient accumulation steps across 4 GPUs), with a cosine learning rate scheduler and a 0.1 warmup ratio. The model was trained for 1 epoch using the adamw_torch_fused optimizer.
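The reported hyperparameters map onto a `transformers` `TrainingArguments` configuration roughly as follows. This is an illustrative reconstruction under stated assumptions (the output path and bf16 precision are hypothetical), not the author's actual training script:

```python
# Illustrative reconstruction of the reported hyperparameters using
# transformers.TrainingArguments; the exact fine-tuning script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_cluster_0",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,     # 4 per GPU ...
    gradient_accumulation_steps=8,     # ... x 8 accumulation steps ...
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,                         # assumption: bf16 mixed precision
)
# Effective batch size: 4 (per device) x 8 (accumulation) x 4 (GPUs) = 128.
```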
Good For
- Applications requiring a model specifically tuned on the `qwen25_qwen3_rank_only_cluster_0` dataset.
- Research and development exploring the impact of targeted fine-tuning of the Llama 3 architecture on specific data clusters.