Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3
Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3 is an 8-billion-parameter instruction-tuned language model, fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct. It supports an 8192-token context length and was fine-tuned on the qwen25_qwen3_rank_only_cluster_3 dataset. It is designed for general instruction-following tasks, building on the Llama 3 architecture.
Overview
This model is a fine-tuned variant of the meta-llama/Meta-Llama-3-8B-Instruct base model and therefore builds on the Llama 3 architecture. The fine-tuning run used the qwen25_qwen3_rank_only_cluster_3 dataset.
Training Details
The model was trained with a learning rate of 1e-05, a train_batch_size of 4, and an eval_batch_size of 8 across 4 GPUs, using a cosine learning-rate scheduler with a warmup ratio of 0.1 for 1 epoch. The optimizer was ADAMW_TORCH_FUSED.
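For orientation, these hyperparameters map onto a transformers TrainingArguments configuration roughly as sketched below. This is a reconstruction, not the authors' training script: the output directory, precision setting, and whether the batch sizes are per-device are assumptions, since the card does not say.

```python
# Hypothetical sketch of the reported hyperparameters as transformers
# TrainingArguments. Only the values named in the card are taken from it;
# everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_cluster_3",  # hypothetical path, not from the card
    learning_rate=1e-5,                # reported learning rate
    per_device_train_batch_size=4,     # assumes train_batch_size is per-device
    per_device_eval_batch_size=8,      # assumes eval_batch_size is per-device
    num_train_epochs=1,                # reported: 1 epoch
    lr_scheduler_type="cosine",        # reported cosine scheduler
    warmup_ratio=0.1,                  # reported warmup ratio
    optim="adamw_torch_fused",         # reported ADAMW_TORCH_FUSED optimizer
    bf16=True,                         # assumption: mixed precision on 4 GPUs
)
```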
Key Characteristics
- Base Model: Meta-Llama-3-8B-Instruct
- Parameter Count: 8 billion
- Context Length: 8192 tokens
- Fine-tuning Dataset: qwen25_qwen3_rank_only_cluster_3
Intended Use
Given its instruction-tuned nature and Llama 3 foundation, this model is suitable for a broad range of general-purpose instruction-following applications. Its specific strengths will depend on the characteristics of the qwen25_qwen3_rank_only_cluster_3 dataset, which is not detailed in the provided information. A minimal inference sketch follows.
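The checkpoint should load like any other Llama 3 derivative via transformers. The sketch below assumes it inherits the Llama 3 chat template from the base model; the prompt and generation settings are illustrative, not recommendations from the model authors.

```python
# Minimal inference sketch, assuming standard transformers loading and the
# Llama 3 chat template inherited from Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is sufficient for inference
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain instruction tuning in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,   # illustrative generation settings
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```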