Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Feb 16, 2026 · License: other · Architecture: Transformer · Cold

Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0 is an 8-billion-parameter instruction-tuned language model, fine-tuned by Adanato from Meta-Llama-3-8B-Instruct. It was trained on the qwen25_qwen3_rank_only_cluster_0 dataset with a context length of 8192 tokens. The model is specialized for tasks represented in that dataset and is best suited to use cases whose data distribution matches its fine-tuning data.


Overview

This model, Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0, is an 8-billion-parameter instruction-tuned variant of the Meta-Llama-3-8B-Instruct base model. It was fine-tuned by Adanato on the qwen25_qwen3_rank_only_cluster_0 dataset, which suggests a specialization toward tasks represented in that data distribution. The model retains a context length of 8192 tokens.
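
A minimal usage sketch with the Hugging Face transformers library is shown below. The repository ID is taken from this page; the sketch assumes the repo ships the standard Llama 3 Instruct tokenizer and chat template, and the dtype/device settings are illustrative, so verify them against the uploaded files and your hardware.

```python
# Minimal sketch: load the model and run one instruction with transformers.
# Assumes the repo includes the standard Llama 3 Instruct tokenizer and chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/llama3_8b_instruct_qwen25_qwen3_rank_only-qwen25_qwen3_rank_only_cluster_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the idea of instruction tuning in two sentences."},
]

# Llama 3 Instruct models expect the chat template to be applied before generation.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```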

Key Capabilities

  • Instruction Following: Inherits and refines instruction-following capabilities from its Llama-3-8B-Instruct base.
  • Specialized Performance: Optimized for tasks and data patterns present in the qwen25_qwen3_rank_only_cluster_0 dataset.

Training Details

The fine-tuning run used a learning rate of 1e-05, an effective batch size of 128 (per-device batch size of 4 with 8 gradient-accumulation steps across 4 GPUs), and a cosine learning-rate scheduler with a warmup ratio of 0.1. The model was trained for 1 epoch with the adamw_torch_fused optimizer.
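
For orientation, these hyperparameters correspond roughly to the Hugging Face TrainingArguments sketch below. This is a reconstruction from the numbers above, not the author's actual training script; the output directory and mixed-precision setting are assumptions, and the dataset, model, and Trainer setup are omitted.

```python
# Hedged reconstruction of the reported hyperparameters as TrainingArguments.
# Effective batch size: 4 (per device) x 8 (grad accumulation) x 4 (GPUs) = 128.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_instruct_cluster_0",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,  # assumption: bf16 mixed precision; not stated on this page
)
```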

Good For

  • Applications requiring a model specifically tuned on the qwen25_qwen3_rank_only_cluster_0 dataset.
  • Research and development exploring the impact of targeted fine-tuning of the Llama 3 architecture on specific data clusters.