ramzanniaz331/llama3.1-8b-8192-v3

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Dec 25, 2025 · License: other · Architecture: Transformer

The ramzanniaz331/llama3.1-8b-8192-v3 model is an 8 billion parameter language model fine-tuned from Meta's Llama-3.1-8B. It was trained on a combination of the cpt_jazz_v3, cpt_jazz_v3_copy, cpt_opensource_v3, and cpt_local_v3 datasets. This model is a specialized iteration of Llama 3.1 focused on continued pre-training rather than instruction following, and it reaches a loss of 1.1027 on its evaluation set.


Model Overview

ramzanniaz331/llama3.1-8b-8192-v3 is an 8 billion parameter language model building on the meta-llama/Llama-3.1-8B base model. This version has undergone further fine-tuning on a specific set of datasets: cpt_jazz_v3, cpt_jazz_v3_copy, cpt_opensource_v3, and cpt_local_v3. The training objective was continued pre-training, and the run achieved a final loss of 1.1027 on the evaluation set.

Key Characteristics

  • Base Model: Fine-tuned from Meta's Llama-3.1-8B.
  • Parameter Count: 8 billion parameters.
  • Training Objective: Continued pre-training on specialized datasets.
  • Performance: Achieved a loss of 1.1027 on the evaluation set.

Training Details

The model was trained with a learning rate of 5e-05 and an effective batch size of 256 (a per-device batch size of 1 with 32 gradient accumulation steps across 8 GPUs: 1 × 32 × 8 = 256), using a cosine learning rate scheduler with a 0.03 warmup ratio over 1 epoch. The optimizer was ADAMW_TORCH_FUSED.
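For reference, here is a minimal sketch of how these reported hyperparameters map onto the Hugging Face transformers Trainer API. The original training script is not published, so the output directory, precision setting, and the use of TrainingArguments itself are assumptions:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the actual
# training script for this model has not been published.
training_args = TrainingArguments(
    output_dir="llama3.1-8b-8192-v3",   # assumed output path
    learning_rate=5e-5,                 # reported learning rate
    per_device_train_batch_size=1,      # reported train_batch_size
    gradient_accumulation_steps=32,     # reported gradient_accumulation_steps
    num_train_epochs=1,                 # reported epoch count
    lr_scheduler_type="cosine",         # reported scheduler
    warmup_ratio=0.03,                  # reported warmup ratio
    optim="adamw_torch_fused",          # reported ADAMW_TORCH_FUSED optimizer
    bf16=True,                          # assumption: precision is not reported
)
# Effective batch size: 1 per device x 32 accumulation steps x 8 GPUs = 256.
```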

Intended Use Cases

Because it was produced by continued pre-training rather than instruction tuning, this model is likely best suited to tasks requiring strong language understanding and generation within the domains covered by its training data. It could serve as a foundation for further fine-tuning on specific downstream tasks, or for research into the effects of its particular training datasets.
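As a concrete starting point, the sketch below loads the checkpoint with the Hugging Face transformers library for plain text completion. The dtype, device placement, and prompt are illustrative assumptions, not documented usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ramzanniaz331/llama3.1-8b-8192-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: use the checkpoint's native dtype
    device_map="auto",    # assumption: requires the accelerate package
)

# Because the model is continued-pre-trained rather than instruction-tuned,
# prompt it as a text-completion model instead of using a chat template.
inputs = tokenizer(
    "Continued pre-training adapts a base model to", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```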