internetoftim/llama31-8b-balitanlp-cpt
internetoftim/llama31-8b-balitanlp-cpt is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B. The fine-tuning dataset is not specified; the model targets general language understanding and generation. Its primary application is tasks that require a robust foundational model with a 32768-token context length.
Model Overview
The model is fine-tuned from the meta-llama/Llama-3.1-8B base model. While the specific fine-tuning dataset is not documented, the model inherits the strong foundational capabilities of the Llama 3.1 architecture.
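The checkpoint can be loaded with the standard transformers Auto classes. The sketch below is a minimal example, assuming the repo id above resolves on the Hugging Face Hub; the dtype and device settings are illustrative choices, not documented requirements.

```python
# Minimal loading sketch. Assumes standard transformers support
# inherited from the Llama-3.1-8B base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internetoftim/llama31-8b-balitanlp-cpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; bf16 keeps an 8B model near ~16 GB
    device_map="auto",           # shard across available GPUs if needed
)
```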
Training Details
The model was trained for 3000 steps with a learning rate of 5e-05 and an effective batch size of 128 (train_batch_size of 1 with gradient_accumulation_steps of 8 across 16 GPUs). Training used the ADAMW_TORCH_FUSED optimizer with standard betas and epsilon, and a cosine learning-rate scheduler with a warmup ratio of 0.03. The software stack was Transformers 4.57.3, PyTorch 2.9.0a0, Datasets 4.4.2, and Tokenizers 0.22.1.
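For reference, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as below. This is a reconstruction, not the original training script; output_dir and bf16 are assumptions, and the dataset and data-collation setup are unknown.

```python
# Hypothetical reconstruction of the reported hyperparameters.
# Everything beyond the values listed in the card is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama31-8b-balitanlp-cpt",  # assumed output path
    max_steps=3000,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # 1 x 8 x 16 GPUs = 128 effective batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch_fused",
    bf16=True,  # assumed; precision is not stated in the card
)
```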
Intended Uses
With its Llama 3.1 base and 8 billion parameters, the model is suited to a wide range of natural language processing tasks. Its 32768-token context length supports processing and generating long texts, making it potentially useful for applications that require extensive context understanding or generation. A typical invocation is shown below.
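The sketch uses the transformers text-generation pipeline; the prompt and decoding settings are illustrative only, not recommendations from the model authors.

```python
# Illustrative generation call via the text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="internetoftim/llama31-8b-balitanlp-cpt",
    torch_dtype="auto",
    device_map="auto",
)

output = generator(
    "Summarize the following article:\n...",  # long inputs fit within the 32768-token window
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; sampling parameters are a per-task choice
)
print(output[0]["generated_text"])
```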