internetoftim/llama31-8b-balitanlp-cpt

TEXT GENERATION

- Concurrency Cost: 1
- Model Size: 8B
- Quant: FP8
- Ctx Length: 32k
- Published: Dec 26, 2025
- License: llama3.1
- Architecture: Transformer

The internetoftim/llama31-8b-balitanlp-cpt is an 8 billion parameter language model fine-tuned from Meta's Llama-3.1-8B base model. The fine-tuning dataset is not specified, and the model targets general language understanding and generation. Its primary application is tasks requiring a robust foundational model with a 32768-token context length.


Model Overview

The internetoftim/llama31-8b-balitanlp-cpt is an 8 billion parameter language model, fine-tuned from the meta-llama/Llama-3.1-8B base model. While the specific dataset used for fine-tuning is not detailed, the model leverages the strong foundational capabilities of the Llama 3.1 architecture.

Training Details

The model was trained for 3000 steps with a learning rate of 5e-05 and a total batch size of 128 (a train_batch_size of 1 with gradient_accumulation_steps of 8 across 16 GPUs). The optimizer was ADAMW_TORCH_FUSED with standard betas and epsilon, paired with a cosine learning rate scheduler and a 0.03 warmup ratio. Training used Transformers 4.57.3, PyTorch 2.9.0a0, Datasets 4.4.2, and Tokenizers 0.22.1.
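The total batch size above follows from per-device batch size × gradient accumulation steps × GPU count. A minimal sketch of that arithmetic (the helper and its names are illustrative, not part of the published training config):

```python
# Effective batch size for the reported training run.
# Values are taken from the model card; the function itself is illustrative.

def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Total examples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus

total = effective_batch_size(per_device_batch=1, grad_accum_steps=8, num_gpus=16)
print(total)  # 128, matching the total batch size reported above
```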

Intended Uses

Given its Llama 3.1 base and 8 billion parameters, this model suits a wide range of natural language processing tasks. Its 32768-token context length allows it to process and generate longer texts, making it potentially useful for applications that require extensive context understanding or generation.
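As a sketch of how a model like this is typically loaded for generation with Hugging Face Transformers: the repo id and context length come from this card, but the dtype, device placement, and generation budget are assumptions, not settings prescribed by the model author.

```python
MAX_CONTEXT = 32768  # context length stated on the card

def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check that the prompt plus generation budget stays within the 32k window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

def run_generation(prompt: str, max_new_tokens: int = 256) -> str:
    """Illustrative only: requires `transformers`, `torch`, and memory for an 8B model."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "internetoftim/llama31-8b-balitanlp-cpt"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # bfloat16 and device_map="auto" are common defaults, not card recommendations
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if not fits_in_context(inputs["input_ids"].shape[1], max_new_tokens):
        raise ValueError("prompt plus generation budget exceeds the 32k context window")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

The context-length guard matters in practice: with a 32k window, a long prompt can silently crowd out the generation budget, so checking `fits_in_context` before calling `generate` fails fast instead of truncating.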