internetoftim/llama31-8b-balitanlp-cpt
internetoftim/llama31-8b-balitanlp-cpt is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B. The fine-tuning dataset is not specified; the model targets general language understanding and generation. Its primary application is tasks that require a robust foundational model with a 32768-token context length.
Model Overview
The model is fine-tuned from the meta-llama/Llama-3.1-8B base model. While the specific fine-tuning dataset is not documented, the model inherits the strong foundational capabilities of the Llama 3.1 architecture.
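The checkpoint can be loaded with the standard transformers Auto classes. The sketch below is a minimal example, assuming the repo id above resolves on the Hugging Face Hub; the dtype and device settings are illustrative choices, not documented requirements.

```python
# Minimal loading sketch. Assumes standard transformers support
# inherited from the Llama-3.1-8B base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internetoftim/llama31-8b-balitanlp-cpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; bf16 keeps an 8B model near ~16 GB
    device_map="auto",           # shard across available GPUs if needed
)
```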
Training Details
The model was trained for 3000 steps with a learning rate of 5e-05 and an effective batch size of 128 (train_batch_size of 1 with gradient_accumulation_steps of 8 across 16 GPUs). Training used the ADAMW_TORCH_FUSED optimizer with standard betas and epsilon, and a cosine learning-rate scheduler with a warmup ratio of 0.03. The software stack was Transformers 4.57.3, PyTorch 2.9.0a0, Datasets 4.4.2, and Tokenizers 0.22.1.
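For reference, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as below. This is a reconstruction, not the original training script; output_dir and bf16 are assumptions, and the dataset and data-collation setup are unknown.

```python
# Hypothetical reconstruction of the reported hyperparameters.
# Everything beyond the values listed in the card is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama31-8b-balitanlp-cpt",  # assumed output path
    max_steps=3000,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # 1 x 8 x 16 GPUs = 128 effective batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch_fused",
    bf16=True,  # assumed; precision is not stated in the card
)
```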
Intended Uses
With its Llama 3.1 base and 8 billion parameters, the model is suited to a wide range of natural language processing tasks. Its 32768-token context length supports processing and generating long texts, making it potentially useful for applications that require extensive context understanding or generation. A typical invocation is shown below.
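The sketch uses the transformers text-generation pipeline; the prompt and decoding settings are illustrative only, not recommendations from the model authors.

```python
# Illustrative generation call via the text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="internetoftim/llama31-8b-balitanlp-cpt",
    torch_dtype="auto",
    device_map="auto",
)

output = generator(
    "Summarize the following article:\n...",  # long inputs fit within the 32768-token window
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; sampling parameters are a per-task choice
)
print(output[0]["generated_text"])
```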