Name: shuoxing/llama3-8b-full-sft-c4-1m-en API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: shuoxing

Model Overview

The shuoxing/llama3-8b-full-sft-c4-1m-en is an 8-billion-parameter language model based on the Llama 3 architecture. Its card describes it both as supervised fine-tuned (SFT) and as "trained from scratch." These two descriptions are in tension, because SFT normally starts from an existing checkpoint, and the provided information does not name a base checkpoint or training dataset. The model supports a context length of 8192 tokens, which bounds the combined length of the input and the generated text.
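
Because the 8192-token limit covers the prompt and the generated continuation together, a caller has to reserve room for generation. The following is a minimal sketch of that budgeting. The helper name fit_to_context and the drop-oldest-tokens policy are illustrative assumptions, not part of the model or any library, and the token IDs are stand-ins rather than real tokenizer output.

```python
CONTEXT_LENGTH = 8192  # context length stated on the model card


def fit_to_context(prompt_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Trim prompt_ids so that prompt + max_new_tokens fits in the context.

    Hypothetical helper: drops the oldest (leftmost) tokens first, which is
    one common truncation policy among several.
    """
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    return list(prompt_ids[-budget:])


if __name__ == "__main__":
    prompt = list(range(10_000))  # stand-in token IDs, not real tokenizer output
    kept = fit_to_context(prompt, max_new_tokens=512)
    print(len(kept), kept[0])  # 7680 2320
```

Keeping the most recent tokens suits chat-style prompts where the latest turns matter most. Other inputs may call for a different policy, such as truncating a middle document.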

Training Details

The training run used the following hyperparameters:

Learning Rate: 1e-05
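Batch Size: 8 per device (train), 8 per device (eval)
Gradient Accumulation: 2 steps, for a total effective training batch size of 128 (8 × 2 = 16 samples per device per optimizer step, which implies 8 devices; the device count is not stated)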
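Optimizer: AdamW, PyTorch fused implementation (adamw_torch_fused), with default betas and epsilon
Scheduler: Cosine learning rate schedule with a warmup setting of 0.1 (a fractional value, so most likely a warmup ratio, i.e. warmup over roughly the first 10% of training steps)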
Epochs: 3.0
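
The batch-size arithmetic and the learning-rate schedule above can be made concrete with a short pure-Python sketch. It assumes the effective batch size follows the usual Hugging Face Trainer relation (per-device batch × accumulation steps × number of devices), and it reads the 0.1 warmup value as a ratio of total steps. The helper names and the 1000-step run length are hypothetical. The schedule has the same shape as get_cosine_schedule_with_warmup in Transformers (linear warmup, then half-cosine decay to zero), but it is not the actual training code.

```python
import math

# Values taken from the model card's training details.
PER_DEVICE_TRAIN_BATCH = 8
GRAD_ACCUM_STEPS = 2
TOTAL_TRAIN_BATCH = 128
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1  # assumption: the card's "0.1 warmup" read as a ratio


def implied_num_devices(total, per_device, accum):
    """Solve total = per_device * accum * num_devices for num_devices."""
    per_step = per_device * accum
    if total % per_step:
        raise ValueError("total batch is not a multiple of per_device * accum")
    return total // per_step


def cosine_lr_with_warmup(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(implied_num_devices(TOTAL_TRAIN_BATCH, PER_DEVICE_TRAIN_BATCH, GRAD_ACCUM_STEPS))  # 8
    total_steps = 1000  # hypothetical; the real step count depends on dataset size
    for step in (0, 100, 1000):
        # 0.0 at step 0, peak 1e-05 at step 100, 0.0 at step 1000
        print(step, cosine_lr_with_warmup(step, total_steps))
```

The step count is left as a parameter because the card does not report the dataset size or the number of optimizer steps over the 3 epochs.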

Framework Versions

The model was trained using:

Transformers 5.6.0
PyTorch 2.12.0+cu130
Datasets 4.0.0
Tokenizers 0.22.2
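
When reproducing results or debugging against this environment, a quick first step is to compare the installed versions with the list above. This sketch uses only the standard library's importlib.metadata. The helper name version_mismatches is hypothetical, and the sketch assumes the PyPI distribution names transformers, torch, datasets, and tokenizers.

```python
from importlib import metadata

# Versions listed on the model card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "5.6.0",
    "torch": "2.12.0+cu130",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for packages that differ.

    Compares version strings exactly, so a differing local build tag such
    as "+cu130" counts as a mismatch.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for pkg, (wanted, found) in version_mismatches(CARD_VERSIONS).items():
        print(f"{pkg}: card lists {wanted}, installed {found or 'not installed'}")
```

Exact string comparison is deliberately strict. A looser check, for example one that ignores the CUDA build suffix, would need a version-parsing library.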

Intended Use

The provided information does not specify intended uses or limitations. The Llama 3 architecture and SFT suggest the model may be usable for general-purpose text generation, but no evaluation results are given to confirm how well it performs on any particular task.
