shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64 is an 8-billion-parameter model based on the Llama 3 architecture and trained from scratch. It was trained with a learning rate of 1e-05 and a total batch size of 64 over 3 epochs. The model card does not detail specific differentiators, but because the model was pre-trained from scratch rather than initialized from an official Llama 3 checkpoint, its behavior will depend heavily on its undisclosed training dataset.


Model Overview

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64 is an 8 billion parameter language model based on the Llama 3 architecture. It was trained from scratch, meaning its weights come from its own pre-training run rather than from the standard Llama 3 checkpoints.
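As a minimal sketch, the checkpoint can presumably be loaded like any other Llama 3-style causal language model through the Hugging Face transformers library. The repository id comes from the model card; the dtype and generation settings shown here are illustrative assumptions, not documented defaults.

```python
# Minimal loading sketch, assuming the repository exposes standard
# Llama 3-style weights and tokenizer files compatible with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; the dtype of the published weights is not documented
    device_map="auto",
)

prompt = "Explain the difference between pre-training and fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)  # generation settings are illustrative
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```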

Training Details

The model card reports the following key training hyperparameters; a configuration sketch follows the list:

  • Learning Rate: 1e-05
  • Batch Size: A train_batch_size of 8 combined with gradient_accumulation_steps of 2 gave a total_train_batch_size of 64, which implies the run was distributed across 4 devices (8 × 2 × 4 = 64).
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: Trained for 3.0 epochs.
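
For orientation, the reported values roughly map onto the Hugging Face Trainer API as shown below. This is a hypothetical reconstruction: details the model card does not state (number of devices, mixed-precision mode, dataset handling) are assumptions.

```python
# Hypothetical reconstruction of the reported training setup with the
# Hugging Face Trainer API; values not listed on the model card are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64",
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # reported train_batch_size
    gradient_accumulation_steps=2,   # with 4 devices this yields the reported total of 64
    num_train_epochs=3.0,
    optim="adamw_torch",             # default betas (0.9, 0.999) and epsilon 1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumption; precision is not stated on the card
)
```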

Current Status

As per the model card, more information is needed regarding the model's intended uses, capabilities, limitations, and the exact nature of its training and evaluation data. In particular, developers should note that the dataset used for its from-scratch pre-training is currently undisclosed.