shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64 is an 8-billion-parameter language model based on the Llama 3 architecture and trained from scratch. The model was developed by shuoxing and underwent supervised fine-tuning (SFT) with a batch size of 64. While the specific training data and intended uses are not documented, its Llama 3 foundation suggests general language understanding and generation capabilities.


Model Overview

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64 is an 8 billion parameter language model built upon the Llama 3 architecture. It was trained from scratch and subsequently underwent supervised fine-tuning (SFT) with a batch size of 64. The model's training involved a learning rate of 1e-05, a total training batch size of 64 across 4 GPUs, and a cosine learning rate scheduler over 3 epochs.
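The model card does not include usage code, but a Llama 3-style checkpoint published under this repository id would typically be loaded with the Transformers library along the lines sketched below. The prompt, dtype, and generation settings are illustrative assumptions, not documented usage.

```python
# Hypothetical usage sketch: loading the checkpoint with Hugging Face Transformers.
# The repo id comes from the model name above; everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights; adjust to the published format
    device_map="auto",
)

prompt = "Explain what supervised fine-tuning is in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```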

Key Training Details

  • Architecture: Llama 3
  • Parameters: 8 billion
  • Training Method: From scratch pre-training followed by Supervised Fine-Tuning (SFT)
  • Batch Size: Total training batch size of 64 (8 per device with 2 gradient accumulation steps)
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 3.0
  • Frameworks: Transformers 5.2.0, PyTorch 2.6.0+cu124, Datasets 4.0.0, Tokenizers 0.22.2
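
The hyperparameters listed above map onto a fairly standard Transformers Trainer configuration. The sketch below is a reconstruction under that assumption, not the author's actual training script; the SFT dataset and data handling are omitted because the card does not describe them, and the output path and bf16 setting are placeholders.

```python
# Hypothetical reconstruction of the reported SFT hyperparameters using
# transformers.TrainingArguments; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft",        # illustrative output path
    learning_rate=1e-5,                # reported learning rate
    per_device_train_batch_size=8,     # 8 per device
    gradient_accumulation_steps=2,     # 2 accumulation steps
    # 8 per device x 2 accumulation x 4 GPUs = total batch size 64
    num_train_epochs=3.0,              # reported epochs
    lr_scheduler_type="cosine",        # cosine learning rate scheduler
    adam_beta1=0.9,                    # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                 # AdamW epsilon
    bf16=True,                         # assumption: mixed-precision training
)
```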

Intended Use & Limitations

Specific details regarding the training dataset, intended uses, and limitations are not provided in the model card. Users should exercise caution and conduct further evaluation to determine suitability for specific applications, especially given the lack of information on the training data's nature and potential biases.