shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Model Overview
`shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64` is an 8-billion-parameter language model built on the Llama 3 architecture. It was developed by shuoxing and trained from scratch. While details of the training dataset are not available, its foundational nature suggests it is intended for a broad range of natural language processing tasks.
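For illustration, here is a minimal loading and generation sketch, assuming the checkpoint is hosted on the Hugging Face Hub under the repository name above and follows the standard transformers layout for Llama 3 models; the dtype and device placement are illustrative choices, not documented settings:

```python
# Minimal loading sketch; assumes a standard transformers-compatible
# Llama 3 checkpoint. Dtype and device placement are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; not stated in the card
    device_map="auto",
)

# Quick generation sanity check.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```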
Training Details
The training process for this Llama 3-based model used the following key hyperparameters (reconstructed as a configuration sketch after the list):
- Learning Rate: 1e-05
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: A total training batch size of 64, achieved with a `train_batch_size` of 8, `gradient_accumulation_steps` of 2, and 4 devices (8 × 2 × 4 = 64).
- Epochs: Trained for 3.0 epochs.
- LR Scheduler: Cosine schedule with a warmup ratio of 0.1.
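These values map directly onto Hugging Face `TrainingArguments`, and the optimizer name ADAMW_TORCH suggests the transformers Trainer was used. The following sketch is a hypothetical reconstruction under that assumption, not the author's published training script; dataset and model setup are omitted:

```python
# Hypothetical reconstruction of the reported hyperparameters as
# transformers TrainingArguments; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64",
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # reported train_batch_size
    gradient_accumulation_steps=2,   # 8 per device x 2 steps x 4 devices = 64 total
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # "0.1 warmup" read as a warmup ratio
    optim="adamw_torch",             # ADAMW_TORCH, betas=(0.9, 0.999), eps=1e-08
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```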
Intended Use & Limitations
The model card does not detail specific intended uses or known limitations. As a pre-trained foundational model, it should be broadly applicable to downstream tasks through fine-tuning or prompt engineering, but without published evaluation results or use-case guidance, its performance on any particular application needs to be determined empirically, as in the sketch below.
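As one hypothetical starting point for such an empirical check, the following sketch scores a short text with the model and reports its perplexity; the sample sentence and dtype are illustrative choices:

```python
# Hypothetical sanity check: perplexity of the checkpoint on a short sample.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Language models assign probabilities to sequences of tokens."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean token cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```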