shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer


Model Overview

shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64 is an 8 billion parameter language model pretrained from scratch, likely using the Llama 3 architecture, as its name suggests. It subsequently underwent supervised fine-tuning (SFT) with a total training batch size of 64, suggesting optimization for instruction following or domain adaptation.
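
The model card does not show usage code, but if the checkpoint is published in standard Hugging Face Transformers format, loading it would look like the following minimal sketch; the prompt, dtype, and device settings are illustrative assumptions, not details from the card.

```python
# Minimal sketch: loading the checkpoint with Hugging Face Transformers.
# Assumes the repository is in standard Transformers format; precision and
# device placement below are illustrative, not confirmed by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let the checkpoint decide precision
    device_map="auto",    # spread layers across available GPUs
)

# Hypothetical prompt; the card does not specify a chat or prompt template.
prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```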

Training Details

The training process used the AdamW optimizer with a learning rate of 1e-05 over 3 epochs. Key hyperparameters included a per-device train_batch_size of 8, gradient_accumulation_steps of 2, and a cosine learning rate scheduler with a warmup of 0.1 (reported as "0.1 warmup steps", almost certainly a warmup ratio, since fractional steps are not meaningful). Training ran on a multi-GPU setup with 4 devices, giving an effective batch size of 8 × 2 × 4 = 64, consistent with the bs64 suffix in the model name.
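
For concreteness, the reported hyperparameters map onto transformers.TrainingArguments as in the sketch below. Only the values named above come from the model card; the output directory is hypothetical, and the dataset and Trainer wiring are omitted since the card does not describe them.

```python
# Illustrative sketch of the reported SFT hyperparameters using
# transformers.TrainingArguments. Values marked "reported" come from the
# model card; everything else is an assumption for demonstration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft",        # hypothetical output path
    per_device_train_batch_size=8,     # reported train_batch_size
    gradient_accumulation_steps=2,     # reported accumulation steps
    learning_rate=1e-5,                # reported learning rate
    num_train_epochs=3,                # reported epochs
    lr_scheduler_type="cosine",        # reported cosine schedule
    warmup_ratio=0.1,                  # reported 0.1 warmup (as a ratio)
    optim="adamw_torch",               # reported AdamW optimizer
)
# Effective batch size: 8 per device × 2 accumulation × 4 GPUs = 64.
```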

Current Status

The model card does not yet provide details on the training dataset, intended uses, limitations, or evaluation results. Further information is needed to assess the model's capabilities and suitable applications.