shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64
The shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64 model is an 8-billion-parameter language model based on the Llama 3 architecture. It was trained from scratch through a full pre-training run. No task-specific differentiators are documented; the published training parameters point toward general language understanding and generation rather than a specialized domain.
Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64, is an 8-billion-parameter language model built on the Llama 3 architecture and pre-trained from scratch. Training used a learning rate of 1e-05, a total batch size of 64, and a cosine learning rate schedule over 3 epochs, on a multi-GPU setup with 4 devices and the AdamW optimizer.
Key Training Details
- Architecture: Llama 3
- Parameters: 8 billion
- Training Process: Full pre-training from scratch
- Learning Rate: 1e-05
- Optimizer: AdamW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine LR scheduler with a warmup ratio of 0.1
- Epochs: 3.0
- Batch Size: Total train batch size of 64 (8 per device × 4 devices × 2 gradient accumulation steps)
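The two derived quantities above can be sanity-checked in plain Python. This is a minimal sketch, not the training code: it assumes the standard definitions of effective batch size and of a linear-warmup-then-cosine-decay schedule (as commonly used by HF-style trainers), with the card's reported values plugged in. The 1000-step total below is an illustrative placeholder, not a figure from the card.

```python
import math

def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    # One optimizer step sees per_device * num_devices examples per forward pass,
    # accumulated grad_accum times before the weights are updated.
    return per_device * num_devices * grad_accum

def cosine_lr(step: int, total_steps: int, base_lr: float,
              warmup_ratio: float = 0.1) -> float:
    # Linear warmup over the first warmup_ratio of training,
    # then cosine decay from base_lr down to 0.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The card's reported settings: 8 per device, 4 GPUs, 2 accumulation steps.
print(effective_batch_size(8, 4, 2))        # -> 64
# Peak LR is reached exactly when warmup ends (step 100 of a 1000-step run here).
print(cosine_lr(100, 1000, 1e-05))          # -> 1e-05
```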
Intended Use
Based on the available information, this model is suitable for general language understanding and generation tasks where an 8-billion-parameter model is appropriate. Because it was pre-trained from scratch, it is best treated as a foundational model to be fine-tuned for specific downstream applications.
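For further fine-tuning or inference, a model like this would typically be loaded with the Hugging Face `transformers` library. The sketch below assumes the checkpoint is hosted on the Hub under the name in this card and is compatible with `AutoModelForCausalLM`; the `load_model` helper is illustrative, not part of the published repository.

```python
MODEL_ID = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64"

def load_model(model_id: str = MODEL_ID):
    # Lazy import so this sketch can be read/imported without transformers installed.
    # Assumes a standard causal-LM checkpoint layout on the Hugging Face Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_model()` downloads the weights (roughly 16 GB in fp16 for an 8B model), so a GPU with sufficient memory, or quantized loading, is advisable.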