shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-sft-bs64

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-sft-bs64 is an 8-billion-parameter language model based on the Llama 3 architecture. It was trained from scratch, indicating a foundational pre-training effort rather than a fine-tune of an existing model. Specific differentiators and intended uses are not documented, but the from-scratch training suggests a focus on establishing a robust base for further specialization. It is suitable for general language understanding and generation tasks where a Llama 3-based model of this size is appropriate.


Model Overview

shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-sft-bs64 is an 8-billion-parameter language model built on the Llama 3 architecture. Notably, it was trained entirely from scratch, implying a foundational pre-training phase rather than an instruction-tuned or fine-tuned version of an existing model. Training used a multi-GPU setup with four devices, a total batch size of 64, and the AdamW optimizer with a cosine learning-rate schedule over 3 epochs.

Key Training Details

  • Architecture: Llama 3-based
  • Parameters: 8 billion
  • Training Approach: Trained from scratch
  • Learning Rate: 1e-05
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate schedule with a warmup ratio of 0.1
  • Epochs: 3.0
  • Batch Size: 64 (total across 4 GPUs)
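
For readers who want to reproduce or adapt this setup, the hyperparameters above map directly onto a Hugging Face `TrainingArguments` configuration. This is a minimal sketch under the assumption that the `transformers` Trainer produced the run (the card does not name the training framework); the output path is hypothetical.

```python
# Minimal sketch of the reported training configuration using the
# Hugging Face transformers Trainer API (framework is an assumption;
# the model card does not state which one was used).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain",  # hypothetical output path
    per_device_train_batch_size=16,        # 16 per GPU x 4 GPUs = 64 total
    num_train_epochs=3.0,                  # as reported
    learning_rate=1e-5,
    adam_beta1=0.9,                        # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",            # cosine LR schedule
    warmup_ratio=0.1,                      # 10% of training steps as warmup
)
```

Launched with `torchrun --nproc_per_node=4`, a per-device batch size of 16 reproduces the reported total batch size of 64; gradient accumulation could equivalently be used to reach the same effective size.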

Intended Use

Given that it was pre-trained from scratch, this model is most likely intended as a base model for further fine-tuning or for research into foundational language understanding. Specific applications are not documented, but it can serve as a solid starting point for a range of natural language processing tasks.
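
If the checkpoint follows the standard Hugging Face layout (an assumption; the card does not show usage code), it can be loaded for quick experimentation with the usual `transformers` causal-LM classes. A minimal inference sketch:

```python
# Minimal inference sketch, assuming the checkpoint loads with the
# standard transformers causal-LM classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-sft-bs64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits on a single modern GPU
    device_map="auto",
)

# A base model continues text rather than following instructions, so
# prompt with a passage to complete.
inputs = tokenizer("The history of natural language processing", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base model rather than an instruction-tuned one, prompts should be written as text to be continued rather than as instructions or chat turns.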