shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64 is an 8-billion-parameter Llama 3 model developed by shuoxing and trained from scratch through a full pre-training run. It is intended for general language generation, with specific applications requiring further fine-tuning.


Model Overview

This model was trained from scratch rather than initialized from Meta's released Llama 3 weights. Specific details about its training corpus and intended uses are not documented; it serves as a foundational pre-trained model.

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 1e-05
  • Batch Sizes: A per-device train_batch_size of 8 and eval_batch_size of 8 with gradient_accumulation_steps of 2, giving a total_train_batch_size of 64 and a total_eval_batch_size of 32 (consistent with training across 4 devices).
  • Optimizer: ADAMW_TORCH (PyTorch's AdamW implementation) with default betas and epsilon.
  • LR Scheduler: Cosine schedule with a warmup ratio of 0.1.
  • Epochs: Trained for 3.0 epochs.
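The reported batch sizes fit together via the standard effective-batch-size formula (per-device size × gradient accumulation × device count); the device count below is inferred from that arithmetic, not stated in the card:

```python
# Sketch: reconciling the reported batch sizes.
# total_train_batch_size = per_device * grad_accum * num_devices
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
total_train_batch_size = 64

num_devices = total_train_batch_size // (
    per_device_train_batch_size * gradient_accumulation_steps
)
print(num_devices)  # 4

# The eval total is consistent: no gradient accumulation at eval time,
# so 8 per device across the same 4 devices gives 32.
per_device_eval_batch_size = 8
total_eval_batch_size = per_device_eval_batch_size * num_devices
print(total_eval_batch_size)  # 32
```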

Framework Versions

The training environment utilized:

  • Transformers 5.2.0
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Intended Use

As a pre-trained model, it serves as a strong base for various downstream natural language processing tasks. Users would typically fine-tune this model on specific datasets to adapt it to particular applications, such as text generation, summarization, or question answering.