shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · License: llama3 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4 is an 8-billion-parameter Llama 3 variant fine-tuned by shuoxing on the c4_1_8m dataset. It is a specialized iteration of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 whose additional training draws on the C4 corpus, and it is intended for applications that want a Llama 3 base model with these C4-derived pre-training adjustments.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4, is an 8 billion parameter Llama 3-based language model. It represents a fine-tuned version of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model, with its training specifically focused on the c4_1_8m dataset.
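
Assuming the repository follows standard Hugging Face conventions for Llama 3 checkpoints (the card does not confirm the exact file layout), a minimal loading sketch looks like this; the model ID is taken from the card, and everything else is ordinary transformers usage:

    # Minimal sketch: load the model via the Hugging Face transformers API.
    # Assumes the repo ships a standard tokenizer and causal-LM weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires accelerate; places weights on available GPUs
    )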

Training Details

The model was trained with the following key hyperparameters; a hedged TrainingArguments sketch follows the list:

  • Learning Rate: 1e-05
  • Batch Sizes: per-device train_batch_size of 1 and eval_batch_size of 8; across 4 GPUs this yields a total_train_batch_size of 4.
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: 3.0
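
For anyone reproducing a comparable run, here is a hedged sketch of transformers TrainingArguments mirroring the hyperparameters above; the output directory name is illustrative, and the card does not publish the surrounding data pipeline:

    # Hedged sketch: TrainingArguments matching the card's hyperparameters.
    # total_train_batch_size = per-device 1 x 4 GPUs = 4, as listed above.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="llama3-8b-full-pretrain-wash-c4-1-8m-bs4",  # illustrative
        learning_rate=1e-5,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=8,
        num_train_epochs=3.0,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,        # cosine schedule with 10% warmup
        optim="adamw_torch",     # ADAMW_TORCH with default betas and epsilon
    )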

Framework Versions

Training was conducted using:

  • Transformers 5.2.0
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2
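
A quick way to check that a local environment matches this stack before loading the model (purely a convenience snippet, not part of the card):

    # Print installed versions to compare against the training stack above.
    import datasets
    import tokenizers
    import torch
    import transformers

    for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                      ("Datasets", datasets), ("Tokenizers", tokenizers)]:
        print(f"{name}: {mod.__version__}")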

Intended Use

This model suits developers who want a Llama 3 base model that has undergone additional pre-training on the C4 dataset, which may give it different characteristics from its parent model. Specific use cases will depend on the properties imparted by the c4_1_8m data; a brief generation sketch follows.
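
As a usage illustration, the sketch below generates text with the model loaded in the overview section; the prompt and sampling settings are arbitrary placeholders rather than recommendations from the card:

    # Minimal generation sketch; reuses `model` and `tokenizer` from the
    # loading example above. Prompt and sampling knobs are illustrative only.
    prompt = "The C4 dataset is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,   # well within the 8k context window
        do_sample=True,
        temperature=0.7,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))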