shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4
shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4 is an 8-billion-parameter language model fine-tuned by shuoxing on the Llama 3 architecture. It is a specialized iteration, further trained on the c4_0_9m dataset on top of a previously pre-trained checkpoint. The model targets general language understanding and generation, and its fine-tuning may improve performance on text similar to the C4 corpus.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4, is an 8 billion parameter language model derived from the Llama 3 architecture. It represents a fine-tuned version of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, with additional training specifically on the c4_0_9m dataset.
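As a minimal sketch, the model should load through the standard transformers causal-LM API; the dtype, device placement, and prompt below are assumptions chosen for illustration, not settings confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4"

# bfloat16 and device_map="auto" are assumptions, picked so an
# 8B-parameter model fits comfortably on a single modern GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple greedy generation as a smoke test.
inputs = tokenizer("The C4 dataset is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```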
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 1e-05
- Batch Sizes:
  - `train_batch_size`: 1
  - `eval_batch_size`: 8
  - `total_train_batch_size`: 4 (across 4 devices)
- Optimizer: ADAMW_TORCH with default betas and epsilon
- LR Scheduler: cosine, with a warmup ratio of 0.1
- Epochs: 3.0
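For reference, a hedged reconstruction of these settings as transformers TrainingArguments follows; the output directory and any settings not listed above (such as precision) are assumptions, not values taken from the training run:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above;
# output_dir and bf16 are placeholders/assumptions.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-0-9m-bs4",
    learning_rate=1e-5,
    per_device_train_batch_size=1,  # train_batch_size of 1
    per_device_eval_batch_size=8,   # eval_batch_size of 8
    num_train_epochs=3.0,
    optim="adamw_torch",            # default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                      # assumption: typical for Llama 3 training
)

# With 4 devices, a per-device batch of 1, and no gradient
# accumulation, the effective total_train_batch_size is 4 * 1 = 4.
```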
Framework Versions
Training was conducted using:
- Transformers 5.2.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2
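A trivial sketch for checking that a local environment matches these pins; the expected version strings come straight from the list above:

```python
# Print installed versions to compare against the card's pins.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected 5.2.0 per this card
print(torch.__version__)         # expected 2.6.0+cu124
print(datasets.__version__)      # expected 4.0.0
print(tokenizers.__version__)    # expected 0.22.2
```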
Intended Use
Specific intended uses and limitations are not documented for this checkpoint. However, its additional training on C4-derived data suggests potential strengths in web-text processing and general language understanding, given that C4 consists of cleaned Common Crawl web text.