shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Mar 26, 2026 · License: llama3 · Architecture: Transformer · Cold

The shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4 model is an 8 billion parameter Llama 3 variant, fine-tuned by shuoxing on the c4_1_2m dataset. It is a specialized iteration that builds on a previous Llama 3 pre-trained checkpoint; its main differentiator is continued training on C4-derived data, which suggests an orientation toward general text understanding and generation. The model has a context length of 8192 tokens.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4, is an 8 billion parameter Llama 3-based language model. It is a fine-tuned version of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, adapted through further training on the c4_1_2m dataset. This targeted fine-tuning suggests a focus on the characteristics of the C4 dataset, which is known for its extensive collection of cleaned web text.
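As a standard Llama 3-style causal language model, the checkpoint should load with the usual Hugging Face transformers API. The snippet below is a minimal sketch, assuming the weights are published under the repository id shown above and that you have enough GPU memory for an 8B model; dtype and sampling settings are illustrative, not documented by the author.

```python
# Minimal text-generation sketch; assumes the checkpoint loads like any Llama 3 causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is suitable; adjust to your hardware
    device_map="auto",
)

prompt = "The C4 dataset is a large collection of web text that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Plain completion: this is a pre-trained/fine-tuned base model, not a chat-tuned one.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```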

Training Details

The model was trained using the following key hyperparameters (a sketch of an equivalent training configuration follows the list):

  • Learning Rate: 1e-05
  • Batch Size: a total training batch size of 4 (per-device train_batch_size of 1 across 4 devices).
  • Optimizer: ADAMW_TORCH with standard betas and epsilon.
  • LR Scheduler: cosine, with a warmup of 0.1 (most plausibly a warmup ratio rather than a step count).
  • Epochs: 3.0.
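For readers who want to set up a comparable run, the hyperparameters above map naturally onto a Hugging Face TrainingArguments configuration. The following is only a sketch under that assumption; the author's actual training script, dataset preprocessing, and any gradient-accumulation settings are not documented on this page, and the output path is hypothetical.

```python
# Hypothetical TrainingArguments mirroring the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-1-2m-bs4",  # hypothetical output directory
    learning_rate=1e-5,                        # reported learning rate
    per_device_train_batch_size=1,             # reported per-device batch size
    # 4 devices x batch size 1 => total training batch size of 4, as listed above
    num_train_epochs=3.0,                      # reported epochs
    lr_scheduler_type="cosine",                # reported scheduler
    warmup_ratio=0.1,                          # assumption: 0.1 is a warmup ratio
    optim="adamw_torch",                       # reported optimizer
    bf16=True,                                 # assumption: mixed-precision training
)
```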

Potential Use Cases

Given its fine-tuning on the C4 dataset, this model may be particularly suitable for:

  • General text generation and completion tasks.
  • Applications requiring broad linguistic understanding from web-scale data.
  • Further research into the effects of C4 dataset fine-tuning on Llama 3 architectures (see the evaluation sketch below).
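For the research use case, a common first measurement is held-out perplexity on C4-style text. The snippet below is an illustrative sketch only: the c4_1_2m split used for fine-tuning is not publicly documented here, so it falls back to a small streamed sample of the public allenai/c4 English validation data as a stand-in.

```python
# Illustrative perplexity check on C4 validation text (not the author's evaluation setup).
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Streamed sample from the public C4 validation split (assumption: a reasonable proxy).
dataset = load_dataset("allenai/c4", "en", split="validation", streaming=True)

losses = []
for i, example in enumerate(dataset):
    if i >= 50:  # keep the sketch cheap
        break
    enc = tokenizer(
        example["text"], return_tensors="pt", truncation=True, max_length=1024
    ).to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    losses.append(out.loss.item())

print("approx. perplexity:", math.exp(sum(losses) / len(losses)))
```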