shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4
The shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4 model is an 8 billion parameter language model, fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 and further trained on the c4_3_6m dataset, which points to a focus on general language understanding and generation tasks. It is suitable for applications that require a robust base model with an 8192-token context length.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4, is an 8 billion parameter language model. It is a fine-tuned iteration of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 base model, with subsequent training on the c4_3_6m dataset.
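The checkpoint can be loaded like any standard Llama-3-architecture causal language model. The snippet below is a minimal sketch using the Hugging Face transformers library; the model ID comes from this card, while the dtype and device-placement arguments are illustrative assumptions rather than settings published by the author.

```python
# Minimal loading sketch (assumes standard Llama-3 weights and tokenizer files
# in the repository; dtype/device choices are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # spread the 8B parameters across available devices (requires accelerate)
)
```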
Key Training Details
The model was trained with the following hyperparameters (see the illustrative configuration sketch after this list):
- Learning Rate: 1e-05
- Batch Size: A total training batch size of 4 (1 per device across 4 GPUs)
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 3.0
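The training script itself is not published with the card, but the listed hyperparameters map directly onto Hugging Face TrainingArguments. The sketch below simply mirrors them; the output directory, the optimizer backend name, and the warmup-ratio interpretation of the 0.1 warmup value are assumptions.

```python
# Illustrative sketch only: the actual training configuration is not published.
# Values below mirror the hyperparameters listed above; everything else is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-3-6m",    # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,          # 1 per device x 4 GPUs = total batch size of 4
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                       # assumed interpretation of the 0.1 warmup value
    optim="adamw_torch",                    # AdamW, betas and epsilon as listed
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```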
Intended Use Cases
Given its training on the C4 dataset, this model is likely suitable for a broad range of natural language processing tasks, including text generation, summarization, and question answering, particularly where a general understanding of English text is required. Its 8192-token context length supports processing longer inputs.
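As a usage illustration, the snippet below runs basic text generation through the transformers pipeline API; the prompt and decoding parameters are arbitrary examples, not recommendations from the card.

```python
# Hedged generation example: prompt and sampling settings are illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4",
    device_map="auto",
)

output = generator(
    "The C4 dataset is a large web-derived corpus that",
    max_new_tokens=64,    # stays well within the 8192-token context window
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```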