shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4
The shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4 model is an 8-billion-parameter language model based on the Llama 3 architecture, fine-tuned by shuoxing. It continues training from an earlier pretrained checkpoint on the `c4_0_6m` dataset and is intended for general language understanding and generation, with the additional training potentially improving performance on web-derived text.
Model Overview
shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4 is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned iteration trained on the `c4_0_6m` dataset, building on a prior checkpoint, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, which indicates a progressive refinement approach.
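Provided the checkpoint is published on the Hugging Face Hub under the repo id above, it should load with the standard `transformers` causal-LM API. The dtype and device settings below are illustrative assumptions, not documented requirements:

```python
# Minimal loading sketch, assuming the repo id from this card is
# available on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for 8B params in bf16
    device_map="auto",           # requires `accelerate`; adjust for your hardware
)
```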
Key Training Details
- Base Model: Fine-tuned from `shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8`
- Dataset: Further trained on the `c4_0_6m` dataset
- Hyperparameters (a sketch of how these map onto a training configuration follows below):
  - Learning Rate: 1e-05
  - Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  - LR Scheduler: Cosine with a warmup ratio of 0.1
  - Epochs: 3.0
  - Total Train Batch Size: 4 (across 4 devices)
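The training script itself is not published, but as a rough sketch, the hyperparameters above would map onto a `transformers` `TrainingArguments` configuration along these lines. The per-device batch size of 1 is inferred from the stated total of 4 across 4 devices:

```python
# Hedged reconstruction of the listed hyperparameters as a
# TrainingArguments config; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-0-6m-bs4",
    learning_rate=1e-5,
    adam_beta1=0.9,                 # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    per_device_train_batch_size=1,  # 1 per device x 4 devices = total batch size 4
)
```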
Intended Use Cases
Given its training on C4-derived data, this model is likely suited to tasks requiring broad web-text understanding and generation. Specific intended uses and limitations are not documented, but its lineage suggests applicability to text summarization, content generation, and general conversational AI, particularly for content resembling the C4 corpus.
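As a minimal usage illustration, assuming the model loads as shown earlier, a text-generation call might look like the following; the prompt and sampling settings are placeholders, not recommendations from the model author:

```python
# Illustrative generation call via the transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4",
    device_map="auto",
)

out = generator(
    "Summarize the following paragraph:\n...",  # placeholder prompt
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(out[0]["generated_text"])
```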