shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Mar 26, 2026 · License: llama3 · Architecture: Transformer · Cold

The shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4 model is an 8 billion parameter Llama 3 variant, fine-tuned by shuoxing on the c4_1_2m dataset. It is a specialized iteration that builds on a previous Llama 3 pre-trained checkpoint; its main differentiator is continued training on C4-derived data, which suggests an orientation toward general text understanding and generation. The model has a context length of 8192 tokens.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4, is an 8 billion parameter Llama 3-based language model. It is a fine-tuned version of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, adapted through further training on the c4_1_2m dataset. This targeted fine-tuning suggests a focus on the characteristics of the C4 dataset, which is known for its extensive collection of cleaned web text.
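As a standard Llama 3-style causal language model, the checkpoint should load with the usual Hugging Face transformers API. The snippet below is a minimal sketch, assuming the weights are published under the repository id shown above and that you have enough GPU memory for an 8B model; dtype and sampling settings are illustrative, not documented by the author.

```python
# Minimal text-generation sketch; assumes the checkpoint loads like any Llama 3 causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is suitable; adjust to your hardware
    device_map="auto",
)

prompt = "The C4 dataset is a large collection of web text that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Plain completion: this is a pre-trained/fine-tuned base model, not a chat-tuned one.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```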

Training Details

The model was trained using the following key hyperparameters (a sketch of an equivalent training configuration follows the list):

  • Learning Rate: 1e-05
  • Batch Size: a total training batch size of 4 (per-device train_batch_size of 1 across 4 devices).
  • Optimizer: ADAMW_TORCH with standard betas and epsilon.
  • LR Scheduler: cosine, with a warmup of 0.1 (most plausibly a warmup ratio rather than a step count).
  • Epochs: 3.0.
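For readers who want to set up a comparable run, the hyperparameters above map naturally onto a Hugging Face TrainingArguments configuration. The following is only a sketch under that assumption; the author's actual training script, dataset preprocessing, and any gradient-accumulation settings are not documented on this page, and the output path is hypothetical.

```python
# Hypothetical TrainingArguments mirroring the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-1-2m-bs4",  # hypothetical output directory
    learning_rate=1e-5,                        # reported learning rate
    per_device_train_batch_size=1,             # reported per-device batch size
    # 4 devices x batch size 1 => total training batch size of 4, as listed above
    num_train_epochs=3.0,                      # reported epochs
    lr_scheduler_type="cosine",                # reported scheduler
    warmup_ratio=0.1,                          # assumption: 0.1 is a warmup ratio
    optim="adamw_torch",                       # reported optimizer
    bf16=True,                                 # assumption: mixed-precision training
)
```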

Potential Use Cases

Given its fine-tuning on the C4 dataset, this model may be particularly suitable for:

  • General text generation and completion tasks.
  • Applications requiring broad linguistic understanding from web-scale data.
  • Further research into the effects of C4 dataset fine-tuning on Llama 3 architectures (see the evaluation sketch below).
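For the research use case, a common first measurement is held-out perplexity on C4-style text. The snippet below is an illustrative sketch only: the c4_1_2m split used for fine-tuning is not publicly documented here, so it falls back to a small streamed sample of the public allenai/c4 English validation data as a stand-in.

```python
# Illustrative perplexity check on C4 validation text (not the author's evaluation setup).
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-2m-bs4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Streamed sample from the public C4 validation split (assumption: a reasonable proxy).
dataset = load_dataset("allenai/c4", "en", split="validation", streaming=True)

losses = []
for i, example in enumerate(dataset):
    if i >= 50:  # keep the sketch cheap
        break
    enc = tokenizer(
        example["text"], return_tensors="pt", truncation=True, max_length=1024
    ).to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    losses.append(out.loss.item())

print("approx. perplexity:", math.exp(sum(losses) / len(losses)))
```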