shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4

Text Generation

  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 8k
  • Published: Mar 27, 2026
  • License: llama3
  • Architecture: Transformer

The shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4 model is an 8-billion-parameter language model fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, with further training on the c4_3_6m dataset, suggesting a focus on general English language understanding and generation. Its 8192-token context length makes it suitable for applications that need a robust base model with longer inputs.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4, is an 8 billion parameter language model. It is a fine-tuned iteration of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 base model, with subsequent training on the c4_3_6m dataset.

Key Training Details

The model was trained with the following hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: A total training batch size of 4 (1 per device across 4 GPUs)
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
  • Epochs: 3.0
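
As a worked illustration of the hyperparameters above, the sketch below reproduces the linear-warmup + cosine-decay learning-rate curve and the effective batch size. The base learning rate (1e-05) and warmup ratio (0.1) are taken from the list; the total step count is an arbitrary placeholder, since the original run's step count is not published here.

```python
import math

def lr_at(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Linear warmup followed by cosine decay to zero.

    base_lr and warmup_ratio mirror the values reported for this run;
    total_steps is whatever the actual training schedule used.
    """
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

# Effective batch size: 1 sample per device across 4 GPUs.
effective_batch = 1 * 4  # = 4, matching the reported total batch size
```

With a hypothetical 1,000-step run, the schedule starts at 0, reaches 1e-05 at step 100 (end of warmup), and decays back to 0 by the final step.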

Intended Use Cases

Given its training on the C4 dataset, this model is likely suitable for a broad range of natural language processing tasks, including text generation, summarization, and question answering, particularly where a general understanding of English text is required. Its 8192-token context length supports processing longer inputs.
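
For the use cases above, a minimal generation sketch with the Hugging Face `transformers` library is shown below. It assumes the checkpoint is published on the Hub under this repo id with a standard Llama 3 tokenizer and model layout; adjust the dtype and device settings for your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # an 8B model needs roughly 16 GB in bf16
    device_map="auto",
)

prompt = "The C4 dataset is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; inputs up to the 8192-token context length are supported.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this checkpoint is a further-pretrained base model rather than an instruction-tuned one, plain text-completion prompts like the above will generally work better than chat-style instructions.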