shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-bs4

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Mar 27, 2026 | License: llama3 | Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-bs4 is an 8-billion-parameter language model fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8. It was trained on the c4_2_4m dataset, a large cleaned web corpus, indicating a focus on general text understanding and generation. Built on a Llama 3 base architecture, it is designed for tasks that benefit from broad textual knowledge.


Model Overview

shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-bs4 is an 8-billion-parameter language model: a fine-tuned variant of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, adapted through further training on the c4_2_4m dataset.
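Because this is a plain text-generation checkpoint rather than an instruction-tuned chat model, it is best prompted as a continuation. A minimal loading-and-generation sketch with the Hugging Face transformers library might look like the following, assuming the repository ships in the standard Hugging Face format and a bf16 copy fits on your hardware; the prompt and sampling settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on the target GPU
    device_map="auto",
)

# Base-model-style prompting: give it text to continue, not a chat turn.
prompt = "The C4 dataset is a cleaned web corpus that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```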

Key Characteristics

  • Base Model: Derived from a Llama 3 architecture, providing a strong foundation for language understanding and generation.
  • Training Data: Fine-tuned on the c4_2_4m dataset, which is a cleaned subset of the Common Crawl dataset. This suggests an emphasis on general-purpose text processing and knowledge acquisition from web-scale data.
  • Training Configuration: Utilized a learning rate of 1e-05, a total batch size of 4 across 4 GPUs (i.e., 1 example per device), and a cosine learning rate scheduler over 3 epochs; see the sketch after this list.
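For reference, below is a hypothetical reconstruction of that configuration as Hugging Face TrainingArguments. The card does not say which training framework was used, so the field names and the bf16 setting are assumptions; only the learning rate, scheduler type, epoch count, and total batch size come from the listed configuration.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-2-4m-bs4",  # hypothetical output path
    learning_rate=1e-5,              # reported learning rate
    per_device_train_batch_size=1,   # 4 GPUs x 1 example = total batch size 4
    num_train_epochs=3,              # reported epoch count
    lr_scheduler_type="cosine",      # reported scheduler
    bf16=True,                       # assumption: mixed-precision training
)
```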

Potential Use Cases

Given its training on a broad web dataset, this model is likely suitable for:

  • General Text Generation: Creating coherent and contextually relevant text across various topics.
  • Text Understanding: Tasks such as summarization, question answering, or information extraction from diverse textual inputs.
  • Further Fine-tuning: Serving as a robust base model for more specialized downstream tasks, leveraging its extensive pre-training on the C4 dataset; a minimal continued-fine-tuning sketch follows this list.
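To illustrate the last point, here is a minimal sketch of continuing fine-tuning from this checkpoint with the Hugging Face Trainer, again assuming a standard-format repository. The toy dataset, output path, and hyperparameters are placeholders, not recommendations.

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-bs4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Toy two-document corpus standing in for a real task-specific dataset.
texts = ["Example document one.", "Example document two."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama3-8b-specialized", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM
)
trainer.train()
```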