shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Mar 26, 2026 · License: llama3 · Architecture: Transformer

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4 model is an 8 billion parameter Llama 3-based language model, fine-tuned by shuoxing. It is a specialized iteration, further trained on the c4_0_6m dataset on top of a previous pre-trained checkpoint. It targets general language understanding and generation, and its fine-tuning on C4-derived data may particularly help on web-style text.
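As a rough back-of-the-envelope check on the 8B parameter / FP8 figures above, the weight memory footprint can be estimated directly. This is illustrative arithmetic only; it excludes runtime overheads such as activations and the KV cache:

```python
PARAMS = 8_000_000_000  # 8 billion parameters

def weight_gb(bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

print(weight_gb(1))  # FP8 (1 byte/param):  8.0 GB
print(weight_gb(2))  # FP16 (2 bytes/param): 16.0 GB
```

Quantizing from FP16 to FP8 roughly halves the weight footprint, which is why the hosted variant advertises FP8.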


Model Overview

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4 is an 8 billion parameter language model based on the Llama 3 architecture. It is a fine-tuned iteration trained on the c4_0_6m dataset, starting from the earlier checkpoint shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, i.e. each release refines the previous one.

Key Training Details

  • Base Model: Fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8.
  • Dataset: Further trained on the c4_0_6m dataset.
  • Hyperparameters:
    • Learning Rate: 1e-05
    • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
    • LR Scheduler: Cosine with a warmup ratio of 0.1
    • Epochs: 3.0
    • Total Train Batch Size: 4 (across 4 devices)
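The listed schedule (cosine with a 0.1 warmup ratio at a 1e-05 base rate) can be sketched as below. This is a minimal illustration assuming the standard linear-warmup-then-cosine-decay formulation (as in Hugging Face transformers' get_cosine_schedule_with_warmup); the 1,000-step total is hypothetical, since the real step count depends on dataset size and the batch size of 4:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Warmup: LR rises linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: half-cosine from base_lr down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(100, 1000))   # end of warmup: peak LR of 1e-05
print(cosine_lr(1000, 1000))  # end of training: LR has decayed to ~0
```

With warmup_ratio=0.1, the first 10% of steps ramp the learning rate up before the cosine decay begins, which stabilizes early optimization with AdamW.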

Intended Use Cases

Given its training on a C4-derived dataset, this model is likely suited to tasks that require broad web-text understanding and generation. The model card does not document specific intended uses or limitations, but its training data suggests applicability to text summarization, content generation, and general conversational AI, particularly for content resembling C4's web-crawled composition.