shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 8k
  • Published: Mar 27, 2026
  • License: llama3
  • Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4 is an 8-billion-parameter language model fine-tuned by shuoxing from a Llama 3 base. It was fine-tuned on the c4_3_9m dataset, building on a previously pre-trained checkpoint. The model is intended for general language generation tasks; its specific strengths and limitations have not yet been documented.

Model Overview

shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4 is an 8-billion-parameter language model developed by shuoxing. It is a fine-tuned variant of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 base model.

Key Characteristics

  • Base Architecture: Llama 3 8B.
  • Fine-tuning Dataset: c4_3_9m.
  • Training Hyperparameters (a configuration sketch follows this list):
    • Learning Rate: 1e-05
    • Optimizer: ADAMW_TORCH with betas=(0.9, 0.999)
    • Scheduler: cosine with a warmup ratio of 0.1
    • Epochs: 3.0
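
As a rough illustration, these hyperparameters map onto a Hugging Face transformers TrainingArguments configuration as sketched below. Only the values listed above come from the model card; the output directory is invented for the example, and the per-device batch size of 4 is inferred from the bs4 suffix in the model name rather than stated in the documentation.

```python
# Hypothetical reconstruction of the reported fine-tuning setup with
# Hugging Face transformers. Only the hyperparameters listed above come
# from the model card; paths, batch size, and precision are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-3-9m-bs4",  # assumed output path
    learning_rate=1e-5,                       # reported learning rate
    optim="adamw_torch",                      # reported optimizer (ADAMW_TORCH)
    adam_beta1=0.9,                           # reported betas=(0.9, 0.999)
    adam_beta2=0.999,
    lr_scheduler_type="cosine",               # reported scheduler
    warmup_ratio=0.1,                         # reported warmup ratio
    num_train_epochs=3.0,                     # reported epoch count
    per_device_train_batch_size=4,            # assumed from the "bs4" suffix
)
```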

Intended Use Cases

Specific intended uses and limitations are not detailed in the available documentation. Fine-tuning on the c4_3_9m dataset suggests applications in general text generation, text comprehension, and other tasks that benefit from exposure to a broad web corpus. Further evaluation is needed to establish its optimal use cases and performance characteristics.
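
A minimal way to probe the model's generation behavior is the standard transformers text-generation pipeline, as in the sketch below. The prompt, dtype, and sampling settings are illustrative choices, not recommendations from the model card.

```python
# Illustrative inference sketch using the standard transformers pipeline.
# Prompt and sampling parameters are arbitrary choices for a quick test.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4",
    torch_dtype=torch.bfloat16,  # assumed local dtype; the hosted endpoint lists FP8
    device_map="auto",
)

output = generator(
    "The quality of web-scale pretraining data",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```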