shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 26, 2026 · License: llama3 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4 is an 8 billion parameter language model based on the Llama 3 architecture and fine-tuned by shuoxing. It is a further-trained iteration of a previous pretrained checkpoint, with additional training on the c4_0_9m dataset. The model targets general language understanding and generation tasks, and the additional fine-tuning may improve performance on text similar to the C4 dataset.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4, is an 8 billion parameter language model derived from the Llama 3 architecture. It represents a fine-tuned version of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, with additional training specifically on the c4_0_9m dataset.
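If the checkpoint is published on the Hugging Face Hub under this repository id, it can be loaded with the standard transformers API. The snippet below is a minimal sketch; the repo id, dtype handling, and device placement are assumptions rather than instructions from the author.

```python
# Minimal loading sketch, assuming the weights are available on the Hugging Face
# Hub under the repo id shown on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # requires accelerate; spreads the 8B weights across available devices
)
```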

Training Details

The model was trained using the following key hyperparameters (an approximate TrainingArguments sketch follows the list):

  • Learning Rate: 1e-05
  • Batch Sizes: train_batch_size of 1 per device and eval_batch_size of 8; across 4 devices this gives a total_train_batch_size of 4.
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • LR Scheduler: cosine schedule with a warmup ratio of 0.1.
  • Epochs: 3.0
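A sketch of how these values map onto transformers' TrainingArguments is shown below. The output directory and any setting not listed above (e.g., gradient accumulation, seed) are assumptions for illustration only.

```python
# Approximate reconstruction of the reported hyperparameters with transformers'
# TrainingArguments; unlisted settings are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-0-9m-bs4",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,            # train_batch_size of 1
    per_device_eval_batch_size=8,             # eval_batch_size of 8
    # total_train_batch_size of 4 = 1 per device x 4 devices (no gradient accumulation assumed)
    optim="adamw_torch",                      # default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                         # "0.1 warmup" read as a warmup ratio
    num_train_epochs=3.0,
)
```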

Framework Versions

Training was conducted using:

  • Transformers 5.2.0
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2
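To compare a local environment against these versions, a simple check like the one below can be used; it is purely illustrative and only prints installed versus reported versions.

```python
# Print installed versions next to the versions reported in this card.
import datasets
import tokenizers
import torch
import transformers

reported = {
    "transformers": "5.2.0",
    "torch": "2.6.0+cu124",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}

for module in (transformers, torch, datasets, tokenizers):
    name = module.__name__
    print(f"{name}: installed {module.__version__}, reported {reported[name]}")
```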

Intended Use

Specific intended uses and limitations have not been documented. Given the composition of the C4 dataset, which consists of cleaned web-crawled text, the additional fine-tuning suggests potential strengths in web-text processing and general language understanding tasks.
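For text generation, the model can be driven through the transformers pipeline API. The prompt and sampling parameters below are arbitrary illustrative choices, not recommendations from the author.

```python
# Illustrative generation call; prompt and sampling settings are arbitrary.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-bs4",
    device_map="auto",
)

output = generator(
    "The C4 corpus is a cleaned crawl of web text that",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```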