shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · License: llama3 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4 is an 8-billion-parameter Llama 3 variant fine-tuned by shuoxing on the c4_1_8m dataset. It is a specialized iteration of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 whose additional training draws on the C4 corpus, and it is intended for applications that want a Llama 3 base model with these C4-derived pre-training adjustments.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4, is an 8 billion parameter Llama 3-based language model. It represents a fine-tuned version of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model, with its training specifically focused on the c4_1_8m dataset.
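
Assuming the repository follows standard Hugging Face conventions for Llama 3 checkpoints (the card does not confirm the exact file layout), a minimal loading sketch looks like this; the model ID is taken from the card, and everything else is ordinary transformers usage:

    # Minimal sketch: load the model via the Hugging Face transformers API.
    # Assumes the repo ships a standard tokenizer and causal-LM weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires accelerate; places weights on available GPUs
    )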

Training Details

The model was trained with the following key hyperparameters; a hedged TrainingArguments sketch follows the list:

  • Learning Rate: 1e-05
  • Batch Sizes: per-device train_batch_size of 1 and eval_batch_size of 8; across 4 GPUs this yields a total_train_batch_size of 4.
  • Optimizer: ADAMW_TORCH with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: 3.0
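
For anyone reproducing a comparable run, here is a hedged sketch of transformers TrainingArguments mirroring the hyperparameters above; the output directory name is illustrative, and the card does not publish the surrounding data pipeline:

    # Hedged sketch: TrainingArguments matching the card's hyperparameters.
    # total_train_batch_size = per-device 1 x 4 GPUs = 4, as listed above.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="llama3-8b-full-pretrain-wash-c4-1-8m-bs4",  # illustrative
        learning_rate=1e-5,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=8,
        num_train_epochs=3.0,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,        # cosine schedule with 10% warmup
        optim="adamw_torch",     # ADAMW_TORCH with default betas and epsilon
    )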

Framework Versions

Training was conducted using:

  • Transformers 5.2.0
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2
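
A quick way to check that a local environment matches this stack before loading the model (purely a convenience snippet, not part of the card):

    # Print installed versions to compare against the training stack above.
    import datasets
    import tokenizers
    import torch
    import transformers

    for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                      ("Datasets", datasets), ("Tokenizers", tokenizers)]:
        print(f"{name}: {mod.__version__}")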

Intended Use

This model suits developers who want a Llama 3 base model that has undergone additional pre-training on the C4 dataset, which may give it different characteristics from its parent model. Specific use cases will depend on the properties imparted by the c4_1_8m data; a brief generation sketch follows.
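
As a usage illustration, the sketch below generates text with the model loaded in the overview section; the prompt and sampling settings are arbitrary placeholders rather than recommendations from the card:

    # Minimal generation sketch; reuses `model` and `tokenizer` from the
    # loading example above. Prompt and sampling knobs are illustrative only.
    prompt = "The C4 dataset is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,   # well within the 8k context window
        do_sample=True,
        temperature=0.7,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))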