shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-bs4

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · License: llama3 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-bs4 is an 8-billion-parameter Llama 3-based language model, fine-tuned by shuoxing on the c4_1_5m dataset. It is a specialized iteration of a pre-trained Llama 3 variant, refined through additional training data, and is intended for general language understanding and generation tasks built on the foundational Llama 3 architecture.


Model Overview

This model, shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-bs4, is an 8 billion parameter language model based on the Llama 3 architecture. It is a fine-tuned version of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, with additional training performed on the c4_1_5m dataset.

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: A total training batch size of 4 (1 per device across 4 GPUs)
  • Optimizer: ADAMW_TORCH with default betas and epsilon
  • LR Scheduler: Cosine schedule with a warmup ratio of 0.1
  • Epochs: 3.0
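The hyperparameters above can be sketched in the shape expected by Hugging Face `TrainingArguments`. This is a hypothetical reconstruction, not the original training script: the keyword names (`per_device_train_batch_size`, `warmup_ratio`, etc.) are assumed from standard Transformers conventions, and "0.1 warmup" is interpreted as a warmup ratio.

```python
# Hypothetical sketch of the reported hyperparameters in
# TrainingArguments-style keys (names assumed, not from the
# original training script).
training_config = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 1,  # run across 4 GPUs
    "num_train_epochs": 3.0,
    "optim": "adamw_torch",            # default betas and epsilon
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,               # interpreted as a ratio
}

# Effective (global) batch size = per-device batch size x number of GPUs.
num_gpus = 4
effective_batch_size = training_config["per_device_train_batch_size"] * num_gpus
print(effective_batch_size)  # -> 4
```

The effective batch size of 4 matches the card's "1 per device across 4 GPUs" description.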

Framework Versions

The training utilized:

  • Transformers 5.2.0
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Intended Use

Specific intended uses and limitations are not documented. As a Llama 3-based model, however, it is generally suitable for a wide range of natural language processing tasks, including text generation, summarization, and question answering. Its fine-tuning on the C4 dataset may further enhance its general language comprehension capabilities.
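For text generation with a model like this, a typical Transformers loading pattern can be sketched as follows. This is a hypothetical usage example, not documentation from the model author: it assumes the checkpoint is available on the Hugging Face Hub under the repo id shown and that standard `AutoModelForCausalLM` loading applies.

```python
# Hypothetical usage sketch: load the checkpoint with the standard
# Transformers causal-LM API (assumes the repo id resolves on the Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-5m-bs4"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation for `prompt` using greedy decoding."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # respect the checkpoint's stored dtype
        device_map="auto",    # place weights on available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Note that downloading an 8B checkpoint requires substantial disk and GPU memory; `device_map="auto"` lets Accelerate shard or offload the weights where needed.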