shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer · Cold

shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64 is an 8-billion-parameter language model based on the Llama 3 architecture, trained from scratch; the model card does not specify the training dataset. It is presented as a general-purpose foundational model, with key training hyperparameters including a learning rate of 1e-05 and a total batch size of 64.

Model Overview

shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64 is an 8-billion-parameter language model built on the Llama 3 architecture, developed by shuoxing and trained from scratch. Specific details of the training dataset are not available; as a foundational model, it is intended for a broad range of natural language processing tasks.

Training Details

The training process for this Llama 3-based model used the following key hyperparameters (see the configuration sketch after this list):

  • Learning Rate: 1e-05
  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: A total training batch size of 64, from a per-device train_batch_size of 8, gradient_accumulation_steps of 2, and 4 devices (8 × 2 × 4 = 64).
  • Epochs: Trained for 3.0 epochs.
  • LR Scheduler: Cosine scheduler with a warmup ratio of 0.1.
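
For reference, these settings map onto the Hugging Face Trainer configuration roughly as in the sketch below. This is an illustration only: the output directory, mixed-precision flag, and multi-GPU launch are assumptions, not values taken from the model card.

```python
# Hypothetical sketch: the hyperparameters above expressed as Hugging Face
# TrainingArguments. Paths and precision settings are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama3-8b-sft",      # assumed output path
    learning_rate=1e-05,
    per_device_train_batch_size=8,     # 8 examples per device
    gradient_accumulation_steps=2,     # x2 gradient accumulation
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                         # assumed mixed precision
)
# Launched across 4 devices, the effective batch size is 8 x 2 x 4 = 64.
```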

Intended Use & Limitations

The model card does not detail specific intended uses or known limitations. As a pretrained foundational model, it is generally suitable for downstream tasks via fine-tuning or prompt engineering, as in the sketch below. Because no evaluation results or use-case guidance are published, performance on any particular application should be verified empirically.
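
A minimal sketch of prompt-based use with the transformers library, assuming the model is hosted under the ID shown in the title; the prompt and generation settings are illustrative defaults, not values from the model card.

```python
# Load the model and generate a short completion (illustrative settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

prompt = "Explain the difference between pretraining and fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```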