shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64 is an 8-billion-parameter Llama 3 model developed by shuoxing and trained from scratch through a full pre-training run. It is intended for general language generation, with specific applications requiring further fine-tuning.


Model Overview

This model was trained from scratch rather than initialized from Meta's released Llama 3 weights. Specific details about its training corpus and intended uses are not documented; it serves as a foundational pre-trained model.

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 1e-05
  • Batch Sizes: A per-device train_batch_size of 8 and eval_batch_size of 8 with gradient_accumulation_steps of 2, giving a total_train_batch_size of 64 and a total_eval_batch_size of 32 (consistent with training across 4 devices).
  • Optimizer: ADAMW_TORCH (PyTorch's AdamW implementation) with default betas and epsilon.
  • LR Scheduler: Cosine schedule with a warmup ratio of 0.1.
  • Epochs: Trained for 3.0 epochs.
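The reported batch sizes fit together via the standard effective-batch-size formula (per-device size × gradient accumulation × device count); the device count below is inferred from that arithmetic, not stated in the card:

```python
# Sketch: reconciling the reported batch sizes.
# total_train_batch_size = per_device * grad_accum * num_devices
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
total_train_batch_size = 64

num_devices = total_train_batch_size // (
    per_device_train_batch_size * gradient_accumulation_steps
)
print(num_devices)  # 4

# The eval total is consistent: no gradient accumulation at eval time,
# so 8 per device across the same 4 devices gives 32.
per_device_eval_batch_size = 8
total_eval_batch_size = per_device_eval_batch_size * num_devices
print(total_eval_batch_size)  # 32
```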

Framework Versions

The training environment utilized:

  • Transformers 5.2.0
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Intended Use

As a pre-trained model, it serves as a strong base for various downstream natural language processing tasks. Users would typically fine-tune this model on specific datasets to adapt it to particular applications, such as text generation, summarization, or question answering.