shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Mar 27, 2026 · Architecture: Transformer

shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64 is an 8-billion-parameter language model based on the Llama 3 architecture and trained from scratch. The model was developed by shuoxing and underwent supervised fine-tuning (SFT) with a batch size of 64. While the specific training data and intended uses are not documented, its Llama 3 foundation suggests general language understanding and generation capabilities.


Model Overview

The shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64 is an 8 billion parameter language model built upon the Llama 3 architecture. It was trained from scratch and subsequently underwent supervised fine-tuning (SFT) with a batch size of 64. The model's training involved a learning rate of 1e-05, a total training batch size of 64 across 4 GPUs, and a cosine learning rate scheduler over 3 epochs.
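The model card does not include usage code, but a Llama 3-style checkpoint published under this repository id would typically be loaded with the Transformers library along the lines sketched below. The prompt, dtype, and generation settings are illustrative assumptions, not documented usage.

```python
# Hypothetical usage sketch: loading the checkpoint with Hugging Face Transformers.
# The repo id comes from the model name above; everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights; adjust to the published format
    device_map="auto",
)

prompt = "Explain what supervised fine-tuning is in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```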

Key Training Details

  • Architecture: Llama 3
  • Parameters: 8 billion
  • Training Method: From scratch pre-training followed by Supervised Fine-Tuning (SFT)
  • Batch Size: Total training batch size of 64 (8 per device with 2 gradient accumulation steps)
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 3.0
  • Frameworks: Transformers 5.2.0, PyTorch 2.6.0+cu124, Datasets 4.0.0, Tokenizers 0.22.2
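
The hyperparameters listed above map onto a fairly standard Transformers Trainer configuration. The sketch below is a reconstruction under that assumption, not the author's actual training script; the SFT dataset and data handling are omitted because the card does not describe them, and the output path and bf16 setting are placeholders.

```python
# Hypothetical reconstruction of the reported SFT hyperparameters using
# transformers.TrainingArguments; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-sft",        # illustrative output path
    learning_rate=1e-5,                # reported learning rate
    per_device_train_batch_size=8,     # 8 per device
    gradient_accumulation_steps=2,     # 2 accumulation steps
    # 8 per device x 2 accumulation x 4 GPUs = total batch size 64
    num_train_epochs=3.0,              # reported epochs
    lr_scheduler_type="cosine",        # cosine learning rate scheduler
    adam_beta1=0.9,                    # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                 # AdamW epsilon
    bf16=True,                         # assumption: mixed-precision training
)
```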

Intended Use & Limitations

Specific details regarding the training dataset, intended uses, and limitations are not provided in the model card. Users should exercise caution and conduct further evaluation to determine suitability for specific applications, especially given the lack of information on the training data's nature and potential biases.