shuoxing/llama3-8b-full-sft-c4-1m-en

Hugging Face
Task: Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: May 29, 2026 | Architecture: Transformer

The shuoxing/llama3-8b-full-sft-c4-1m-en model is an 8-billion-parameter language model based on the Llama 3 architecture, developed by shuoxing. The source describes it both as a supervised fine-tuned (SFT) model and as "trained from scratch", and it does not name a base checkpoint. The model supports an 8192-token context length and targets general language understanding and generation tasks.


Model Overview

The shuoxing/llama3-8b-full-sft-c4-1m-en model has 8 billion parameters and is built on the Llama 3 architecture. The source states that it underwent supervised fine-tuning (SFT) and also that it was trained from scratch. These two statements are in tension, since SFT normally starts from a pretrained checkpoint, and no base checkpoint is named, so the starting point of training remains unspecified. The context length is 8192 tokens, a budget shared by the prompt and the generated output.
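The 8192-token window can be made concrete with a small sketch. It assumes the repository is publicly loadable through the `transformers` text-generation pipeline, which the source does not confirm; `max_new_tokens_allowed` and `generate` are hypothetical helper names introduced only for this illustration.

```python
MODEL_ID = "shuoxing/llama3-8b-full-sft-c4-1m-en"  # repository id from this page
CONTEXT_LENGTH = 8192  # context length stated on this page


def max_new_tokens_allowed(prompt_tokens: int,
                           context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: generate text while staying inside the window.

    Assumption: the weights can be downloaded and loaded by `pipeline`.
    The first call downloads the model, which needs several GB of disk.
    """
    # Imported lazily so the budget helper works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    prompt_tokens = len(generator.tokenizer(prompt)["input_ids"])
    budget = min(max_new_tokens, max_new_tokens_allowed(prompt_tokens))
    if budget == 0:
        raise ValueError("prompt already fills the context window")
    result = generator(prompt, max_new_tokens=budget)
    return result[0]["generated_text"]
```

Calling `max_new_tokens_allowed(8000)` leaves room for 192 generated tokens; a prompt longer than the window leaves none.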

Training Details

The source reports the following training hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: 8 (train), 8 (eval)
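  • Gradient Accumulation: 2 steps, for a total effective batch size of 128. Since 8 × 2 = 16 samples per device per optimizer step, this total implies 8 devices (16 × 8 = 128); the device count itself is not stated in the source.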
  • Optimizer: AdamW_Torch_Fused with default betas and epsilon
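  • Scheduler: Cosine learning rate schedule with a warmup value of 0.1 (a fractional value, so most likely a warmup ratio covering the first 10% of training steps rather than a step count)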
  • Epochs: 3.0
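The arithmetic behind these hyperparameters can be sketched in a few lines. The sketch assumes that the warmup value of 0.1 is a fraction of total steps and that the schedule is the common linear warmup followed by cosine decay to zero; neither is confirmed by the source. The total step count passed to `lr_at_step` is a hypothetical input, since the source does not report it.

```python
import math

PER_DEVICE_BATCH = 8     # train batch size from the list above
GRAD_ACCUM_STEPS = 2     # gradient accumulation steps from the list above
TOTAL_BATCH = 128        # total effective batch size from the list above
PEAK_LR = 1e-5           # learning rate from the list above
WARMUP_FRACTION = 0.1    # assumption: interpreted as a ratio of total steps


def implied_device_count(total: int, per_device: int, accum: int) -> int:
    """Devices needed so that per_device * accum * devices == total."""
    per_device_effective = per_device * accum
    if total % per_device_effective != 0:
        raise ValueError("total batch is not a multiple of per-device batch")
    return total // per_device_effective


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(implied_device_count(TOTAL_BATCH, PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
    for step in (0, 50, 100, 550, 1000):  # 1000 total steps is hypothetical
        print(step, lr_at_step(step, total_steps=1000))
```

Under these assumptions the learning rate climbs linearly to 1e-05 over the first tenth of training and then follows a half cosine down to zero at the final step.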

Framework Versions

The model was trained using:

  • Transformers 5.6.0
  • PyTorch 2.12.0+cu130
  • Datasets 4.0.0
  • Tokenizers 0.22.2
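To compare a local environment against the versions listed above, a minimal check with the standard library could look like the following. The distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed, and `version_mismatches` is a hypothetical helper name.

```python
from importlib import metadata

# Versions from the list above; keys are assumed pip distribution names.
EXPECTED = {
    "transformers": "5.6.0",
    "torch": "2.12.0+cu130",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def installed_version(package: str):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=EXPECTED, lookup=installed_version):
    """Map each differing package to a (wanted, found) tuple."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: expected {wanted}, found {found}")
```

An exact string match is deliberately strict; a different CUDA build suffix on PyTorch will be reported as a mismatch even when the base version agrees.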

Intended Use

The source does not detail intended uses or limitations. The Llama 3 architecture and SFT training suggest general-purpose text generation and language understanding as the most likely applications.