shuoxing/llama3-8b-full-sft-c4-1m-en
The shuoxing/llama3-8b-full-sft-c4-1m-en model is an 8-billion-parameter language model based on the Llama 3 architecture, published by shuoxing. It was produced by supervised fine-tuning (SFT) and is described as trained from scratch. It has an 8192-token context length and targets general language understanding and generation tasks.
Model Overview
The shuoxing/llama3-8b-full-sft-c4-1m-en is an 8-billion-parameter language model built on the Llama 3 architecture. It was produced by supervised fine-tuning (SFT) and is described as trained from scratch rather than fine-tuned from an existing Llama 3 checkpoint; the initial weights are not otherwise specified. It supports a context length of 8192 tokens, which bounds the combined length of the prompt and the generated text.
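As a rough illustration of how the 8192-token context might be respected in practice, the sketch below reserves room for the generated tokens and truncates the prompt to what remains. It assumes the repository id resolves on the Hugging Face Hub, ships a tokenizer, and loads through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes; none of this is confirmed by the card. The helper names `prompt_token_budget` and `generate` are hypothetical.

```python
MODEL_ID = "shuoxing/llama3-8b-full-sft-c4-1m-en"
CONTEXT_LENGTH = 8192  # context length quoted in the overview


def prompt_token_budget(max_new_tokens, context_length=CONTEXT_LENGTH):
    """Tokens left for the prompt once room for generation is reserved."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def generate(prompt, max_new_tokens=128):
    """Hypothetical helper: load the model and complete a single prompt."""
    # Imported lazily so the budget helper works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository is public and contains tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Truncate the prompt so prompt + generated tokens fit in the context.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=prompt_token_budget(max_new_tokens),
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` would download the full 8B checkpoint, so it is best tried on a machine with enough memory for the weights.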
Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 1e-05
- Batch Size: 8 (train), 8 (eval)
- Gradient Accumulation: 2 steps, for a total effective batch size of 128 (if the train batch size of 8 is per device, this implies 8 data-parallel devices: 8 × 2 × 8 = 128)
- Optimizer: AdamW_Torch_Fused with default betas and epsilon
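- Scheduler: Cosine learning rate schedule with a warmup value of 0.1 (most plausibly a fraction of total training steps, i.e. the first 10%, rather than a step count)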
- Epochs: 3.0
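The arithmetic behind these settings can be sketched in a few lines. The block below derives the device count implied by the quoted batch sizes and evaluates a cosine schedule with linear warmup. It assumes that the batch size of 8 is per device, that the warmup value of 0.1 is a fraction of total steps, and that the schedule decays to zero; the total step count is a hypothetical input, since the card does not report it. This is an illustration, not the training code that was actually used.

```python
import math

# Values quoted in the hyperparameter list.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
TOTAL_BATCH = 128
PEAK_LR = 1e-5
WARMUP_FRACTION = 0.1  # assumption: a fraction of total steps, not a step count


def implied_device_count(total, per_device, accum):
    """Devices implied by total = per_device * accum * devices."""
    devices, remainder = divmod(total, per_device * accum)
    if remainder:
        raise ValueError("total is not a multiple of per_device * accum")
    return devices


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_fraction=WARMUP_FRACTION):
    """Linear warmup from 0 to peak_lr, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("implied devices:", implied_device_count(
        TOTAL_BATCH, PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
    total_steps = 1000  # hypothetical; not reported in the card
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at(step, total_steps))
```

With these assumptions the learning rate peaks at 1e-05 once warmup ends and falls to half that value midway through the decay phase.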
Framework Versions
The model was trained using:
- Transformers 5.6.0
- PyTorch 2.12.0+cu130
- Datasets 4.0.0
- Tokenizers 0.22.2
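To compare a local environment against the versions listed above, a small check along these lines may help. It uses only the standard library's `importlib.metadata`. The mapping of the listed names to PyPI distribution names (for example `torch` for PyTorch) is an assumption, and `version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "5.6.0",
    "torch": "2.12.0+cu130",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed or None)} for each mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `version_mismatches()` returns an empty dict when the environment matches exactly. A mismatch does not necessarily mean the model will fail to load, only that the environment differs from the one reported for training.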
Intended Use
Specific intended uses and limitations are not detailed in the provided information. The Llama 3 architecture and SFT training suggest the model may suit general-purpose natural language processing tasks such as text understanding and generation.