shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64 is an 8-billion-parameter Llama 3 model trained from scratch. It was developed by shuoxing through a full pre-training run. The model is intended for general language generation tasks, with specific applications requiring further fine-tuning.
Model Overview
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64 is an 8-billion-parameter Llama 3 model trained from scratch. Specific details about its training dataset and intended uses are not provided in the current documentation; it represents a foundational pre-trained model.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 1e-05
- Batch Sizes: A train_batch_size of 8 and an eval_batch_size of 8, with gradient_accumulation_steps of 2, yielding a total_train_batch_size of 64 and a total_eval_batch_size of 32.
- Optimizer: ADAMW_TORCH with default betas and epsilon.
- LR Scheduler: Cosine scheduler with a warmup ratio of 0.1.
- Epochs: Trained for 3.0 epochs.
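The learning-rate behavior implied by the settings above can be sketched in plain Python: a linear warmup over the first 10% of optimizer steps up to the peak rate of 1e-05, followed by cosine decay to zero. The total step count below is illustrative, not taken from the card.

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-05, warmup_ratio=0.1):
    """Cosine LR schedule with linear warmup, mirroring the card's settings."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over the first warmup_steps steps.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With 1000 total optimizer steps (an assumed figure for illustration):
print(lr_at_step(0, 1000))     # 0.0 (start of warmup)
print(lr_at_step(100, 1000))   # 1e-05 (end of warmup, peak LR)
print(lr_at_step(1000, 1000))  # ~0.0 (fully decayed)
```

This matches the common warmup-then-cosine shape used by the Hugging Face Trainer when a cosine scheduler and a warmup ratio are configured.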
Framework Versions
The training environment utilized:
- Transformers 5.2.0
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2
Intended Use
As a pre-trained model, it serves as a strong base for various downstream natural language processing tasks. Users would typically fine-tune this model on specific datasets to adapt it to particular applications, such as text generation, summarization, or question answering.
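A minimal sketch of pulling this checkpoint as a fine-tuning base follows the standard Transformers pattern. The repo id comes from this card; the assumption that the checkpoint is hosted on the Hugging Face Hub and loadable via the Auto classes is not confirmed by the card itself.

```python
model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64"

def load_base_model(repo_id: str):
    """Load tokenizer and causal-LM weights for fine-tuning (sketch).

    transformers is imported lazily so this file can be read without it
    installed; the actual load downloads ~8B parameters of weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model

# Usage (network and substantial memory required):
# tokenizer, model = load_base_model(model_id)
```

From here, the usual route is to wrap the model in a Trainer with a task-specific dataset, as with any base Llama checkpoint.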