shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64
shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64 is an 8 billion parameter Llama 3-based model trained from scratch. It was trained with a learning rate of 1e-05 and a total batch size of 64 over 3 epochs. Because its pre-training dataset is not disclosed, its performance characteristics relative to standard Llama 3 checkpoints are unknown.
Model Overview
The shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64 is an 8 billion parameter language model based on the Llama 3 architecture. It was trained from scratch, indicating a unique pre-training process distinct from standard Llama 3 checkpoints.
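Assuming the checkpoint follows the standard Hugging Face layout for causal language models, it could be loaded with the `transformers` library roughly as follows. This is a sketch, not verified against the repository; the `torch_dtype` and `device_map` settings are illustrative defaults, not values from the model card:

```python
# Hypothetical usage sketch for loading this checkpoint with the
# Hugging Face transformers library (assumed standard layout).
REPO_ID = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-9m-sft-bs64"

def load_model(repo_id: str = REPO_ID):
    # Imported lazily so the sketch can be read and checked without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # shard across available GPUs if possible
    )
    return tokenizer, model
```

An 8B model typically needs roughly 16 GB of GPU memory in bf16, so `device_map="auto"` or a quantized load may be necessary on smaller hardware.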
Training Details
The model underwent a specific training regimen with the following key hyperparameters:
- Learning Rate: 1e-05
- Batch Size: a `train_batch_size` of 8 combined with `gradient_accumulation_steps` of 2, for a `total_train_batch_size` of 64 (since 8 × 2 = 16 per device, the total of 64 suggests data-parallel training across multiple devices).
- Optimizer: ADAMW_TORCH with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: Trained for 3.0 epochs.
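The batch-size arithmetic and schedule above can be sketched as follows. The four-device data-parallel count and the step-level warmup/cosine formula are assumptions for illustration; the card reports only the totals:

```python
import math

# Hyperparameters reported on the model card.
PER_DEVICE_BATCH = 8    # train_batch_size
GRAD_ACCUM_STEPS = 2    # gradient_accumulation_steps
TOTAL_BATCH = 64        # total_train_batch_size
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1      # assumed to be a ratio of total training steps

# 8 * 2 = 16 samples per optimizer step per device, so a total of 64
# implies data parallelism across 64 // 16 = 4 devices (an inference,
# not stated on the card).
num_devices = TOTAL_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM_STEPS)

def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup followed by cosine decay to zero -- one common
    interpretation of a cosine scheduler with a 0.1 warmup ratio."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

Under this sketch the learning rate ramps to 1e-05 over the first 10% of steps, then decays smoothly toward zero by the final step.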
Current Status
As per the model card, more information is needed regarding its specific capabilities, intended uses, limitations, and the exact nature of its training and evaluation data. Developers should note that the dataset used for its "from scratch" training is currently unknown.