shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Model Overview
`shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64` is an 8-billion-parameter language model built on the Llama 3 architecture. It was developed by shuoxing and trained from scratch. While details of the training dataset are not available, its foundational nature suggests it is intended for a broad range of natural language processing tasks.
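For illustration, here is a minimal loading and generation sketch, assuming the checkpoint is hosted on the Hugging Face Hub under the repository name above and follows the standard transformers layout for Llama 3 models; the dtype and device placement are illustrative choices, not documented settings:

```python
# Minimal loading sketch; assumes a standard transformers-compatible
# Llama 3 checkpoint. Dtype and device placement are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; not stated in the card
    device_map="auto",
)

# Quick generation sanity check.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```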
Training Details
The training process for this Llama 3-based model used the following key hyperparameters (reconstructed as a configuration sketch after the list):
- Learning Rate: 1e-05
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: A total training batch size of 64, achieved with a `train_batch_size` of 8, `gradient_accumulation_steps` of 2, and 4 devices (8 × 2 × 4 = 64).
- Epochs: Trained for 3.0 epochs.
- LR Scheduler: Cosine schedule with a warmup ratio of 0.1.
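These values map directly onto Hugging Face `TrainingArguments`, and the optimizer name ADAMW_TORCH suggests the transformers Trainer was used. The following sketch is a hypothetical reconstruction under that assumption, not the author's published training script; dataset and model setup are omitted:

```python
# Hypothetical reconstruction of the reported hyperparameters as
# transformers TrainingArguments; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64",
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # reported train_batch_size
    gradient_accumulation_steps=2,   # 8 per device x 2 steps x 4 devices = 64 total
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # "0.1 warmup" read as a warmup ratio
    optim="adamw_torch",             # ADAMW_TORCH, betas=(0.9, 0.999), eps=1e-08
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```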
Intended Use & Limitations
The model card does not detail specific intended uses or known limitations. As a pre-trained foundational model, it should be broadly applicable to downstream tasks through fine-tuning or prompt engineering, but without published evaluation results or use-case guidance, its performance on any particular application needs to be determined empirically, as in the sketch below.
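As one hypothetical starting point for such an empirical check, the following sketch scores a short text with the model and reports its perplexity; the sample sentence and dtype are illustrative choices:

```python
# Hypothetical sanity check: perplexity of the checkpoint on a short sample.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Language models assign probabilities to sequences of tokens."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean token cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```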