shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4
shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4 is an 8-billion-parameter Llama 3 variant fine-tuned by shuoxing on the c4_3_0m dataset, a corpus drawn from C4. It is intended for research and development in language modeling, particularly for experiments where the characteristics of the C4 dataset are relevant.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4, is an 8-billion-parameter Llama 3-based language model. It was fine-tuned by shuoxing from a previous iteration, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, on the c4_3_0m dataset.
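A minimal loading sketch with the Transformers library is shown below. The repository id comes from this card; the dtype and device placement are assumptions for fitting an 8B model on a single modern GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4"

# Load tokenizer and weights; bfloat16 and device_map="auto" are
# assumptions, not settings documented on the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```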
Training Details
The fine-tuning process used the following key hyperparameters (collected into a configuration sketch after the list):
- Learning Rate: 1e-05
- Batch Size: 4 total (1 per device across 4 GPUs)
- Optimizer: AdamW with default betas and epsilon
- Scheduler: Cosine learning rate schedule with a warmup ratio of 0.1
- Epochs: 3.0
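Assembled as a Hugging Face TrainingArguments object, the values above look roughly as follows. This is an illustrative sketch using standard Trainer argument names, not the developer's actual training script; output_dir and the bf16 flag are assumptions.

```python
from transformers import TrainingArguments

# Hyperparameters as reported on the model card; everything else
# (output_dir, precision) is illustrative.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-3-0m-bs4",
    learning_rate=1e-5,
    per_device_train_batch_size=1,  # 4 GPUs -> total batch size of 4
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # interpreting the card's "0.1" as a ratio
    optim="adamw_torch",            # AdamW with default betas and epsilon
    bf16=True,                      # assumption, common for Llama 3 training
)
```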
Frameworks Used
The model was trained using:
- Transformers 5.2.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2
Intended Use Cases
Given its fine-tuning on C4-derived data, this model is suited to research and experimentation where the characteristics of the C4 dataset are relevant. Specific intended uses and limitations have not been documented by the model developer.
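For a quick qualitative check, the checkpoint can be exercised like any causal Llama 3 model through the text-generation pipeline; the prompt and decoding settings below are illustrative.

```python
from transformers import pipeline

# Quick sanity check; sampling parameters are illustrative, not
# recommendations from the model developer.
generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4",
    torch_dtype="auto",
    device_map="auto",
)
result = generator("The quick brown fox", max_new_tokens=64, do_sample=True)
print(result[0]["generated_text"])
```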