shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4
shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4 is an 8-billion-parameter Llama 3 variant fine-tuned by shuoxing on the c4_1_8m dataset. It is a specialized iteration of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 whose training focuses on data drawn from the C4 dataset, and it is intended for applications that need a Llama 3 base model with these additional C4-based pre-training adjustments.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4, is an 8 billion parameter Llama 3-based language model. It represents a fine-tuned version of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model, with its training specifically focused on the c4_1_8m dataset.
Training Details
The model was trained with the following key hyperparameters; a hedged configuration sketch follows the list:
- Learning rate: 1e-05
- Batch sizes: train_batch_size of 1 and eval_batch_size of 8, giving a total_train_batch_size of 4 across 4 GPUs
- Optimizer: ADAMW_TORCH with default betas and epsilon
- Scheduler: cosine learning rate schedule with a warmup ratio of 0.1
- Epochs: 3.0
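For reference, here is a minimal sketch of how these hyperparameters could be expressed with the Hugging Face Trainer API. Only the values listed above come from this card; the output directory name and the use of TrainingArguments (rather than the author's actual training script) are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical TrainingArguments mirroring the hyperparameters listed above.
# With a per-device train batch size of 1 on 4 GPUs, the effective
# total train batch size is 4, matching the card.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-1-8m-bs4",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    optim="adamw_torch",          # ADAMW_TORCH with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```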
Framework Versions
Training was conducted using:
- Transformers 5.2.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2
Intended Use
This model is aimed at developers who want a Llama 3 base model with additional pre-training on the C4 dataset, which may give it different characteristics from its parent model. Specific use cases depend on the properties imparted by the c4_1_8m dataset.
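Below is a minimal usage sketch with the Hugging Face transformers library, assuming the model is available under this ID on the Hub and that enough GPU memory is available for an 8B model; the prompt and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-bs4"

# Load the tokenizer and the 8B model; bfloat16 roughly halves memory vs. fp32.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base-model style text completion (no chat template applied).
inputs = tokenizer("The C4 dataset is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```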