shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4
The shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4 model is an 8 billion parameter language model, fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 and further trained on the c4_3_6m dataset, which points to a focus on general language understanding and generation tasks. It is suitable for applications that require a robust base model with an 8192-token context length.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4, is an 8 billion parameter language model. It is a fine-tuned iteration of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 base model, with subsequent training on the c4_3_6m dataset.
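The checkpoint can be loaded like any standard Llama-3-architecture causal language model. The snippet below is a minimal sketch using the Hugging Face transformers library; the model ID comes from this card, while the dtype and device-placement arguments are illustrative assumptions rather than settings published by the author.

```python
# Minimal loading sketch (assumes standard Llama-3 weights and tokenizer files
# in the repository; dtype/device choices are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # spread the 8B parameters across available devices (requires accelerate)
)
```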
Key Training Details
The model was trained with the following hyperparameters (see the illustrative configuration sketch after this list):
- Learning Rate: 1e-05
- Batch Size: A total training batch size of 4 (1 per device across 4 GPUs)
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 3.0
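The training script itself is not published with the card, but the listed hyperparameters map directly onto Hugging Face TrainingArguments. The sketch below simply mirrors them; the output directory, the optimizer backend name, and the warmup-ratio interpretation of the 0.1 warmup value are assumptions.

```python
# Illustrative sketch only: the actual training configuration is not published.
# Values below mirror the hyperparameters listed above; everything else is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-wash-c4-3-6m",    # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,          # 1 per device x 4 GPUs = total batch size of 4
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                       # assumed interpretation of the 0.1 warmup value
    optim="adamw_torch",                    # AdamW, betas and epsilon as listed
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```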
Intended Use Cases
Given its training on the C4 dataset, this model is likely suitable for a broad range of natural language processing tasks, including text generation, summarization, and question answering, particularly where a general understanding of English text is required. Its 8192-token context length supports processing longer inputs.
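As a usage illustration, the snippet below runs basic text generation through the transformers pipeline API; the prompt and decoding parameters are arbitrary examples, not recommendations from the card.

```python
# Hedged generation example: prompt and sampling settings are illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4",
    device_map="auto",
)

output = generator(
    "The C4 dataset is a large web-derived corpus that",
    max_new_tokens=64,    # stays well within the 8192-token context window
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```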