shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-bs4
The shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-bs4 model is an 8-billion-parameter language model, fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 on the c4_2_1m dataset. It is a continued-pretraining iteration of the Llama 3 architecture, and its primary application is in tasks that benefit from additional pre-training on C4-style web text.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-bs4, is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned version of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 model.
Key Characteristics
- Base Model: Derived from a Llama 3 8B pre-trained variant.
- Training Data: Further pre-trained on the c4_2_1m dataset, indicating a focus on general text understanding and generation from a large, cleaned web corpus.
- Training Hyperparameters: A learning rate of 1e-05, a total batch size of 4 across 4 devices, a cosine learning rate scheduler with a warmup ratio of 0.1, and 3 training epochs (see the configuration sketch after this list).
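These hyperparameters map naturally onto a Hugging Face TrainingArguments configuration. The sketch below is an illustrative reconstruction, not the published training script; the output directory name, the per-device batch size split, and the bf16 setting are assumptions.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
# The actual training script for this checkpoint is not published.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-2-1m-bs4",  # assumed name
    learning_rate=1e-5,             # reported learning rate
    per_device_train_batch_size=1,  # assumed: 1 per device x 4 devices = total batch size 4
    num_train_epochs=3,             # reported number of epochs
    lr_scheduler_type="cosine",     # reported cosine schedule
    warmup_ratio=0.1,               # reported warmup of 0.1
    bf16=True,                      # assumed mixed precision; not reported
)
```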
Intended Use Cases
This model is suitable for research and development in areas requiring a Llama 3 8B base model with additional pre-training on the C4 dataset. It can serve as a foundation for further fine-tuning on downstream tasks where broad textual knowledge is beneficial.
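Assuming the checkpoint follows standard Hugging Face conventions for Llama 3 models, it can be loaded with the transformers library as shown below; the prompt is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's stored precision
    device_map="auto",   # requires accelerate; places weights automatically
)

inputs = tokenizer("The C4 corpus is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```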