shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4
shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4 is an 8 billion parameter language model based on the Llama 3 architecture. It was fine-tuned by shuoxing from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 on the c4_0_3m dataset, and is intended for general language understanding and generation tasks.
Model Overview
This 8 billion parameter model builds on the Llama 3 architecture and is a fine-tuned iteration of shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8.
Key Training Details
- Base Model: Fine-tuned from shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8.
- Dataset: Training was conducted on the c4_0_3m dataset, suggesting a focus on general web text understanding and generation.
- Hyperparameters: Key training parameters included a learning rate of 1e-05, a total batch size of 4 (1 per device across 4 devices), 3 epochs, and a cosine learning rate scheduler; a sketch of a matching training configuration follows this list.
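As a rough illustration, the reported hyperparameters map onto a Hugging Face `TrainingArguments` configuration like the one below. This is a minimal sketch: the actual training script is not published, so everything beyond the stated values (learning rate, batch size, epochs, scheduler), including the `output_dir` name, is an assumption.

```python
# Sketch of a TrainingArguments setup matching the reported hyperparameters.
# Values not stated in the model card (e.g. output_dir) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-0-3m-bs4",  # hypothetical
    learning_rate=1e-05,
    per_device_train_batch_size=1,  # 4 devices x 1 = total batch size of 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
)
```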
Potential Use Cases
Given its Llama 3 foundation and training on the C4 dataset, this model is likely suitable for:
- General text generation and completion (see the loading sketch after this list).
- Understanding and processing diverse web-based content.
- Serving as a base for further fine-tuning on more specific downstream tasks.
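For the text-generation use case, a minimal loading-and-inference sketch with the transformers library is shown below. It assumes the checkpoint is hosted on the Hugging Face Hub under the repo id above, that bf16 weights fit the available GPU memory, and uses a made-up prompt for illustration.

```python
# Minimal sketch: load the checkpoint and generate text with transformers.
# Assumes the model is available on the Hub under this repo id and that
# an 8B model in bf16 fits on the available hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the hardware
    device_map="auto",
)

inputs = tokenizer("The C4 dataset is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```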