shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4
The shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4 model is an 8-billion-parameter language model based on the Llama 3 architecture, fine-tuned by shuoxing. It continues training from an earlier pretrained checkpoint on the `c4_0_6m` dataset and is intended for general language understanding and generation, with the additional training potentially improving performance on web-derived text.
Model Overview
shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4 is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned iteration trained on the `c4_0_6m` dataset, building on a prior checkpoint, shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8, which indicates a progressive refinement approach.
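Provided the checkpoint is published on the Hugging Face Hub under the repo id above, it should load with the standard `transformers` causal-LM API. The dtype and device settings below are illustrative assumptions, not documented requirements:

```python
# Minimal loading sketch, assuming the repo id from this card is
# available on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for 8B params in bf16
    device_map="auto",           # requires `accelerate`; adjust for your hardware
)
```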
Key Training Details
- Base Model: Fine-tuned from `shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8`
- Dataset: Further trained on the `c4_0_6m` dataset
- Hyperparameters (a sketch of how these map onto a training configuration follows below):
  - Learning Rate: 1e-05
  - Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  - LR Scheduler: Cosine with a warmup ratio of 0.1
  - Epochs: 3.0
  - Total Train Batch Size: 4 (across 4 devices)
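The training script itself is not published, but as a rough sketch, the hyperparameters above would map onto a `transformers` `TrainingArguments` configuration along these lines. The per-device batch size of 1 is inferred from the stated total of 4 across 4 devices:

```python
# Hedged reconstruction of the listed hyperparameters as a
# TrainingArguments config; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-0-6m-bs4",
    learning_rate=1e-5,
    adam_beta1=0.9,                 # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    per_device_train_batch_size=1,  # 1 per device x 4 devices = total batch size 4
)
```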
Intended Use Cases
Given its training on C4-derived data, this model is likely suited to tasks requiring broad web-text understanding and generation. Specific intended uses and limitations are not documented, but its lineage suggests applicability to text summarization, content generation, and general conversational AI, particularly for content resembling the C4 corpus.
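As a minimal usage illustration, assuming the model loads as shown earlier, a text-generation call might look like the following; the prompt and sampling settings are placeholders, not recommendations from the model author:

```python
# Illustrative generation call via the transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4",
    device_map="auto",
)

out = generator(
    "Summarize the following paragraph:\n...",  # placeholder prompt
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(out[0]["generated_text"])
```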