shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4
The shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4 model is an 8-billion-parameter language model fine-tuned by shuoxing from a Llama 3 base. It was fine-tuned on the c4_3_9m dataset, building on an earlier pre-trained checkpoint by the same author. It is intended for general language generation tasks; its specific strengths and limitations require further documentation.
Model Overview
This model, shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4, is an 8 billion parameter language model developed by shuoxing. It is a fine-tuned variant of the shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 base model.
Key Characteristics
- Base Model: Llama 3 8B architecture.
- Fine-tuning Dataset: The model was fine-tuned on the `c4_3_9m` dataset.
- Training Hyperparameters (see the `TrainingArguments` sketch after this list):
  - Learning Rate: `1e-05`
  - Optimizer: `ADAMW_TORCH` with `betas=(0.9, 0.999)`
  - Scheduler: `cosine` with a warmup ratio of `0.1`
  - Epochs: `3.0`
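For reference, the listed hyperparameters can be expressed as a Hugging Face `TrainingArguments` configuration. This is a minimal sketch reconstructed from the values above, not the author's published training script; the batch size is an assumption inferred from the `bs4` suffix in the model name, and every setting not listed is left at its `transformers` default.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported settings; not the original script.
training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-wash-c4-3-9m-bs4",
    learning_rate=1e-05,            # reported learning rate
    optim="adamw_torch",            # ADAMW_TORCH optimizer
    adam_beta1=0.9,                 # betas=(0.9, 0.999)
    adam_beta2=0.999,
    lr_scheduler_type="cosine",     # cosine schedule
    warmup_ratio=0.1,               # reported 0.1 warmup, read as a ratio
    num_train_epochs=3.0,           # 3.0 epochs
    per_device_train_batch_size=4,  # assumption: inferred from the "bs4" suffix
)
```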
Intended Use Cases
While specific intended uses and limitations are not fully detailed in the available documentation, fine-tuning on the c4_3_9m dataset suggests potential applications in general text generation and comprehension, and in tasks that benefit from exposure to a broad web corpus. Further evaluation is needed to determine its optimal use cases and performance characteristics.
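A minimal usage sketch for loading the checkpoint with the `transformers` library, assuming the weights are published in the standard Hugging Face format under this repo id; the dtype and sampling parameters below are illustrative choices, not documented settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; an 8B model needs ~16 GB in bf16
    device_map="auto",           # requires the accelerate package
)

prompt = "The C4 corpus is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```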