AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en
AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en is an 8-billion-parameter Llama-3-based causal language model, fine-tuned from Meta-Llama-3-8B-Instruct. It was trained with a learning rate of 1e-05 and a cosine learning rate scheduler over 3 epochs. Its primary differentiator and intended use cases are not detailed in the available information.
Model Overview
This model, llama3-8b-full-pretrain-junk-tweet-1m-en, is an 8-billion-parameter language model built on the Meta-Llama-3-8B-Instruct architecture. It has been fine-tuned from that base, though the dataset used for fine-tuning is not detailed in the available documentation.
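Although the card provides no usage guidance, the checkpoint should load like any other Llama 3 model on the Hugging Face Hub. The snippet below is a minimal sketch assuming the standard Transformers API; the dtype and generation settings are illustrative choices, not documented defaults.

```python
# Minimal loading/generation sketch; dtype and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; common for Llama 3 checkpoints
    device_map="auto",           # requires the `accelerate` package
)

prompt = "The weather today is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```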
Training Details
The fine-tuning procedure used the following key hyperparameters (a sketch of the corresponding configuration follows the list):
- Learning Rate: 1e-05
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 3.0
- Batch Size: A total training batch size of 8 across 8 GPUs (implying a per-device batch size of 1 if no gradient accumulation was used)
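These values map directly onto Hugging Face `TrainingArguments`. The configuration below is a hypothetical reconstruction from the reported hyperparameters; the output directory, optimizer implementation, mixed-precision setting, and per-device batch size (derived from 8 GPUs at a total batch size of 8) are assumptions, since the original training script is not published.

```python
# Hypothetical reconstruction of the reported training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-junk-tweet-1m-en",  # assumed name
    learning_rate=1e-5,             # reported
    lr_scheduler_type="cosine",     # reported
    warmup_ratio=0.1,               # reported
    num_train_epochs=3.0,           # reported
    per_device_train_batch_size=1,  # 8 GPUs x 1 sample = total batch size 8
    adam_beta1=0.9,                 # reported AdamW betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,              # reported epsilon
    optim="adamw_torch",            # assumed AdamW implementation
    bf16=True,                      # assumed mixed precision
)
```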
Current Status and Limitations
The available documentation does not yet describe the model's intended uses, its limitations, or the training and evaluation data. Users should be aware that, without these details, the model's precise capabilities and optimal applications remain undefined.