AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en
AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en is an 8-billion-parameter Llama-3-based causal language model, fine-tuned from Meta-Llama-3-8B-Instruct. It was trained with a learning rate of 1e-05 and a cosine learning rate scheduler over 3 epochs. Its primary differentiator and intended use cases are not detailed in the available information.
Model Overview
This model, llama3-8b-full-pretrain-junk-tweet-1m-en, is an 8-billion-parameter language model built on the Meta-Llama-3-8B-Instruct architecture. It has been fine-tuned from that base, though the dataset used for fine-tuning is not detailed in the available documentation.
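Although the card provides no usage guidance, the checkpoint should load like any other Llama 3 model on the Hugging Face Hub. The snippet below is a minimal sketch assuming the standard Transformers API; the dtype and generation settings are illustrative choices, not documented defaults.

```python
# Minimal loading/generation sketch; dtype and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; common for Llama 3 checkpoints
    device_map="auto",           # requires the `accelerate` package
)

prompt = "The weather today is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```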
Training Details
The fine-tuning procedure used the following key hyperparameters (a sketch of the corresponding configuration follows the list):
- Learning Rate: 1e-05
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 3.0
- Batch Size: A total training batch size of 8 across 8 GPUs (implying a per-device batch size of 1 if no gradient accumulation was used)
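These values map directly onto Hugging Face `TrainingArguments`. The configuration below is a hypothetical reconstruction from the reported hyperparameters; the output directory, optimizer implementation, mixed-precision setting, and per-device batch size (derived from 8 GPUs at a total batch size of 8) are assumptions, since the original training script is not published.

```python
# Hypothetical reconstruction of the reported training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-full-pretrain-junk-tweet-1m-en",  # assumed name
    learning_rate=1e-5,             # reported
    lr_scheduler_type="cosine",     # reported
    warmup_ratio=0.1,               # reported
    num_train_epochs=3.0,           # reported
    per_device_train_batch_size=1,  # 8 GPUs x 1 sample = total batch size 8
    adam_beta1=0.9,                 # reported AdamW betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,              # reported epsilon
    optim="adamw_torch",            # assumed AdamW implementation
    bf16=True,                      # assumed mixed precision
)
```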
Current Status and Limitations
The available documentation does not yet describe the model's intended uses, its limitations, or the training and evaluation data. Users should be aware that, without these details, the model's precise capabilities and optimal applications remain undefined.