Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Feb 15, 2026 · License: other · Architecture: Transformer

Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5 is an 8-billion-parameter instruction-tuned language model fine-tuned from Meta-Llama-3-8B-Instruct. It was adapted on the llama3_8b_instruct_ppl_bin_5 dataset and is intended as a base for further specialized applications. It retains the 8192-token context window of its base model, which suits tasks with moderate context requirements.


Overview

This model, named Meta-Llama-3-8B-Instruct_e1_llama3_8b_instruct_ppl_bin_5, is a fine-tuned variant of the Meta-Llama-3-8B-Instruct base model, with 8 billion parameters and a context length of 8192 tokens. Fine-tuning used the llama3_8b_instruct_ppl_bin_5 dataset; the name suggests a subset selected by perplexity binning (bin 5 of a perplexity-based partition) as part of a perplexity-baseline experiment, rather than a domain-specific corpus.
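
Because the checkpoint is a standard fine-tune of Meta-Llama-3-8B-Instruct, it should load with the usual transformers APIs. The snippet below is a minimal sketch, assuming the repo id above resolves on the Hugging Face Hub and that the published weights can be cast to bf16 (the FP8 tag in the metadata may refer to serving-time quantization rather than the stored checkpoint):

```python
# Minimal loading sketch; repo id taken from the model card above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust to your hardware / the shipped dtype
    device_map="auto",
)
```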

Training Details

The model was trained with the following key hyperparameters (a hedged configuration sketch follows the list):

  • Learning Rate: 1e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 8
  • Gradient Accumulation: 8 steps, giving a total_train_batch_size of 128 (4 per device × 8 accumulation steps × 4 GPUs)
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
  • Epochs: 1.0 epoch
  • Environment: Multi-GPU setup with 4 devices
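
The training script itself is not published, so the configuration below is only a reconstruction of the reported values as transformers TrainingArguments; any field not listed above (output path, mixed-precision setting) is an assumption:

```python
# Hedged reconstruction of the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_instruct_ppl_bin_5",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 4 per device x 8 steps x 4 GPUs = 128 effective
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: precision not reported in the card
)
```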

Intended Use

Specific intended uses and limitations are not stated in the model card, but its origin as an instruction-tuned Llama 3 model suggests suitability for general instruction-following tasks; fine-tuning on a particular dataset implies potential gains on tasks aligned with that dataset's distribution. An inference sketch is shown below.
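
As with other Llama 3 Instruct derivatives, prompts should go through the chat template bundled with the tokenizer. The snippet below is a sketch, assuming `model` and `tokenizer` were loaded as in the earlier example; the system and user messages are placeholders:

```python
# Instruction-following inference via the Llama 3 chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the idea of perplexity in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```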