CharlesLi/llama_3_gsm8k_llama_2

- Task: Text generation
- Model size: 8B parameters
- Quantization: FP8
- Context length: 32k
- Published: Dec 31, 2024
- License: llama3.1
- Architecture: Transformer

CharlesLi/llama_3_gsm8k_llama_2 is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B-Instruct. The fine-tuning run reached a final validation loss of 0.6028 after 30 training steps. It is intended for applications that want a Llama-3.1-8B-Instruct base with additional specialized training.


Model Overview

This model, llama_3_gsm8k_llama_2, is a fine-tuned variant of the Meta Llama-3.1-8B-Instruct base model. It has 8 billion parameters and was fine-tuned for 30 steps with the hyperparameters listed under Training Details below.
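As a minimal sketch, the checkpoint can be loaded like any Hugging Face causal LM. The `load_model` helper below is hypothetical (not part of the repo) and assumes the `transformers` library is installed and the Hub repo is reachable:

```python
REPO_ID = "CharlesLi/llama_3_gsm8k_llama_2"

def load_model(repo_id: str = REPO_ID):
    """Load the fine-tuned checkpoint and its tokenizer.

    Hypothetical convenience wrapper: assumes `transformers` is
    installed and the repo is available on the Hugging Face Hub.
    """
    # Deferred import so the module can be inspected without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return model, tokenizer
```

After loading, generation follows the standard `model.generate` flow with prompts formatted via the tokenizer's chat template.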

Training Details

The fine-tuning run used the following hyperparameters:

- Learning rate: 0.0002
- Per-device train batch size: 4
- Gradient accumulation steps: 2
- Total train batch size: 16
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler: cosine, with a warmup ratio of 0.1
- Hardware: 2 GPUs (multi-GPU training)
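The reported settings can be gathered in one place as a sketch. The dictionary below is illustrative only (key names follow common `transformers` Trainer conventions, not a verbatim config from the repo), and it verifies that the effective batch size works out to 16:

```python
# Fine-tuning hyperparameters as reported in the model card.
# Key names mirror common Trainer conventions; this is not a verbatim config.
hparams = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "num_devices": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "max_steps": 30,
}

# Effective batch size = per-device batch * grad accumulation * device count.
effective_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * hparams["num_devices"]
)
print(effective_batch_size)  # 16, matching the reported total_train_batch_size
```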

Performance

Validation loss decreased steadily over the course of training, from 0.7125 at step 5 to a final value of 0.6028 at step 30.
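As a quick sanity check on the two reported values, the drop from 0.7125 to 0.6028 corresponds to roughly a 15% relative improvement in validation loss:

```python
loss_step_5 = 0.7125   # validation loss at step 5 (reported)
loss_step_30 = 0.6028  # final validation loss at step 30 (reported)

# Relative improvement over the logged portion of training.
relative_improvement = 1 - loss_step_30 / loss_step_5
print(f"{relative_improvement:.1%}")  # about 15.4%
```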

Intended Use

This model is suitable for use cases that benefit from a Llama-3.1-8B-Instruct foundation with additional fine-tuning. The model card lists the training dataset as 'None', so the exact training data is not documented; the repo name suggests GSM8K-style math word problems, but this is not confirmed by the provided information.