Model Overview
This model, llama_3_gsm8k_llama_2, is a fine-tuned variant of Meta's Llama-3.1-8B-Instruct base model. It has 8 billion parameters and was fine-tuned for 30 training steps using the hyperparameters described below.
Training Details
Fine-tuning used a learning rate of 0.0002 with a train_batch_size of 4 and gradient_accumulation_steps of 2 across 2 GPUs, giving a total_train_batch_size of 16 (4 × 2 accumulation × 2 devices). The Adam optimizer was employed with betas=(0.9, 0.999) and epsilon=1e-08, alongside a cosine learning rate scheduler with a warmup ratio of 0.1.
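The schedule described above can be sketched as follows. This is a minimal illustration, assuming the common Hugging Face-style cosine schedule with linear warmup (the exact trainer implementation is not specified in the card); the function name and structure are hypothetical.

```python
import math

def lr_at_step(step, total_steps=30, base_lr=2e-4, warmup_ratio=0.1):
    """Sketch of a cosine learning-rate schedule with linear warmup.

    Assumes the Hugging Face-style behavior: the LR ramps linearly from 0
    over the first warmup_ratio * total_steps steps, then follows a cosine
    decay from base_lr down to 0 at total_steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)  # 3 steps for this run
    if step < warmup_steps:
        # Linear warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay phase: progress goes from 0 (end of warmup) to 1 (end of training).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch x grad accumulation x number of GPUs.
effective_batch = 4 * 2 * 2  # 16, matching total_train_batch_size
```

At step 3 (end of warmup) the rate peaks at 2e-4, then decays to 0 by step 30.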
Performance
The model reached a final validation loss of 0.6028 after 30 steps. Intermediate validation losses were tracked and decreased steadily from 0.7125 at step 5 to the final reported value.
Intended Use
This model is suitable for use cases that benefit from a Llama-3.1-8B-Instruct foundation with additional fine-tuning. The training dataset is recorded as 'None' in the available metadata, so its specific strengths cannot be confirmed from the provided information; the model name suggests GSM8K-style math data, but this is not documented.