Overview
This model, llama_3_gsm8k_cot_simplest, is a fine-tuned version of the meta-llama/Llama-3.1-8B-Instruct base model, developed by CharlesLi. The specific dataset used for fine-tuning is not detailed in the provided information.
Training Details
The model underwent a focused training regimen using the following key hyperparameters:
- Learning Rate: 0.0002
- Batch Sizes: train_batch_size of 4 and eval_batch_size of 4, with gradient_accumulation_steps of 2, resulting in a total_train_batch_size of 16.
- Optimizer: Adam with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Training Steps: A total of 30 training steps were performed across 2 devices.
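The hyperparameters above can be sketched in plain Python to show how they fit together: the effective batch size comes from per-device batch × gradient accumulation × device count, and the scheduler warms up linearly for the first 10% of steps before decaying along a cosine curve. This mirrors the common cosine-with-warmup recipe; the exact fine-tuning code is not given in the card, so this is an illustrative sketch, not the author's script.

```python
import math

# Hyperparameters reported in the model card
LEARNING_RATE = 2e-4
TOTAL_STEPS = 30
WARMUP_RATIO = 0.1
WARMUP_STEPS = int(TOTAL_STEPS * WARMUP_RATIO)  # 3 warmup steps

# Effective batch size: per-device batch * grad accumulation * number of devices
train_batch_size = 4
gradient_accumulation_steps = 2
num_devices = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices  # 16

def lr_at_step(step: int) -> float:
    """Cosine schedule with linear warmup (assumed recipe, not confirmed by the card)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these values the learning rate peaks at 0.0002 at step 3 and decays back toward zero by step 30.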
Performance
The training loss decreased from 0.9274 at step 5 to 0.455 at step 30, while the validation loss stabilized towards the end of training at a final value of 0.5915.
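The reported figures imply roughly a 51% relative drop in training loss over the run, with the final validation loss sitting about 0.14 above the final training loss. A quick check of that arithmetic (using only the numbers stated above):

```python
# Loss figures reported in this section
train_loss_start, train_loss_end = 0.9274, 0.455
val_loss_final = 0.5915

# Relative decrease in training loss from step 5 to step 30
rel_drop = (train_loss_start - train_loss_end) / train_loss_start  # ~0.51

# Gap between final validation and training loss (a rough generalization signal)
gap = val_loss_final - train_loss_end  # ~0.1365
print(f"training loss fell by {rel_drop:.0%}; val-train gap: {gap:.4f}")
```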
Limitations
Detailed information regarding the model's intended uses, specific limitations, and the exact training and evaluation data is currently not available in the provided documentation. Users should exercise caution and conduct further evaluation for specific applications.