CharlesLi/llama_3_gsm8k_cot_simplest
The CharlesLi/llama_3_gsm8k_cot_simplest model is a fine-tuned version of Meta's Llama-3.1-8B-Instruct. The card does not specify the training dataset, although the model name suggests chain-of-thought (CoT) fine-tuning on GSM8K-style grade-school math problems; this is an inference from the name, not a documented fact. Training reached a final validation loss of 0.5915. The model is intended for use cases that require this specialized Llama-3.1-8B-Instruct variant.
Overview
This model, llama_3_gsm8k_cot_simplest, is a fine-tuned iteration of the meta-llama/Llama-3.1-8B-Instruct base model, developed by CharlesLi. The specific dataset used for fine-tuning is not detailed in the provided information.
Training Details
The model underwent a focused training regimen using the following key hyperparameters:
- Learning rate: 0.0002
- Batch sizes: `train_batch_size` of 4, `eval_batch_size` of 4, with `gradient_accumulation_steps` of 2, resulting in a `total_train_batch_size` of 16
- Optimizer: Adam with default betas and epsilon
- Scheduler: cosine learning rate scheduler with a warmup ratio of 0.1
- Training steps: 30 total, performed across 2 devices
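The hyperparameters above can be sanity-checked with a small sketch. This assumes the standard linear-warmup-then-cosine-decay formulation (as used by the Hugging Face `transformers` scheduler); the exact schedule implementation used in training is not documented in the card.

```python
import math

# Hyperparameters as listed in the card
BASE_LR = 2e-4
TOTAL_STEPS = 30
WARMUP_RATIO = 0.1
WARMUP_STEPS = int(WARMUP_RATIO * TOTAL_STEPS)  # 3 warmup steps

# Effective batch size: per-device batch * gradient accumulation * device count
per_device_batch = 4
grad_accum = 2
num_devices = 2
total_train_batch_size = per_device_batch * grad_accum * num_devices  # 16

def lr_at_step(step: int) -> float:
    """Learning rate under linear warmup followed by cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)    # 16, matching the card's total_train_batch_size
print(lr_at_step(WARMUP_STEPS))  # peak LR of 0.0002 at the end of warmup
```

With only 30 steps, the warmup phase is just 3 steps, so the schedule spends most of its budget in the cosine-decay phase.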
Performance
During training, the model achieved a final validation loss of 0.5915. The training loss progressively decreased from 0.9274 at step 5 to 0.455 at step 30, while the validation loss stabilized around 0.5915 towards the end of training.
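For intuition, the final validation loss can be converted to perplexity. This assumes the reported loss is a mean token-level cross-entropy in nats (the usual convention for causal language model training), which the card does not explicitly state:

```python
import math

val_loss = 0.5915  # final validation loss reported in the card
# Perplexity = exp(cross-entropy), assuming mean token-level NLL in nats
perplexity = math.exp(val_loss)
print(round(perplexity, 3))  # ≈ 1.807
```

A perplexity near 1.8 on the validation split is plausible for a model fine-tuned on short, highly templated reasoning traces, but absent details on the evaluation data it should not be compared against other models' numbers.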
Limitations
Detailed information regarding the model's intended uses, specific limitations, and the exact training and evaluation data is currently not available in the provided documentation. Users should exercise caution and conduct further evaluation for specific applications.