CharlesLi/llama_3_gsm8k_helpful
CharlesLi/llama_3_gsm8k_helpful is an 8 billion parameter instruction-tuned language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. The fine-tuning run reached a final validation loss of 0.5879, and the model is intended for applications requiring a focused, efficient Llama 3.1 variant.
Model Overview
CharlesLi/llama_3_gsm8k_helpful is an 8 billion parameter language model, fine-tuned from the meta-llama/Llama-3.1-8B-Instruct base model. The fine-tuning process aimed to adapt the model for specific applications and finished with a validation loss of 0.5879.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 0.0002
- Batch Size: A train_batch_size of 4 and an eval_batch_size of 4, with a total_train_batch_size of 16 reached via gradient accumulation across devices.
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Training Steps: 30, run as multi-GPU training across 2 devices.
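The hyperparameters above can be sanity-checked with a short sketch. This is not the training code; it is a plain-Python re-implementation of the linear-warmup-plus-cosine-decay schedule (the actual run would have used a library scheduler such as the one in Transformers), together with the batch-size arithmetic. The gradient accumulation factor of 2 is inferred from 4 samples per device times 2 devices needing to reach a total batch of 16.

```python
import math

# Values taken from the hyperparameter list above.
BASE_LR = 2e-4       # learning rate
TOTAL_STEPS = 30     # training steps
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio = 0.1 -> 3 steps

def lr_at_step(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 4 per device x 2 devices x 2 accumulation steps = 16.
# The accumulation factor of 2 is inferred, not stated in the card.
per_device_batch = 4
num_devices = 2
grad_accum_steps = 2
total_train_batch_size = per_device_batch * num_devices * grad_accum_steps
```

With this schedule the learning rate peaks at 2e-4 right after the 3 warmup steps and decays to zero by step 30.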
Performance
During training, the model achieved a validation loss of 0.5879 at the final step. The training loss progressively decreased from 0.9366 to 0.4503 over 30 steps.
Framework Versions
Key frameworks used include PEFT 0.12.0, Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.
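Given the PEFT dependency listed above, the repository most likely hosts a LoRA-style adapter rather than full model weights. The sketch below shows one plausible way to load and query it with those frameworks; it is illustrative only (the adapter assumption is not confirmed by the card), and running it requires network access, acceptance of the Llama 3.1 license, and enough memory for the 8B base model.

```python
# Illustrative sketch: assumes the repo is a PEFT adapter on top of
# meta-llama/Llama-3.1-8B-Instruct. Not verified against the actual repo.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "CharlesLi/llama_3_gsm8k_helpful")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

prompt = "A bakery sells 12 muffins per tray. How many muffins are on 7 trays?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the repo turns out to contain merged full weights instead of an adapter, the PeftModel step can be dropped and the repo id passed directly to AutoModelForCausalLM.from_pretrained.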