CharlesLi/llama_3_gsm8k_helpful

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Dec 31, 2024 · License: llama3.1 · Architecture: Transformer

CharlesLi/llama_3_gsm8k_helpful is an 8-billion-parameter instruction-tuned language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. The fine-tuning run reached a final validation loss of 0.5879. It is intended for applications that need a focused, efficient Llama 3.1 variant.


Model Overview

CharlesLi/llama_3_gsm8k_helpful is an 8-billion-parameter language model fine-tuned from the meta-llama/Llama-3.1-8B-Instruct base model. The fine-tuning run adapted the model to its target application and ended with a final validation loss of 0.5879.
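
The card does not include usage code; the following is a minimal inference sketch, assuming the repository hosts weights that transformers can load directly (if it only contains a PEFT adapter, load the base model and attach the adapter as shown under Framework Versions below). The repository ID comes from this card; the word problem and generation settings are purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_3_gsm8k_helpful"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# Illustrative math word problem; any chat-style prompt works the same way.
messages = [{
    "role": "user",
    "content": "A bakery sold 48 muffins in the morning and half as many in the afternoon. "
               "How many muffins did it sell in total?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```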

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 0.0002
  • Batch Size: train_batch_size of 4 and eval_batch_size of 4 per device, with a total_train_batch_size of 16 via multi-GPU training and gradient accumulation.
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Training Steps: 30 steps, distributed across 2 GPUs (see the configuration sketch below).
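
The training script itself is not published on the card. As a rough orientation only, the reported values map onto Hugging Face TrainingArguments roughly as follows; output_dir is a placeholder, the optimizer name is assumed to be the default AdamW variant, and the gradient-accumulation split (2 steps) is inferred from the effective batch size of 16 across 2 GPUs with a per-device batch of 4.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the reported hyperparameters; unstated
# settings (output_dir, seed, logging cadence, etc.) are placeholders.
training_args = TrainingArguments(
    output_dir="llama_3_gsm8k_helpful",  # hypothetical
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,       # 4 per device x 2 GPUs x 2 accumulation = 16 effective
    max_steps=30,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # Adam-style optimizer with the betas/epsilon below
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```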

Performance

During training, the model achieved a validation loss of 0.5879 at the final step. The training loss progressively decreased from 0.9366 to 0.4503 over 30 steps.

Framework Versions

Key frameworks used include PEFT 0.12.0, Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.
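
Because PEFT appears in the framework list, the repository most likely contains a parameter-efficient adapter (e.g. LoRA) rather than fully merged weights, although the card does not state this explicitly. A hedged loading sketch under that assumption, with the framework versions above pinned:

```python
# pip install "transformers==4.44.2" "peft==0.12.0" "datasets==3.0.0" "tokenizers==0.19.1" torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"    # base model named on this card
adapter_id = "CharlesLi/llama_3_gsm8k_helpful"  # this repository (assumed to hold a PEFT adapter)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# Attach the fine-tuned adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```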