CharlesLi/llama_3_gsm8k_helpful

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Dec 31, 2024 · License: llama3.1 · Architecture: Transformer

CharlesLi/llama_3_gsm8k_helpful is an 8-billion-parameter instruction-tuned language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. The fine-tuning run reached a final validation loss of 0.5879. It is intended for applications that need a focused, efficient Llama 3.1 variant.


Model Overview

CharlesLi/llama_3_gsm8k_helpful is an 8-billion-parameter language model fine-tuned from the meta-llama/Llama-3.1-8B-Instruct base model. The fine-tuning run adapted the model to its target application and ended with a final validation loss of 0.5879.
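
The card does not include usage code; the following is a minimal inference sketch, assuming the repository hosts weights that transformers can load directly (if it only contains a PEFT adapter, load the base model and attach the adapter as shown under Framework Versions below). The repository ID comes from this card; the word problem and generation settings are purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_3_gsm8k_helpful"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# Illustrative math word problem; any chat-style prompt works the same way.
messages = [{
    "role": "user",
    "content": "A bakery sold 48 muffins in the morning and half as many in the afternoon. "
               "How many muffins did it sell in total?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```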

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 0.0002
  • Batch Size: train_batch_size of 4 and eval_batch_size of 4 per device, with a total_train_batch_size of 16 via multi-GPU training and gradient accumulation.
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Training Steps: 30 steps, distributed across 2 GPUs (see the configuration sketch below).
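
The training script itself is not published on the card. As a rough orientation only, the reported values map onto Hugging Face TrainingArguments roughly as follows; output_dir is a placeholder, the optimizer name is assumed to be the default AdamW variant, and the gradient-accumulation split (2 steps) is inferred from the effective batch size of 16 across 2 GPUs with a per-device batch of 4.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the reported hyperparameters; unstated
# settings (output_dir, seed, logging cadence, etc.) are placeholders.
training_args = TrainingArguments(
    output_dir="llama_3_gsm8k_helpful",  # hypothetical
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,       # 4 per device x 2 GPUs x 2 accumulation = 16 effective
    max_steps=30,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # Adam-style optimizer with the betas/epsilon below
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```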

Performance

During training, the model achieved a validation loss of 0.5879 at the final step. The training loss progressively decreased from 0.9366 to 0.4503 over 30 steps.

Framework Versions

Key frameworks used include PEFT 0.12.0, Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.
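
Because PEFT appears in the framework list, the repository most likely contains a parameter-efficient adapter (e.g. LoRA) rather than fully merged weights, although the card does not state this explicitly. A hedged loading sketch under that assumption, with the framework versions above pinned:

```python
# pip install "transformers==4.44.2" "peft==0.12.0" "datasets==3.0.0" "tokenizers==0.19.1" torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"    # base model named on this card
adapter_id = "CharlesLi/llama_3_gsm8k_helpful"  # this repository (assumed to hold a PEFT adapter)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# Attach the fine-tuned adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```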