CharlesLi/llama_3_gsm8k_llama_2

- Task: Text generation
- Model size: 8B parameters
- Quantization: FP8
- Context length: 32k
- Published: Dec 31, 2024
- License: llama3.1
- Architecture: Transformer

CharlesLi/llama_3_gsm8k_llama_2 is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B-Instruct. The fine-tuning run reached a final validation loss of 0.6028 after 30 training steps. It is intended for applications that want a Llama-3.1-8B-Instruct base with additional specialized training.


Model Overview

This model, llama_3_gsm8k_llama_2, is a fine-tuned variant of the Meta Llama-3.1-8B-Instruct base model. It has 8 billion parameters and was fine-tuned for 30 steps with the hyperparameters listed under Training Details below.
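As a minimal sketch, the checkpoint can be loaded like any Hugging Face causal LM. The `load_model` helper below is hypothetical (not part of the repo) and assumes the `transformers` library is installed and the Hub repo is reachable:

```python
REPO_ID = "CharlesLi/llama_3_gsm8k_llama_2"

def load_model(repo_id: str = REPO_ID):
    """Load the fine-tuned checkpoint and its tokenizer.

    Hypothetical convenience wrapper: assumes `transformers` is
    installed and the repo is available on the Hugging Face Hub.
    """
    # Deferred import so the module can be inspected without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return model, tokenizer
```

After loading, generation follows the standard `model.generate` flow with prompts formatted via the tokenizer's chat template.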

Training Details

The fine-tuning run used the following hyperparameters:

- Learning rate: 0.0002
- Per-device train batch size: 4
- Gradient accumulation steps: 2
- Total train batch size: 16
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler: cosine, with a warmup ratio of 0.1
- Hardware: 2 GPUs (multi-GPU training)
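The reported settings can be gathered in one place as a sketch. The dictionary below is illustrative only (key names follow common `transformers` Trainer conventions, not a verbatim config from the repo), and it verifies that the effective batch size works out to 16:

```python
# Fine-tuning hyperparameters as reported in the model card.
# Key names mirror common Trainer conventions; this is not a verbatim config.
hparams = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "num_devices": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "max_steps": 30,
}

# Effective batch size = per-device batch * grad accumulation * device count.
effective_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * hparams["num_devices"]
)
print(effective_batch_size)  # 16, matching the reported total_train_batch_size
```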

Performance

Validation loss decreased steadily over the course of training, from 0.7125 at step 5 to a final value of 0.6028 at step 30.
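As a quick sanity check on the two reported values, the drop from 0.7125 to 0.6028 corresponds to roughly a 15% relative improvement in validation loss:

```python
loss_step_5 = 0.7125   # validation loss at step 5 (reported)
loss_step_30 = 0.6028  # final validation loss at step 30 (reported)

# Relative improvement over the logged portion of training.
relative_improvement = 1 - loss_step_30 / loss_step_5
print(f"{relative_improvement:.1%}")  # about 15.4%
```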

Intended Use

This model is suitable for use cases that benefit from a Llama-3.1-8B-Instruct foundation with additional fine-tuning. The model card lists the training dataset as 'None', so the exact training data is not documented; the repo name suggests GSM8K-style math word problems, but this is not confirmed by the provided information.