CharlesLi/llama_2_o1_5_full

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Jan 7, 2025 · License: llama2 · Architecture: Transformer · Open weights

CharlesLi/llama_2_o1_5_full is a 7 billion parameter language model fine-tuned by CharlesLi from Meta's Llama-2-7b-chat-hf. The fine-tuning dataset is unspecified; training reached a final validation loss of 0.6201. The model is intended for general chat applications, leveraging the Llama 2 base for conversational tasks.


Model Overview

CharlesLi/llama_2_o1_5_full is a 7 billion parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base model. It was fine-tuned to a reported validation loss of 0.6201.

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 32 (total, with gradient accumulation)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 1
  • Scheduler: Cosine with 0.1 warmup ratio

Training ran for 1200 steps across 4 GPUs, ending the single epoch with a training loss of 0.5835 and a validation loss of 0.6203.
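The schedule above (cosine decay with a 0.1 warmup ratio over 1200 steps, peak learning rate 2e-05) can be sketched as a small function. This is an illustrative reimplementation of the standard linear-warmup-plus-cosine-decay curve, not code from the actual training run; the function name `cosine_lr` is our own.

```python
import math

BASE_LR = 2e-05       # peak learning rate from the model card
TOTAL_STEPS = 1200    # total optimizer steps reported
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 0.1 warmup ratio -> 120 steps

def cosine_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this schedule the learning rate climbs to 2e-05 at step 120, reaches half its peak at the midpoint of the decay phase, and falls to zero by step 1200.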

Intended Uses

The README does not detail specific intended uses or limitations. As a fine-tuned version of Llama-2-7b-chat-hf, the model is generally suitable for conversational AI applications, chatbots, and text generation tasks where a 7B parameter model is appropriate.
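Since the model inherits Llama-2-7b-chat-hf's chat formatting, prompts for conversational use would typically follow the standard Llama 2 chat template. A minimal sketch of that template, assuming the fine-tune kept the base model's `[INST]`/`<<SYS>>` conventions (the helper name is ours):

```python
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Wrap a system prompt and user turn in the Llama 2 chat format."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt("You are a helpful assistant.", "What is 7B in this model's name?")
```

The resulting string can be passed to any Llama-2-compatible inference stack; the model's reply is expected after the closing `[/INST]` tag.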