CharlesLi/llama_2_o1_5_full
CharlesLi/llama_2_o1_5_full is a 7-billion-parameter language model fine-tuned by CharlesLi from Meta's meta-llama/Llama-2-7b-chat-hf checkpoint. The fine-tuning dataset is not disclosed; the run reports a final validation loss of 0.6201. The model targets general chat applications, building on the conversational abilities of the Llama 2 chat base.
Model Overview
CharlesLi/llama_2_o1_5_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model, with a reported validation loss of 0.6201.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 2e-05
- Effective Batch Size: 32 (total across devices, via gradient accumulation)
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Epochs: 1
- LR Scheduler: cosine with a 0.1 warmup ratio
Training ran for 1,200 steps across 4 GPUs, finishing the single epoch with a training loss of 0.5835 and a validation loss of 0.6203.
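For reference, below is a minimal sketch of how these hyperparameters map onto a Hugging Face `TrainingArguments` configuration. The per-device batch size and gradient accumulation split are assumptions (the README only gives the effective batch size of 32; any combination reaching 32 across 4 GPUs would match), and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported setup. The split below is an
# assumption: 2 per device x 4 GPUs x 4 accumulation steps = effective batch 32.
training_args = TrainingArguments(
    output_dir="llama_2_o1_5_full",     # placeholder
    learning_rate=2e-5,                 # reported learning rate
    per_device_train_batch_size=2,      # assumption; see note above
    gradient_accumulation_steps=4,      # assumption; see note above
    num_train_epochs=1,                 # reported: 1 epoch
    lr_scheduler_type="cosine",         # reported: cosine schedule
    warmup_ratio=0.1,                   # reported: 0.1 warmup ratio
    adam_beta1=0.9,                     # reported Adam betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,                  # reported Adam epsilon
    evaluation_strategy="epoch",        # assumption; matches the per-epoch eval loss
)
```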
Intended Uses
The README does not detail specific intended uses or limitations. As a fine-tune of Llama-2-7b-chat-hf, it is broadly suited to conversational AI applications, chatbots, and general text generation tasks where a 7B-parameter model is appropriate.
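The README includes no example code, but a minimal usage sketch with the transformers library would look like the following, assuming this fine-tune kept the chat template of its Llama-2-7b-chat-hf base:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_o1_5_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a 7B model needs roughly 14 GB in half precision
    device_map="auto",
)

# Assumes the tokenizer ships the standard Llama 2 chat template.
messages = [{"role": "user", "content": "Give me three tips for writing clear code."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```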