Model Overview
CharlesLi/llama_2_sky_o1_2_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. The fine-tuning used a dataset referred to only as the "generator" dataset to adapt the model's capabilities. On its evaluation set, the model reached a validation loss of 0.7437.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 2e-05
- Batch Sizes: train_batch_size of 4, eval_batch_size of 4
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 1
- Distributed Training: Multi-GPU setup with 4 devices and 2 gradient accumulation steps, resulting in a total_train_batch_size of 32
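As a check on the distributed setup above, the effective (total) train batch size is the per-device batch size multiplied by the number of devices and by the gradient accumulation steps. A minimal sketch of that arithmetic (variable names are illustrative, not taken from the training code):

```python
# Per-device settings reported for this training run
train_batch_size = 4   # examples per device per forward pass
num_devices = 4        # GPUs in the multi-GPU setup
grad_accum_steps = 2   # batches accumulated before each optimizer update

# Effective batch size seen by each optimizer update
total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 32, matching the reported total_train_batch_size
```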
Intended Uses
Given its fine-tuning on a generator dataset and its Llama 2 chat base, this model is primarily intended for generative tasks, most likely conversational AI or general text generation. Because the contents of the "generator dataset" are not documented, more specific use cases and limitations cannot be stated without further information.
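Since the model is a standard Llama 2 chat derivative, it can presumably be loaded with the Hugging Face transformers library like any other causal LM. A minimal sketch, assuming transformers is installed, sufficient GPU memory is available, and that the model follows the usual Llama-2-chat [INST] prompt format (the prompt text is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_sky_o1_2_full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-2-chat models expect instructions wrapped in [INST] ... [/INST]
prompt = "[INST] Briefly explain what fine-tuning a language model means. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that generating with a 7B model requires downloading the weights and a capable GPU; the snippet is a usage sketch, not a verified example from the model's own documentation.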