CharlesLi/llama_2_sky_o1_2_full
The CharlesLi/llama_2_sky_o1_2_full model is a 7 billion parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. This model was trained on a generator dataset, achieving a validation loss of 0.7437. It is intended for generative tasks, leveraging its Llama 2 base architecture for conversational applications.
Model Overview
CharlesLi/llama_2_sky_o1_2_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model on a "generator dataset". It reached a validation loss of 0.7437 on the evaluation set during training.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 2e-05
- Batch Sizes: `train_batch_size` of 4, `eval_batch_size` of 4
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 1
- Distributed Training: Multi-GPU setup with 4 devices and 2 gradient accumulation steps, resulting in a `total_train_batch_size` of 32
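The effective batch size and the learning-rate schedule follow directly from these hyperparameters. The plain-Python sketch below illustrates the arithmetic; the actual run would have used the Hugging Face Trainer's built-in cosine scheduler, and the helper function name here is hypothetical:

```python
import math

# Hyperparameters listed above.
PER_DEVICE_BATCH = 4   # train_batch_size per device
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 2
BASE_LR = 2e-05
WARMUP_RATIO = 0.1

# Effective batch size: 4 per device * 4 devices * 2 accumulation steps = 32.
total_train_batch_size = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM_STEPS

def cosine_lr_with_warmup(step, total_steps,
                          base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a warmup ratio of 0.1, the learning rate climbs linearly to 2e-05 over the first 10% of steps and then decays along a cosine curve toward zero by the end of the single epoch.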
Intended Uses
Given its fine-tuning on a generator dataset and its Llama 2 chat base, this model is primarily intended for generative tasks, most likely conversational AI and text generation. More specific use cases and limitations cannot be stated without further details on the contents of the "generator dataset".
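Since the model inherits the Llama-2-chat architecture, it can be loaded with the standard transformers API. A minimal sketch, assuming a GPU with enough memory for a 7B model in half precision; the prompt and generation settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_sky_o1_2_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 7B model on one GPU
    device_map="auto",
)

# Llama-2-chat derivatives expect the [INST] ... [/INST] prompt format.
prompt = "[INST] Write a short summary of the Llama 2 architecture. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Whether the fine-tuned model still expects the base model's `[INST]` chat template depends on how the generator dataset was formatted, so it is worth experimenting with both chat-formatted and plain prompts.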