Model Overview
CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_1000_1000_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. Fine-tuning was performed on a generator dataset and reached an evaluation loss of 0.7889.
Key Characteristics
- Base Model: meta-llama/Llama-2-7b-chat-hf, a chat-tuned model built on the Llama 2 architecture.
- Parameter Count: 7 billion parameters, offering a balance between performance and computational efficiency.
- Fine-tuning Focus: Trained on a generator dataset, suggesting an optimization for text generation tasks.
- Training Performance: Achieved a loss of 0.7889 on the evaluation set during training.
Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 2e-05
- Batch Sizes: train_batch_size of 4 and eval_batch_size of 4, with gradient_accumulation_steps of 2, for a total_train_batch_size of 32.
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: Trained for 1 epoch.
- Frameworks: Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.
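Note that the reported total_train_batch_size of 32 exceeds the per-device batch size (4) times the gradient accumulation steps (2), which suggests training was distributed across multiple devices. A quick sanity check of that arithmetic (the device count is inferred here, not stated in the card):

```python
# Hyperparameters as reported in the training details above.
train_batch_size = 4            # per-device train batch size
gradient_accumulation_steps = 2
total_train_batch_size = 32     # reported effective batch size

# Effective batch = per-device batch * accumulation steps * device count,
# so the implied number of devices is:
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)
print(num_devices)  # → 4 (assuming a standard data-parallel setup)
```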
Intended Uses
This model is suited to general text generation applications, combining the capabilities inherited from the Llama 2 chat base model with its fine-tuning on a generator dataset.
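A minimal inference sketch using the Transformers library. The `[INST]`/`[/INST]` wrapping below is the standard Llama 2 chat prompt format of the base model, not something this card specifies, so treat it as an assumption; the `generate` helper is likewise illustrative.

```python
MODEL_ID = "CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_1000_1000_full"

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Llama 2 chat template (assumed from the base model)."""
    return f"<s>[INST] {user_message} [/INST]"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Imported lazily: loading a 7B model needs substantial memory, and the
    # prompt helper above is usable without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" lets accelerate spread layers across available devices.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# Prompt construction can be checked without downloading the model:
print(build_prompt("Summarize the Llama 2 paper."))
```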