CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_4000_500_full
CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_4000_500_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained with a 4096-token context length on a generator dataset and is intended for tasks requiring a Llama-2-based model with this specific fine-tuning, reaching a final validation loss of 0.5898.
Model Overview
This model, llama_2_sky_safe_o1_llama_3_8B_reflect_4000_500_full, is a fine-tuned variant of the Meta Llama-2-7b-chat-hf architecture. It features 7 billion parameters and was trained with a context length of 4096 tokens. The fine-tuning process specifically utilized a "generator dataset," suggesting an optimization for text generation tasks, though specific details about the dataset are not provided in the README.
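Since the base model is Llama-2-7b-chat-hf, prompts presumably follow the standard Llama-2 chat template; the helper below is a hypothetical sketch of that format (the function name is not from the model card, and the fine-tune is assumed to keep the base template):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Format a single-turn Llama-2-chat prompt with the standard
    [INST] / <<SYS>> markers used by Llama-2-7b-chat-hf."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt("You are a helpful assistant.", "Summarize this model.")
print(prompt)
```

The tokenizer's own chat template (via `tokenizer.apply_chat_template`) should be preferred in practice if the repository ships one.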
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 32 total (4 per device with 2 gradient accumulation steps; the remaining factor of 4 implies training across 4 parallel devices)
- Optimizer: Adam with default betas and epsilon
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
- Epochs: 1
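The cosine schedule with a 0.1 warmup ratio means the learning rate climbs linearly to 2e-05 over the first 10% of steps, then decays along a half-cosine to zero. A simplified pure-Python sketch of that curve (`lr_at_step` is a hypothetical helper, not part of the training code; the HF Trainer implements this via `get_cosine_schedule_with_warmup`):

```python
import math

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 2e-05, warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With 1000 total steps: warmup ends at step 100 at the peak rate.
print(lr_at_step(0, 1000))     # 0.0 (start of warmup)
print(lr_at_step(100, 1000))   # 2e-05 (peak, end of warmup)
print(lr_at_step(1000, 1000))  # ~0.0 (end of cosine decay)
```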
During training, the model achieved a final validation loss of 0.5898. Intermediate training results show a consistent decrease in loss, with a training loss of 0.6753 and validation loss of 0.5975 by step 200.
Framework Versions
The training environment utilized:
- Transformers 4.44.2
- PyTorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
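To reproduce this environment, the reported versions can be pinned explicitly; a sketch of the install commands (the cu121 index URL matches the reported PyTorch CUDA build):

```shell
pip install "transformers==4.44.2" "datasets==3.0.0" "tokenizers==0.19.1"
pip install "torch==2.4.1" --index-url https://download.pytorch.org/whl/cu121
```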
Limitations
The README indicates that more information is needed regarding the model's intended uses, specific limitations, and the exact nature of the training and evaluation data. Users should exercise caution and conduct further evaluation for specific applications.