tsavage68/chat_600STEPS_1e8rate_SFT
The tsavage68/chat_600STEPS_1e8rate_SFT model is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 600 steps with a learning rate of 1e-08, reaching a final validation loss of 1.6169. The card does not yet document specific optimizations or primary use cases for this fine-tune.
Model Overview
tsavage68/chat_600STEPS_1e8rate_SFT is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. It underwent supervised fine-tuning (SFT) over 600 training steps with a learning rate of 1e-08.
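Because the model follows the standard Llama-2 architecture, it should load through the usual Transformers text-generation API. The sketch below is illustrative rather than from the card; the prompt, half-precision dtype, and device placement are assumptions.

```python
# Minimal inference sketch (assumes transformers, torch, and accelerate
# are installed; dtype/device choices are illustrative, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_600STEPS_1e8rate_SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 7B model fits on one GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```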
Training Details
Key hyperparameters used during training (see the configuration sketch after this list):
- Learning Rate: 1e-08
- Batch Size: 4 (train), 1 (eval)
- Gradient Accumulation Steps: 2 (effective train batch size of 8)
- Optimizer: Adam with default betas and epsilon
- LR Scheduler: Cosine type with 100 warmup steps
- Total Training Steps: 600
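These values map directly onto the Hugging Face TrainingArguments API. The sketch below mirrors the listed hyperparameters only; the output directory is hypothetical, and the training dataset, Trainer wiring, and any SFT-specific wrappers are undocumented on the card and therefore omitted.

```python
# Hyperparameter sketch mirroring the values listed above; everything not
# on the card (output_dir, dataset, Trainer setup) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chat_600STEPS_1e8rate_SFT",  # hypothetical output path
    learning_rate=1e-08,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    max_steps=600,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    # "Adam with default betas and epsilon" matches the Trainer's default
    # AdamW optimizer, so no explicit optimizer arguments are set here.
)
```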
The training process concluded with a validation loss of 1.6169. The model was developed using Transformers 4.37.2, PyTorch 2.0.0+cu117, Datasets 2.17.0, and Tokenizers 0.15.2.
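Since exact library versions are reported, a quick check like the following (an illustrative sketch, not part of the card) can confirm that a local environment matches them before attempting to reproduce the training setup.

```python
# Compare installed library versions against those reported on the card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.37.2",
    "torch": "2.0.0+cu117",
    "datasets": "2.17.0",
    "tokenizers": "0.15.2",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"mismatch (installed {have})"
    print(f"{name}: expected {want} -> {status}")
```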
Current Limitations
The model card does not yet specify the training dataset, intended uses, or further limitations. Users should exercise caution and evaluate the model on their own data before relying on it for specific applications.