tsavage68/chat_350STEPS_1e5_SFT
The tsavage68/chat_350STEPS_1e5_SFT is a 7 billion parameter language model, fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 350 steps with a learning rate of 0.0001, reaching a final validation loss of 0.3260. It is a general-purpose chat model suited to conversational AI tasks.
Model Overview
The tsavage68/chat_350STEPS_1e5_SFT derives from the meta-llama/Llama-2-7b-chat-hf architecture. Its "SFT" designation indicates Supervised Fine-Tuning, i.e. further training on a labeled instruction or dialogue dataset.
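The model can be loaded through the Hugging Face Transformers library. The sketch below is a minimal, assumed usage pattern; the half-precision dtype and automatic device placement are choices you may need to adjust for your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_350STEPS_1e5_SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 7B model on one GPU
    device_map="auto",          # requires the `accelerate` package
)
```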
Training Details
The model was trained with the following key hyperparameters (a configuration sketch follows the list):
- Base Model: Llama-2-7b-chat-hf
- Learning Rate: 0.0001
- Training Steps: 350
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: 4 (train), 1 (eval) with 2 gradient accumulation steps, resulting in a total train batch size of 8.
- LR Scheduler: Cosine, with 100 warmup steps.
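As an illustration, these settings map onto the Transformers TrainingArguments API roughly as follows. The output directory is hypothetical, and the dataset and trainer wiring are omitted because the card does not document them.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chat_350STEPS_1e5_SFT",  # hypothetical output path
    max_steps=350,
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = effective train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```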
During training, the model reached a final validation loss of 0.3260, with training loss decreasing steadily over the 350 steps. Training used Transformers 4.37.2, PyTorch 2.0.0+cu117, Datasets 2.17.0, and Tokenizers 0.15.2.
Intended Use
As a supervised fine-tuned chat model, it is generally suitable for conversational AI applications. Specific capabilities and limitations depend on the fine-tuning dataset, which the card does not document. A minimal inference sketch follows.
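Since the base model is Llama-2-7b-chat-hf, the tokenizer is assumed to ship the standard Llama-2 chat template. The prompt and sampling settings below are illustrative.

```python
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="tsavage68/chat_350STEPS_1e5_SFT",
    device_map="auto",
)

# Format the conversation with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Give me three tips for writing clear emails."}]
prompt = chat.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

result = chat(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(result[0]["generated_text"][len(prompt):])  # strip the echoed prompt
```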