kykim0/Llama-2-7b-ultrachat200k-2e
kykim0/Llama-2-7b-ultrachat200k-2e is a 7-billion-parameter model based on Llama-2-hf, fine-tuned by kykim0 on the HuggingFaceH4/ultrachat_200k dataset. It reaches a loss of 0.9258 on the evaluation set and is intended for general language generation, retaining the Llama-2 architecture and its 4096-token context length.
Model Overview
kykim0/Llama-2-7b-ultrachat200k-2e is a 7-billion-parameter language model based on the Llama-2-hf architecture. It was fine-tuned by kykim0 on the HuggingFaceH4/ultrachat_200k dataset, which targets conversational and instruction-following capabilities, and achieved a loss of 0.9258 on the evaluation set.
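A minimal sketch of loading the model and generating text with Transformers; the model ID comes from this card, while the prompt, dtype, and generation settings are illustrative choices, not prescribed by the author:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kykim0/Llama-2-7b-ultrachat200k-2e"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a 7B model fits on a single GPU
    device_map="auto",
)

prompt = "Explain the difference between fine-tuning and pre-training."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```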
Training Details
The fine-tuning run used the following hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 8 (train), 16 (eval)
- Gradient Accumulation: 16 steps, for an effective total batch size of 512 (8 per device × 16 accumulation steps × 4 GPUs)
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Epochs: 2.0
- Scheduler: Cosine learning rate scheduler
Training ran on 4 GPUs with Transformers 4.36.2 and PyTorch 2.1.2.
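For reference, a hedged sketch of how the reported configuration maps onto Transformers TrainingArguments; this is not the original training script, and the output_dir is a hypothetical placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b-ultrachat200k",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 x 16 x 4 GPUs = 512 effective batch size
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```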
Potential Use Cases
Given its fine-tuning on a conversational dataset, this model is likely suitable for:
- Chatbot development
- Instruction-following tasks
- General text generation
- Prototyping applications requiring a Llama-2 base
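For chatbot-style use, a hedged example that assumes the tokenizer ships a chat template (if it does not, apply_chat_template will raise an error and plain-text prompting, as in the loading example above, should be used instead):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kykim0/Llama-2-7b-ultrachat200k-2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Give me three tips for writing clear documentation."},
]
# Format the conversation with the tokenizer's chat template (assumed to exist)
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```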