myra/broadening_llama_chat
myra/broadening_llama_chat is a 7-billion-parameter language model fine-tuned from Meta's meta-llama/Llama-2-7b-chat-hf. It was trained with a cosine learning rate schedule over 3 epochs on 4 GPUs. The model card does not detail the training data or the model's primary differentiators, but it builds on a chat-optimized foundation.
Overview
myra/broadening_llama_chat is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. Fine-tuning used a learning rate of 2e-05, a batch size of 1 per device across 4 GPUs, and the Adam optimizer, with a cosine learning rate scheduler, a warmup ratio of 0.03, and 3 epochs of training.
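The cosine schedule with warmup mentioned above can be sketched in plain Python. This follows the common Hugging Face-style implementation (linear warmup to the peak rate, then cosine decay toward zero); the exact scheduler implementation is an assumption, since the card does not name one.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.03):
    """Cosine learning-rate schedule with linear warmup.

    Uses the hyperparameters from the model card (base_lr=2e-05,
    warmup_ratio=0.03); the schedule shape mirrors the typical
    Hugging Face get_cosine_schedule_with_warmup behavior, which
    is an assumption here.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over the first 3% of steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate climbs linearly for the first 30 steps, peaks at 2e-05, and decays to 0 by the final step.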
Key Training Details
- Base Model: meta-llama/Llama-2-7b-chat-hf
- Parameters: 7 billion
- Learning Rate: 2e-05
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Epochs: 3.0
- Distributed Training: Multi-GPU (4 devices)
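Taken together, the details above map onto a Hugging Face `TrainingArguments` configuration roughly like the following sketch. The field names are from the `transformers` Trainer API; the output path and any field not listed in the card are assumptions, since the card does not include the actual training script.

```python
from transformers import TrainingArguments

# Config sketch reproducing the hyperparameters listed in the card.
# output_dir is a hypothetical path; fields the card does not mention
# (e.g. weight decay, gradient accumulation) are left at their defaults.
training_args = TrainingArguments(
    output_dir="./broadening_llama_chat",  # hypothetical
    learning_rate=2e-05,
    per_device_train_batch_size=1,   # batch size 1 per device, 4 GPUs
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```

With 4 GPUs and a per-device batch size of 1, the effective global batch size would be 4, assuming no gradient accumulation.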
Limitations
The model card does not specify the fine-tuning dataset, intended uses, or performance characteristics. Without this information, the model's specific strengths, weaknesses, and optimal use cases are unknown, and users should evaluate it on their own tasks before deployment.