myra/broadening_llama_chat

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 17, 2024 · Architecture: Transformer

myra/broadening_llama_chat is a 7 billion parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained with a cosine learning rate schedule over 3 epochs across 4 GPUs. While the specific training data and primary differentiators are not detailed, it builds on a robust chat-optimized foundation.


Overview

myra/broadening_llama_chat is a 7 billion parameter language model, fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. Fine-tuning used a learning rate of 2e-05, a batch size of 1 per device across 4 GPUs, and the Adam optimizer, with a cosine learning rate scheduler, a warmup ratio of 0.03, and 3 epochs of training.
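
As a concrete illustration, the model can be loaded for inference with the Hugging Face transformers library. The sketch below assumes the repository follows the standard Llama-2-chat layout on the Hub and uses the usual Llama-2-chat prompt convention ([INST] ... [/INST]); neither is stated in this card, and the dtype choice is a local default rather than the FP8 serving configuration listed above.

```python
# Minimal inference sketch; assumes a standard Llama-2-chat checkpoint layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "myra/broadening_llama_chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 as a safe local default; FP8 refers to the hosted serving config
    device_map="auto",
)

# Llama-2-chat style prompt wrapping (assumed, not documented in this card).
prompt = "[INST] Summarize the benefits of a cosine learning rate schedule. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```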

Key Training Details

  • Base Model: meta-llama/Llama-2-7b-chat-hf
  • Parameters: 7 Billion
  • Learning Rate: 2e-05
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 3.0
  • Distributed Training: Multi-GPU (4 devices)
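
For reference, the hyperparameters listed above map onto a Hugging Face TrainingArguments configuration roughly like the following. This is a reconstruction from the reported values, not the author's actual training script; the dataset, data collator, and any parameter-efficient fine-tuning setup are unknown and omitted.

```python
# Hypothetical reconstruction of the training configuration from the values above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="broadening_llama_chat",
    learning_rate=2e-05,
    per_device_train_batch_size=1,   # batch size 1 per device, 4 GPUs total
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch",             # Adam-family optimizer as reported in the card
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    logging_steps=10,                # assumed; not stated in the card
)
```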

Limitations

Specific details regarding the fine-tuning dataset, intended uses, and performance characteristics are not provided in the model card. Users should be aware that without this information, the model's specific strengths, weaknesses, and optimal use cases are unknown.