CharlesLi/llama_2_llama_2_alpaca_5_full

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jan 20, 2025 · License: llama2 · Architecture: Transformer · Open Weights

The CharlesLi/llama_2_llama_2_alpaca_5_full model is a 7 billion parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained on a generator dataset, achieving a loss of 0.7509 on the evaluation set. This model is intended for general language generation tasks, building upon the conversational capabilities of its Llama-2 base.


Model Overview

CharlesLi/llama_2_llama_2_alpaca_5_full is a 7-billion-parameter language model derived from Meta's Llama-2-7b-chat-hf. It was fine-tuned on a generator dataset and reached a loss of 0.7509 on its evaluation set, the reported measure of its performance on the fine-tuning task.
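
The checkpoint can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the weights are hosted on the Hub under the repository ID above; the dtype and device placement are illustrative choices, not part of the original documentation.

```python
# Minimal loading sketch (assumes the checkpoint is available on the
# Hugging Face Hub under this repository ID).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_alpaca_5_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within a single GPU's memory
    device_map="auto",          # requires the `accelerate` package
)
```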

Key Training Details

  • Base Model: meta-llama/Llama-2-7b-chat-hf
  • Learning Rate: 2e-05
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: A total training batch size of 32 (per-device batch size of 4 across 4 devices, with 2 gradient accumulation steps)
  • Epochs: Trained for 1 epoch
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
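
These hyperparameters map directly onto transformers.TrainingArguments. The sketch below is a hedged reconstruction rather than the author's actual training script: the values come from the list above, the per-device batch size of 4 is inferred from 32 / (4 devices × 2 accumulation steps), and the output directory is a hypothetical placeholder.

```python
# Hedged reconstruction of the fine-tuning configuration. Only the listed
# hyperparameters are taken from the model card; the output directory and
# any Trainer/dataset wiring are illustrative.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_2_alpaca_5_full",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,   # 4 devices x 4 per device x 2 accum steps = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```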

Intended Use Cases

While the original documentation does not enumerate detailed use cases, its Llama-2-chat base and fine-tuning on a generator dataset suggest the model is suitable for the following (see the usage sketch after the list):

  • General text generation
  • Conversational AI applications
  • Further fine-tuning for specific domain-related generation tasks
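
As a usage illustration, the sketch below runs a single-turn conversational generation. It assumes the model and tokenizer were loaded as in the earlier snippet and that the tokenizer retains the Llama-2 chat template from its base model; the prompt and sampling parameters are illustrative.

```python
# Conversational generation sketch. Assumes `model` and `tokenizer` were
# loaded as shown above and that the tokenizer keeps the Llama-2 chat
# template inherited from meta-llama/Llama-2-7b-chat-hf.
messages = [
    {"role": "user", "content": "Explain what fine-tuning a language model means."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0, inputs.shape[-1]:], skip_special_tokens=True))
```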