CharlesLi/llama_2_llama_2_alpaca_4_full

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Concurrency Cost: 1 · Published: Jan 20, 2025 · License: llama2 · Architecture: Transformer · Open Weights

The CharlesLi/llama_2_llama_2_alpaca_4_full model is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained on a dataset identified as `generator` and reached a loss of 0.9157 on its evaluation set. The model is intended for general language generation, leveraging the Llama 2 architecture for conversational applications.


Model Overview

CharlesLi/llama_2_llama_2_alpaca_4_full is a 7-billion-parameter language model fine-tuned by CharlesLi from Meta's Llama-2-7b-chat-hf on a dataset identified as `generator`.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-2-7b-chat-hf.
  • Parameter Count: 7 billion parameters.
  • Context Length: 4096 tokens.
  • Training Objective: Optimized on the `generator` dataset, achieving a reported loss of 0.9157 on its evaluation set.
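
The model should load like any other Llama 2 checkpoint. The following is a minimal sketch using the `transformers` library, assuming the weights are hosted on the Hugging Face Hub under the model id above; the dtype and device settings are illustrative choices, not part of the model card.

```python
# Minimal loading sketch; assumes the checkpoint is available on the
# Hugging Face Hub under the model id from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_alpaca_4_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model on one 24 GB GPU
    device_map="auto",          # place layers automatically across available devices
)

prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```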

Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 4 per device (train and eval)
  • Gradient Accumulation: 2 steps; with a per-device batch size of 4, the reported total effective batch size of 32 implies training across 4 devices (4 × 2 × 4 = 32).
  • Optimizer: Adam with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 1 epoch.
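
For reference, here is one way the reported hyperparameters map onto Hugging Face `TrainingArguments`. This is a sketch of the configuration, not the author's actual training script; `output_dir`, the optimizer variant, and any setting not listed above are assumptions.

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# output_dir is a hypothetical path; unlisted settings use library defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_alpaca_4_full",  # hypothetical
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,       # 4 x 2 x 4 devices = 32 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # Adam-style optimizer, default betas/epsilon
)
```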

Intended Use Cases

Given its fine-tuning on the `generator` dataset and its Llama 2 base, this model is suitable for general language generation tasks, particularly conversational or instruction-following work inherited from its Llama-2-7b-chat-hf origin.
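
Because the base model is Llama-2-7b-chat-hf, prompts in the Llama 2 chat format are a reasonable starting point. Whether this fine-tune preserves that template is an assumption worth verifying empirically; the snippet below reuses the `model` and `tokenizer` from the loading sketch above.

```python
# Llama 2 chat checkpoints expect the [INST] prompt template; whether this
# fine-tune preserves it is an assumption, so compare against plain prompts.
prompt = (
    "<s>[INST] <<SYS>>\n"
    "You are a helpful assistant.\n"
    "<</SYS>>\n\n"
    "Write three tips for debugging Python code. [/INST]"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```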