CharlesLi/llama_2_llama_2_alpaca_1_full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4K · Published: Jan 20, 2025 · License: llama2 · Architecture: Transformer · Open Weights

The CharlesLi/llama_2_llama_2_alpaca_1_full model is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was adapted on a "generator" dataset and reports a loss of 1.3132 on its evaluation set. The model is intended for tasks that benefit from this fine-tuning, though its specific use cases and limitations have not yet been documented in detail.


Model Overview

CharlesLi/llama_2_llama_2_alpaca_1_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf checkpoint. The fine-tuning was performed on a "generator" dataset, indicating a focus on generative tasks. On its evaluation set the model reaches a loss of 1.3132, suggesting a degree of proficiency in the tasks it was fine-tuned for.
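
The model card does not include usage instructions. Assuming the repository follows the standard Llama-2 layout on the Hugging Face Hub, a minimal loading sketch with the transformers library might look like this (dtype and device placement below are assumptions, not documented settings):

```python
# Minimal sketch: loading the fine-tuned checkpoint with Hugging Face transformers.
# Assumes the repo follows the standard Llama-2 layout; not taken from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_alpaca_1_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model within a single modern GPU
    device_map="auto",
)

prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```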

Training Details

The model was trained using the following key hyperparameters (a hedged configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: Effective training batch size of 32 (per-device train_batch_size: 4 × gradient_accumulation_steps: 2 × 4 GPUs).
  • Optimizer: Adam with standard betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 1 epoch.
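
For reference, these values can be expressed as a transformers TrainingArguments configuration. This is an illustrative sketch based only on the hyperparameters listed above; the original training script, dataset preprocessing, and precision settings are not part of the model card, and the output directory name is hypothetical:

```python
# Illustrative sketch: the listed hyperparameters expressed as transformers TrainingArguments.
# The actual training script, dataset, and evaluation setup are not documented in the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_alpaca_1_full",  # hypothetical output directory
    learning_rate=2e-05,
    per_device_train_batch_size=4,       # 4 examples per GPU
    gradient_accumulation_steps=2,       # 4 GPUs x 4 x 2 = effective batch size of 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # Adam-style optimizer with default betas/epsilon
    bf16=True,                           # assumption; training precision is not stated in the card
)
```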

Current Status

Specific details about the model's intended uses, limitations, and the exact composition of its training and evaluation data are not yet fully documented. Developers should take this into account when assessing its suitability for a particular application.