CharlesLi/llama_2_llama_2_alpaca_2_full

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 20, 2025 · License: llama2 · Architecture: Transformer · Open weights

The CharlesLi/llama_2_llama_2_alpaca_2_full model is a 7 billion parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was fine-tuned on a dataset identified only as the "generator" dataset and reached a loss of 0.9404 on its evaluation set. It is intended for general language generation tasks, building on the conversational capabilities of its Llama-2 base.


Model Overview

This model, llama_2_llama_2_alpaca_2_full, is a 7 billion parameter language model developed by CharlesLi. It is a fine-tuned variant of Meta's Llama-2-7b-chat-hf, indicating its foundation in a robust conversational architecture.
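
To try the model, it can be loaded with the Hugging Face transformers library. The sketch below is a minimal example: the repo id comes from this card, while the dtype, device placement, and prompt are illustrative assumptions.

```python
# Minimal loading and generation sketch (transformers API).
# The repo id comes from this card; dtype/device settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_alpaca_2_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumed; choose a dtype your hardware supports
    device_map="auto",          # assumed; requires the `accelerate` package
)

prompt = "Explain in one paragraph what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```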

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-2-7b-chat-hf.
  • Parameter Count: 7 billion parameters.
  • Context Length: Supports a context length of 4096 tokens.
  • Training Objective: Fine-tuned on the "generator" dataset to enhance its generative capabilities.
  • Performance: Achieved a loss of 0.9404 on its evaluation set during training.

Training Details

The model was trained with the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: A train_batch_size of 4 and an eval_batch_size of 4, with gradient_accumulation_steps of 2; the reported total_train_batch_size of 32 implies distributed training across 4 devices (4 × 2 × 4 = 32).
  • Optimizer: Adam with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: Trained for 1 epoch.
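
For reference, a hypothetical mapping of these reported hyperparameters onto a transformers TrainingArguments object might look like the following; the output directory is a placeholder, and the 4-device assumption follows from the batch-size arithmetic above.

```python
# Hypothetical TrainingArguments mirroring the reported hyperparameters.
# output_dir is a placeholder; a total batch of 32 = 4 (per device)
# x 2 (accumulation) x 4 (assumed devices).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_alpaca_2_full",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    optim="adamw_torch",                 # Adam with default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```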

Intended Use Cases

Given its foundation in Llama-2-7b-chat-hf and its fine-tuning on the "generator" dataset, this model is suited to a range of language generation tasks, including but not limited to the following (a minimal chat example follows the list):

  • Text completion
  • Content generation
  • Conversational AI applications (building on Llama-2's chat capabilities)
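
Because the base model is Llama-2-7b-chat-hf, conversational prompts should follow the Llama-2 chat format. The sketch below reuses the model and tokenizer loaded earlier and relies on tokenizer.apply_chat_template, assuming the fine-tune preserved the base model's chat template.

```python
# Chat-style generation sketch, reusing `model` and `tokenizer` from above.
# Assumes the tokenizer still carries the Llama-2 chat template.
messages = [
    {"role": "user", "content": "Suggest three titles for a post about fine-tuning."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=64, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```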