CharlesLi/llama_2_o1_01_full

Text Generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 7, 2025 · License: llama2 · Architecture: Transformer · Open weights

CharlesLi/llama_2_o1_01_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 1 epoch with a learning rate of 2e-05 and a context length of 4096 tokens. The model card does not say what distinguishes it from the base model, but the fine-tuning suggests it targets specialized conversational or instruction-following tasks.


Overview

CharlesLi/llama_2_o1_01_full is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base. It was fine-tuned for a single epoch with a learning rate of 2e-05 and a total batch size of 32 across 4 GPUs, using the Adam optimizer, a cosine learning-rate scheduler, and a warmup ratio of 0.1.
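
These hyperparameters map directly onto the Hugging Face `TrainingArguments` API. The sketch below is a plausible reconstruction of that configuration, not the author's actual training script; in particular, the per-device batch size of 8 assumes the reported 32 is the effective total across the 4 GPUs, and the output path is hypothetical.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; not the original script.
training_args = TrainingArguments(
    output_dir="llama_2_o1_01_full",  # hypothetical output path
    num_train_epochs=1,               # trained for a single epoch
    learning_rate=2e-5,
    per_device_train_batch_size=8,    # assumption: 8 per device x 4 GPUs = 32 total
    lr_scheduler_type="cosine",       # cosine learning-rate schedule
    warmup_ratio=0.1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```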

Key Training Details

  • Base Model: meta-llama/Llama-2-7b-chat-hf
  • Parameters: 7 billion
  • Context Length: 4096 tokens
  • Learning Rate: 2e-05
  • Epochs: 1
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Frameworks: Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, Tokenizers 0.19.1
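
Given the framework versions above, the checkpoint should load with any recent `transformers` release. Below is a minimal inference sketch, assuming the weights are hosted on the Hugging Face Hub under the repo id `CharlesLi/llama_2_o1_01_full`; the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_o1_01_full"  # assumption: Hub repo id matches the title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model around 14 GB
    device_map="auto",          # spread layers across available GPU(s); needs accelerate
)

inputs = tokenizer("Explain fine-tuning in one paragraph.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```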

Potential Use Cases

Given that it was fine-tuned from a chat-optimized Llama-2 variant, this model is likely suited to:

  • General conversational AI applications (see the chat-format sketch after this list).
  • Instruction-following tasks.
  • Text generation that benefits from the base Llama-2-7b-chat-hf capabilities plus whatever specialization the undocumented fine-tuning dataset adds.
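
Because the base model is Llama-2-7b-chat-hf, the fine-tune presumably inherits Llama-2's chat prompt format; the model card does not confirm this, so treat it as an assumption. A short sketch continuing from the loading example above, reusing its `tokenizer` and `model`:

```python
# Assumption: the fine-tune keeps the Llama-2 chat template from its base model.
messages = [
    {"role": "user", "content": "Summarize this model's likely strengths."},
]

# apply_chat_template wraps the user turn in Llama-2's [INST] ... [/INST]
# markers if the base template was inherited unchanged.
prompt_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(prompt_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```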

Further evaluation is needed to determine its specific strengths and limitations, since the model card documents neither the fine-tuning dataset nor the intended uses.