CharlesLi/llama_2_o1_25_full

Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Concurrency Cost: 1 · Published: Jan 7, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_o1_25_full is a 7-billion-parameter variant of Llama-2-7b-chat-hf, fine-tuned by CharlesLi on an unspecified dataset to a validation loss of 0.6223. It is intended for general conversational AI tasks, building on the base capabilities of the Llama 2 architecture.


Model Overview

CharlesLi/llama_2_o1_25_full is a fine-tuned version of the Meta Llama 2 7B chat model (meta-llama/Llama-2-7b-chat-hf). This 7 billion parameter model has undergone a single epoch of fine-tuning, resulting in a validation loss of 0.6223.
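Assuming the checkpoint follows the standard Hugging Face layout of its base model (meta-llama/Llama-2-7b-chat-hf), it can presumably be loaded with the `transformers` library. A minimal, unverified sketch:

```python
# Sketch: loading the checkpoint with Hugging Face transformers.
# Assumes the repo ships standard Llama weight and tokenizer files,
# as its base model does; not verified against this exact repo.
MODEL_ID = "CharlesLi/llama_2_o1_25_full"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # ~14 GB in fp16; the card lists an FP8 quant for serving
        device_map="auto",
    )

    # Single-turn prompt in the Llama 2 chat format.
    inputs = tokenizer("[INST] Hello! [/INST]", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The `if __name__ == "__main__":` guard keeps the (large) download out of import time; swap in `load_in_8bit` or an FP8 serving stack as your hardware allows.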

Training Details

Training used a learning rate of 2e-05 with a total batch size of 32 across 4 GPUs, the Adam optimizer with standard betas and epsilon, and a cosine learning-rate scheduler with a 0.1 warmup ratio. Training ran for 600 steps, with the evaluation loss decreasing steadily over the run.
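The schedule above can be sketched in plain Python (mirroring the shape of `transformers`' `get_cosine_schedule_with_warmup`): the 0.1 warmup ratio over 600 steps means 60 steps of linear warmup to the 2e-05 peak, followed by cosine decay to zero.

```python
import math

# Sketch of the cosine schedule with warmup described in the card:
# peak LR 2e-05, 600 total steps, warmup ratio 0.1 (i.e. 60 warmup steps).
PEAK_LR = 2e-5
TOTAL_STEPS = 600
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 60

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(60)` returns the 2e-05 peak, and the rate falls back to (numerically) zero by step 600.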

Key Characteristics

  • Base Model: Llama-2-7b-chat-hf
  • Parameter Count: 7 billion
  • Context Length: 4096 tokens
  • Final Validation Loss: 0.6223

Intended Uses

Given its chat-model base, this fine-tuned version is likely suitable for general conversational AI applications, text generation, and language-understanding tasks. However, specific intended uses and limitations are not detailed in the provided information.
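For conversational use, Llama-2-chat derivatives conventionally expect the `[INST]`/`<<SYS>>` prompt template of the base model. A small helper (an illustrative sketch; the function name is ours, and whether this fine-tune preserved the template is an assumption):

```python
from typing import Optional

def build_llama2_prompt(user_msg: str, system_msg: Optional[str] = None) -> str:
    """Wrap a single-turn message in the standard Llama 2 chat template.

    Assumes this fine-tune kept the base model's [INST] / <<SYS>> format.
    """
    if system_msg:
        # System prompt goes inside the [INST] block, wrapped in <<SYS>> tags.
        inner = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg}"
    else:
        inner = user_msg
    return f"<s>[INST] {inner} [/INST]"
```

For instance, `build_llama2_prompt("Summarize this.", "Be concise.")` yields a prompt whose `[/INST]` suffix cues the model to begin its reply.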