CharlesLi/llama_2_sky_o1_4_full

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open Weights · Cold

The CharlesLi/llama_2_sky_o1_4_full is a 7 billion parameter language model, fine-tuned from Meta's Llama-2-7b-chat-hf. This model was trained on a generator dataset, achieving a validation loss of 0.6753. It is intended for tasks requiring a fine-tuned Llama 2 base, with a context length of 4096 tokens.


Model Overview

CharlesLi/llama_2_sky_o1_4_full was fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model on a 'generator dataset' (not further described in the README) for a single epoch, reaching a final validation loss of 0.6753.

Key Training Details

  • Base Model: meta-llama/Llama-2-7b-chat-hf
  • Parameters: 7 billion
  • Training Dataset: Generator dataset (specifics not detailed in README)
  • Final Validation Loss: 0.6753
  • Hyperparameters:
    • Learning Rate: 2e-05
    • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: 4 per device (train and eval), with 2 gradient accumulation steps; the reported total effective batch size of 32 implies distributed training across 4 devices (4 × 2 × 4 = 32)
    • Epochs: 1
    • LR Scheduler: Cosine with 0.1 warmup ratio
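The schedule above can be sketched in plain Python; this is a minimal illustration of a cosine decay with linear warmup using the reported values (base LR 2e-05, warmup ratio 0.1), not the exact implementation used during training:

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup (illustrative helper,
    mirroring the reported hyperparameters)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, over a 100-step run the learning rate climbs linearly for the first 10 steps, peaks at 2e-05, then decays along a half-cosine toward zero.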

Intended Use Cases

This model is suitable for applications that benefit from a fine-tuned Llama 2 variant, particularly for tasks aligned with its 'generator dataset' training. Developers can leverage its 7B parameter size for efficient deployment while benefiting from the Llama 2 architecture.
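Because the fine-tune inherits the Llama-2-chat base, prompts generally follow Llama 2's chat format. A minimal sketch of that formatting is below; the helper name is an assumption, and in practice the tokenizer's built-in chat template should be preferred:

```python
def format_llama2_chat(user_msg, system_msg=None):
    """Build a prompt in the Llama 2 chat format expected by the
    meta-llama/Llama-2-7b-chat-hf base model (illustrative helper)."""
    if system_msg:
        # System instructions are wrapped in <<SYS>> tags inside [INST].
        return f"<s>[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"
    return f"<s>[INST] {user_msg} [/INST]"
```

The resulting string can be passed to the tokenizer and model as a standard text-generation input, keeping the total prompt plus completion within the 4096-token context window.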