CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_4000_1000_full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_4000_1000_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained on a "generator dataset" with a 4096-token context length, reaching a final validation loss of 0.6372. The model is intended for general language generation tasks on the strength of its Llama-2 foundation, with potential specialization from its fine-tuning data.
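
As a quick usage sketch, the checkpoint can be loaded with the Hugging Face transformers library. The repo id is taken from the model name above; the dtype and device settings are assumptions to adapt to your hardware (FP8 serving is platform-specific, so fp16 is used as a safe default here).

```python
# Minimal sketch: load the model and generate text with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_4000_1000_full"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # assumption: fp16 fallback; FP8 depends on the serving stack
    device_map="auto",
)

prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```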

Model Overview

This model, llama_2_sky_safe_o1_llama_3_70B_default_4000_1000_full, is a fine-tuned variant of meta-llama/Llama-2-7b-chat-hf. It has 7 billion parameters and was fine-tuned on a "generator dataset". Training ran for a single epoch with a learning rate of 2e-05 and a total batch size of 32 distributed across 4 GPUs.

Training Details

Key hyperparameters used during training (see the configuration sketch after this list):

  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Learning Rate: 2e-05
  • Learning Rate Scheduler: Cosine with a warmup ratio of 0.1
  • Total Batch Size: 32 (distributed across 4 GPUs)
  • Epochs: 1
  • Context Length: 4096 tokens (inferred from the model name)
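
The hyperparameters above can be expressed as a hypothetical transformers.TrainingArguments configuration. This is a sketch, not the author's actual training script: the per-device batch size of 8 is an assumption derived from the reported total of 32 across 4 GPUs, and output_dir is illustrative.

```python
# Hypothetical reconstruction of the reported training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_sky_safe_o1_llama_3_70B_default_4000_1000_full",  # illustrative
    num_train_epochs=1,             # single epoch, per the model card
    learning_rate=2e-5,
    per_device_train_batch_size=8,  # assumed: 32 total / 4 GPUs
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",          # "evaluation_strategy" on older transformers versions
    eval_steps=200,                 # validation loss was reported at step 200
    logging_steps=200,
)
```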

The model achieved a final validation loss of 0.6372. Intermediate training results show a consistent decrease in loss, with a training loss of 0.674 and a validation loss of 0.6461 at step 200.

Intended Uses

Given its foundation on Llama-2-7b-chat-hf, this model is likely suited to conversational AI and general text generation tasks. Its fine-tuning on a "generator dataset" suggests potential optimization for specific content creation or response generation, though the original model card does not document the dataset further.
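
Because the base model is Llama-2-7b-chat-hf, conversational prompts should follow the Llama-2 chat format. The sketch below continues from the loading snippet above and assumes the fine-tune preserves the base model's chat template:

```python
# Sketch of a conversational call, reusing the tokenizer and model loaded earlier.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the benefits of fine-tuning a chat model."},
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

reply = model.generate(chat_inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(reply[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```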