CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_4000_1000_full

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_4000_1000_full is a 7 billion parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained with a learning rate of 2e-05 and a cosine scheduler over one epoch, reaching a final validation loss of 0.6644 on the evaluation set.


Model Overview

This model, llama_2_sky_safe_o1_llama_3_8B_reflect_4000_1000_full, is a fine-tuned variant of the meta-llama/Llama-2-7b-chat-hf base model. It uses the Llama 2 architecture with 7 billion parameters and was fine-tuned on a dataset identified only as "generator" in the training configuration.
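The checkpoint can be loaded like any other Llama 2 chat model with the Hugging Face Transformers library. The sketch below uses the repository id from this card; the dtype/device settings, prompt, and generation parameters are illustrative assumptions rather than documented usage.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from this model card; device_map="auto" is an illustrative choice.
model_id = "CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_4000_1000_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-2-chat models expect the [INST] ... [/INST] prompt format.
prompt = "[INST] Explain what fine-tuning a language model means. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```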

Training Details

The model underwent a single epoch of training with the following key hyperparameters (a configuration sketch follows the list):

  • Base Model: meta-llama/Llama-2-7b-chat-hf
  • Learning Rate: 2e-05
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler: Cosine with a warmup ratio of 0.1
  • Batch Sizes: train_batch_size of 4 and eval_batch_size of 4 per device, with gradient_accumulation_steps of 2, for a total_train_batch_size of 32 (the total implies training across multiple devices, since 4 × 2 on a single device would give 8).
  • Frameworks: Transformers 4.44.2, Pytorch 2.4.1+cu121, Datasets 3.0.0, Tokenizers 0.19.1
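A minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments` is shown below. Dataset loading, the model, and the trainer itself are omitted, and the output directory name is a hypothetical placeholder; this reproduces the reported configuration, not the original training script.

```python
from transformers import TrainingArguments

# Hyperparameters copied from the list above; output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="llama_2_reflect_finetune",
    num_train_epochs=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```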

Performance

During training, the model achieved a final validation loss of 0.6644. Intermediate training results showed a loss of 0.8082 at step 100 and 0.6681 at step 200.
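Since the reported values are cross-entropy losses, they can be translated into perplexity (the exponential of the loss) for a rough sense of scale. The short snippet below shows that calculation using the numbers reported above; it is purely illustrative.

```python
import math

# Losses reported on this card.
losses = {"step 100": 0.8082, "step 200": 0.6681, "final validation": 0.6644}

for name, loss in losses.items():
    # Perplexity is the exponential of the mean cross-entropy loss.
    print(f"{name}: loss={loss:.4f}, perplexity={math.exp(loss):.2f}")
```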

Limitations

The original model card does not document intended uses, training data details, or known failure modes; further evaluation and documentation are needed to understand the model's specific strengths and limitations.