CharlesLi/llama_2_sky_safe_o1_llama_3_70B_reflect_1000_1000_full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open Weights

The CharlesLi/llama_2_sky_safe_o1_llama_3_70B_reflect_1000_1000_full model is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was fine-tuned on a generator dataset and reaches a loss of 0.8677 on the evaluation set. It is designed for general text generation tasks, leveraging the Llama 2 architecture.


Overview

This model, llama_2_sky_safe_o1_llama_3_70B_reflect_1000_1000_full, is a fine-tuned variant of meta-llama/Llama-2-7b-chat-hf. It has 7 billion parameters and was fine-tuned on a generator dataset.
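
As a standard Llama 2 checkpoint, the model should load through the Hugging Face transformers API. The snippet below is a minimal sketch, assuming the repository is publicly available on the Hub and that your environment satisfies the Llama 2 license terms; `device_map="auto"` additionally requires the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_2_sky_safe_o1_llama_3_70B_reflect_1000_1000_full"

# Download the tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s); needs accelerate
)
```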

Key Training Details

The model was trained using the following hyperparameters (mirrored in the configuration sketch after this list):

  • Learning Rate: 2e-05
  • Batch Size: 4 per device (train and eval)
  • Gradient Accumulation: 2 steps; given the reported total train batch size of 32, this implies training across 4 devices (4 × 2 × 4 = 32)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
  • Epochs: 1
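
These values map directly onto Hugging Face `TrainingArguments`. The following is a minimal sketch of an equivalent configuration, not the author's actual training script; the output directory is hypothetical, and the dataset and Trainer setup are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_reflect_full",   # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,       # 4 x 2 x 4 devices = 32 effective (assuming 4 devices)
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```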

Performance

During training, the model reached a loss of 0.8677 on its evaluation set, reflecting its fit to the specific generator dataset used for fine-tuning.
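
For reference, if this value is the mean per-token cross-entropy loss, it corresponds to a perplexity of roughly:

```latex
\mathrm{PPL} = e^{\mathcal{L}_{\text{eval}}} = e^{0.8677} \approx 2.38
```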

Frameworks Used

The training process utilized:

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
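
To approximate the training environment, these versions can be pinned. The file below is a hypothetical sketch; the PyTorch build string (2.4.1+cu121) comes from the CUDA 12.1 wheel index, which must be selected at install time.

```text
# requirements.txt (hypothetical)
transformers==4.44.2
torch==2.4.1
datasets==3.0.0
tokenizers==0.19.1
```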

Intended Use

The original model card does not specify intended uses and limitations; however, its fine-tuning on a generator dataset suggests suitability for general text generation tasks.
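
Because the base model is Llama-2-7b-chat-hf, prompts should follow the Llama 2 chat format. The sketch below applies the tokenizer's built-in chat template and reuses `model` and `tokenizer` from the loading example above; the prompt and sampling settings are illustrative assumptions.

```python
# Assumes `model` and `tokenizer` from the loading example above.
messages = [{"role": "user", "content": "Summarize the benefits of fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a completion and decode only the newly generated tokens.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```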