CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_1000_1000_full

Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer (open weights)

CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_1000_1000_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained on a generator dataset and reached a loss of 0.7889 on the evaluation set. Built on the Llama 2 architecture, the model is intended for general language generation tasks, with fine-tuning focused on data meant to strengthen its reflective capabilities.


Model Overview

CharlesLi/llama_2_sky_safe_o1_llama_3_8B_reflect_1000_1000_full is a 7 billion parameter language model, fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. This fine-tuning process specifically utilized a generator dataset, resulting in an evaluation loss of 0.7889.

Key Characteristics

  • Base Model: Meta Llama-2-7b-chat-hf, a robust Llama 2 architecture.
  • Parameter Count: 7 billion parameters, offering a balance between performance and computational efficiency.
  • Fine-tuning Focus: Trained on a generator dataset, suggesting an optimization for text generation tasks.
  • Training Performance: Achieved a loss of 0.7889 on the evaluation set during training.
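To put the 7B parameter count and the FP8 quantization listed above in context, here is a rough back-of-the-envelope estimate of weight-storage requirements at several precisions. The byte counts and the 1 GB = 1e9 bytes convention are assumptions for illustration; activations, KV cache, and runtime overhead are not included.

```python
# Approximate weight memory for a 7B-parameter model at common precisions.
# Counts parameters only; activations and KV cache add further memory.

PARAMS = 7_000_000_000  # 7 billion parameters

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "fp8": 1.0,   # the quantization listed for this model
    "int4": 0.5,
}

def weight_memory_gb(params: int, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"{precision:>5}: ~{weight_memory_gb(PARAMS, precision):.1f} GB")
```

At FP8 the weights alone come to roughly 7 GB, which is what makes a 7B model practical on a single consumer GPU.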

Training Details

The model was trained with the following hyperparameters:

  • Learning Rate: 2e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 4, with gradient_accumulation_steps of 2, for a total_train_batch_size of 32 (since 4 × 2 = 8 per device, this implies training across 4 devices).
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: Trained for 1 epoch.
  • Frameworks: Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.
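The learning-rate schedule above (peak LR 2e-05, cosine decay, warmup ratio 0.1) can be sketched in plain Python. This is the standard cosine-with-warmup formula, not the exact Transformers implementation, and the 1000-step run in the example is hypothetical.

```python
import math

# Generic cosine learning-rate schedule with linear warmup, matching the
# hyperparameters listed above: peak LR 2e-05 and warmup ratio 0.1.

PEAK_LR = 2e-05
WARMUP_RATIO = 0.1

def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: sample the schedule over a hypothetical 1000-step run.
for s in (0, 100, 550, 1000):
    print(f"step {s:4d}: lr = {lr_at(s, 1000):.2e}")
```

The rate climbs linearly for the first 10% of steps, peaks at 2e-05, then follows a half-cosine down to zero by the end of the single training epoch.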

Intended Uses

This model is suitable for general text generation applications, leveraging the capabilities inherited from the Llama 2 base model and its specific fine-tuning on a generator dataset.
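Because the base model is Llama-2-7b-chat-hf, prompts should follow the Llama 2 chat format. A minimal single-turn prompt builder is sketched below; in practice the tokenizer's built-in chat template handles this, and the system prompt shown is a placeholder.

```python
# Minimal single-turn prompt builder following the Llama 2 chat format
# ([INST] / <<SYS>> markers) used by the Llama-2-7b-chat-hf base model.

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap a user message (and optional system prompt) in Llama 2 chat markers."""
    if system_prompt:
        user_message = f"{B_SYS}{system_prompt}{E_SYS}{user_message}"
    return f"{B_INST} {user_message} {E_INST}"

prompt = build_prompt(
    "Summarize the benefits of reflection in language models.",
    system_prompt="You are a helpful assistant.",  # placeholder system prompt
)
print(prompt)
```

The resulting string can then be passed to a standard Transformers text-generation pipeline loaded with this model's checkpoint.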