CharlesLi/llama_2_rlhf_safe_llama_3_8B_reflect_100_full

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Jan 13, 2025 · License: llama2 · Architecture: Transformer · Open weights

The CharlesLi/llama_2_rlhf_safe_llama_3_8B_reflect_100_full model is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It has a context length of 4096 tokens and was fine-tuned on a dataset identified only as "generator" in its training configuration. It is intended for applications requiring a Llama-2-based model with specific fine-tuning characteristics, though the model card does not document its primary differentiators or use cases.


Model Overview

This model, llama_2_rlhf_safe_llama_3_8B_reflect_100_full, is a fine-tuned variant of Meta's Llama-2-7b-chat-hf. It has 7 billion parameters and supports a context length of 4096 tokens. It was fine-tuned on a dataset referred to as "generator", with the goal of achieving particular performance characteristics; the exact nature of those characteristics is not detailed in the available documentation.
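Since the checkpoint follows the standard Llama-2 layout, it should load through the usual Hugging Face transformers API. The snippet below is a minimal sketch under that assumption and is not taken from the model card; the dtype and generation settings are illustrative.

```python
# Minimal loading sketch (assumes a standard Llama-2-style checkpoint;
# only the repo id comes from the model card, the rest is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_2_rlhf_safe_llama_3_8B_reflect_100_full"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # a 7B model fits in roughly 14 GB at fp16
    device_map="auto",
)

prompt = "Explain RLHF in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```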

Training Details

Fine-tuning used a learning rate of 2e-05, a per-device train batch size of 4, and gradient accumulation over 2 steps; with 4 GPUs, this gives a total train batch size of 32 (4 × 2 × 4). The optimizer was Adam with betas=(0.9, 0.999) and epsilon=1e-08, paired with a cosine learning-rate scheduler with a warmup ratio of 0.1. Training ran for 1 epoch.
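For reference, these hyperparameters map onto a Hugging Face TrainingArguments configuration along the following lines. This is a reconstruction from the numbers above, not the author's actual training script; the output directory is hypothetical.

```python
# Reconstruction of the reported hyperparameters as TrainingArguments;
# illustrative only, not taken from the author's training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_rlhf_safe_reflect_100_full",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=4,   # 4 examples per GPU
    gradient_accumulation_steps=2,   # 4 x 2 x 4 GPUs = 32 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```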

Performance

On the evaluation set, the model achieved a loss of 1.6293. No further benchmarks or performance metrics are reported in the current documentation.
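Assuming the reported figure is the usual mean per-token cross-entropy in nats (the transformers default), it converts to perplexity via exp(loss), which gives a rough sense of scale. The short computation below makes that assumption explicit.

```python
# Perplexity from cross-entropy loss (assumes the 1.6293 eval loss is
# mean per-token cross-entropy in nats, the transformers default).
import math

eval_loss = 1.6293
perplexity = math.exp(eval_loss)
print(f"eval perplexity ~= {perplexity:.2f}")  # ~= 5.10
```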

Intended Use

The model card does not explicitly state intended uses or limitations. In general, the model is suitable for applications that build on the Llama-2 architecture and can benefit from its fine-tuned characteristics; users should evaluate it on their own tasks before deployment.
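Because the base model is Llama-2-7b-chat-hf, prompts should generally follow the Llama-2 chat format. One way to produce it, assuming the fine-tune inherited the base tokenizer's chat template, is sketched below; if the template was changed during fine-tuning, the output would differ.

```python
# Applying the Llama-2 chat format via the tokenizer's chat template
# (assumes the fine-tune kept the base model's template).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "CharlesLi/llama_2_rlhf_safe_llama_3_8B_reflect_100_full"
)

messages = [
    {"role": "system", "content": "You are a helpful, harmless assistant."},
    {"role": "user", "content": "Summarize the Llama 2 license in two sentences."},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # [INST] <<SYS>> ... <</SYS>> ... [/INST]
```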