CharlesLi/llama_2_rlhf_safe_4o_reflect_100_full

  • Task: Text generation
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Jan 13, 2025
  • License: llama2
  • Architecture: Transformer (open weights)

CharlesLi/llama_2_rlhf_safe_4o_reflect_100_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was adapted on a generator dataset with a focus on safety and reflection capabilities, and is intended for applications that require a Llama-2-based model with safety characteristics enhanced through RLHF.


Model Overview

This model, llama_2_rlhf_safe_4o_reflect_100_full, is a fine-tuned variant of the Meta Llama-2-7b-chat-hf base model. It has 7 billion parameters and was trained with a context length of 4096 tokens. The fine-tuning process utilized a specific "generator dataset" and incorporated Reinforcement Learning from Human Feedback (RLHF) to enhance safety and reflective qualities.
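A minimal inference sketch is shown below. It assumes the standard Hugging Face transformers API and the `[INST] ... [/INST]` chat prompt format inherited from the Llama-2-7b-chat-hf base model; the repository name is taken from this card, while the dtype, prompt, and generation settings are illustrative defaults rather than a confirmed usage recipe.

```python
# Minimal inference sketch (assumes the transformers library and the
# Llama-2 chat prompt format inherited from Llama-2-7b-chat-hf).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_rlhf_safe_4o_reflect_100_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # illustrative; the card lists an FP8 quant
    device_map="auto",
)

# Llama-2 chat models expect the [INST] ... [/INST] wrapper.
prompt = "[INST] How should I store household chemicals safely? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The context length is 4096 tokens, so keep prompt + generation within that budget.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```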

Key Characteristics

  • Base Model: Meta Llama-2-7b-chat-hf
  • Parameter Count: 7 billion
  • Fine-tuning Objective: Enhanced safety and reflection through RLHF
  • Training Data: Generator dataset
  • Evaluation Loss: 2.0096 on the evaluation set

Training Details

The model was trained with a learning rate of 2e-05 and an effective batch size of 32 (4 GPUs × a per-device batch size of 4 × 2 gradient accumulation steps), using a cosine learning rate scheduler with a warmup ratio of 0.1 over a single epoch. The optimizer was Adam with standard betas and epsilon. This configuration adapts the Llama-2 architecture for safety-oriented applications.
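These hyperparameters map directly onto Hugging Face `TrainingArguments`. The sketch below reconstructs that configuration under the assumption that training went through the transformers `Trainer` (or a TRL wrapper over it), which this card does not confirm; the per-device batch size of 4 is inferred from 32 / (4 GPUs × 2 accumulation steps), and the beta/epsilon values are the standard Adam defaults the card alludes to.

```python
# Hypothetical reconstruction of the reported hyperparameters using
# transformers.TrainingArguments; the actual training stack is not
# specified on this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_rlhf_safe_4o_reflect_100_full",
    learning_rate=2e-5,
    per_device_train_batch_size=4,   # 4 GPUs x 4 x 2 accum steps = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # "Adam with standard betas and epsilon"
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=10,
)
```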