Model Overview
The CharlesLi/llama_2_rlhf_safe_4o_default_100_full model is a fine-tuned version of Meta's Llama-2-7b-chat-hf, specifically adapted for safety-aligned conversational tasks. Developed by CharlesLi, this 7 billion parameter model has undergone Reinforcement Learning from Human Feedback (RLHF) to improve its responses in chat environments, focusing on generating safer and more appropriate outputs.
Key Characteristics
- Base Model: Built upon the robust meta-llama/Llama-2-7b-chat-hf architecture.
- Fine-tuning: Trained with RLHF on a generator dataset for enhanced safety and alignment.
- Parameter Count: Features 7 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a context window of 4096 tokens.
- Training Details: Trained with a learning rate of 2e-05, a batch size of 32 (total), and a cosine learning rate scheduler over 1 epoch.
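The cosine scheduler mentioned above decays the learning rate from its initial value toward zero over the course of training. A minimal sketch of that schedule in Python, using the stated 2e-05 learning rate (the function name is illustrative, and the absence of warmup is an assumption not confirmed by the card):

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5):
    """Cosine learning-rate decay from base_lr down to 0.

    Assumes no warmup phase; mirrors the cosine scheduler and
    2e-05 learning rate listed in the training details.
    """
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The rate starts at 2e-5, reaches half that value at the midpoint,
# and approaches 0 by the final step of the single training epoch.
```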
Intended Use Cases
This model is particularly well-suited for applications where:
- Safety is paramount: Designed for chat applications requiring a strong emphasis on safe and aligned responses.
- Conversational AI: Ideal for chatbots, virtual assistants, and interactive dialogue systems.
- Llama-2 ecosystem integration: Benefits from the established capabilities and community support of the Llama-2 family.
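Because the model inherits from Llama-2-7b-chat-hf, prompts are expected to follow the Llama-2 chat template with `[INST]` and `<<SYS>>` markers. A minimal sketch of that format (the helper name is hypothetical; in practice, `tokenizer.apply_chat_template` from the transformers library builds this string for you):

```python
def build_llama2_prompt(system_msg, user_msg):
    """Wrap a system and user message in the Llama-2 chat format
    used by Llama-2-7b-chat-hf and its fine-tuned derivatives."""
    return (
        f"<s>[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

# Example: a safety-focused system prompt paired with a user query.
prompt = build_llama2_prompt(
    "You are a helpful and safe assistant.",
    "How do I reset my password?",
)
```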