OpenRLHF's Llama-3-8b-rlhf-100k is an 8 billion parameter Llama 3 model fine-tuned using Reinforcement Learning from Human Feedback (RLHF) for 100,000 samples. This model builds upon a Llama-3-8b-sft base and a Llama-3-8b-rm reward model, demonstrating improved conversational performance over its SFT base. It is optimized for generating more aligned and helpful responses in chat-based applications.
No reviews yet. Be the first to review!