OpenRLHF/Llama-3-8b-rlhf-100k
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Jun 23, 2024 · Architecture: Transformer

OpenRLHF's Llama-3-8b-rlhf-100k is an 8 billion parameter Llama 3 model fine-tuned with Reinforcement Learning from Human Feedback (RLHF) on 100,000 samples. The model builds on a Llama-3-8b-sft base and a Llama-3-8b-rm reward model, and shows improved conversational performance over its SFT base. It is optimized for generating more aligned and helpful responses in chat-based applications.
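Below is a minimal sketch of running chat-style generation with this checkpoint via Hugging Face Transformers. The prompt and generation settings are illustrative assumptions, not values recommended by the model card.

```python
# Sketch: load the checkpoint and generate a chat reply.
# Prompt content and sampling values are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRLHF/Llama-3-8b-rlhf-100k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain RLHF in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```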


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration specifies values for temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p; a sketch of passing these settings through an API request follows below.
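As a rough illustration of how these sampler parameters might be supplied at request time, here is a sketch using an OpenAI-compatible chat completions client. The base URL, API key, and every parameter value are placeholder assumptions, not one of the actual popular configurations.

```python
# Sketch: send sampler settings with a chat completion request.
# Endpoint, key, and all values below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                    # placeholder
)

response = client.chat.completions.create(
    model="OpenRLHF/Llama-3-8b-rlhf-100k",
    messages=[{"role": "user", "content": "Write a short haiku about alignment."}],
    temperature=0.7,           # placeholder values, not the listed top configs
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={               # parameters outside the core OpenAI schema
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```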