OpenRLHF/Llama-3-8b-rlhf-100k
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Jun 23, 2024 · Architecture: Transformer

OpenRLHF's Llama-3-8b-rlhf-100k is an 8 billion parameter Llama 3 model fine-tuned with Reinforcement Learning from Human Feedback (RLHF) on 100,000 samples. The model builds on a Llama-3-8b-sft base and a Llama-3-8b-rm reward model, and shows improved conversational performance over its SFT base. It is optimized for generating more aligned and helpful responses in chat-based applications.
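Below is a minimal sketch of running chat-style generation with this checkpoint via Hugging Face Transformers. The prompt and generation settings are illustrative assumptions, not values recommended by the model card.

```python
# Sketch: load the checkpoint and generate a chat reply.
# Prompt content and sampling values are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRLHF/Llama-3-8b-rlhf-100k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain RLHF in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```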


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration specifies values for temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p; a sketch of passing these settings through an API request follows below.
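As a rough illustration of how these sampler parameters might be supplied at request time, here is a sketch using an OpenAI-compatible chat completions client. The base URL, API key, and every parameter value are placeholder assumptions, not one of the actual popular configurations.

```python
# Sketch: send sampler settings with a chat completion request.
# Endpoint, key, and all values below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                    # placeholder
)

response = client.chat.completions.create(
    model="OpenRLHF/Llama-3-8b-rlhf-100k",
    messages=[{"role": "user", "content": "Write a short haiku about alignment."}],
    temperature=0.7,           # placeholder values, not the listed top configs
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={               # parameters outside the core OpenAI schema
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```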