OpenRLHF/Llama-3-8b-sft-mixture is an 8-billion-parameter Llama 3-based language model, fine-tuned by OpenRLHF on a diverse mixture of high-quality open-source datasets. It is a supervised fine-tuning (SFT) checkpoint intended as a strong starting point for RLHF research and development, and its varied instructional and conversational training data make it a solid foundation for general language understanding and generation tasks.
## Overview
OpenRLHF/Llama-3-8b-sft-mixture is a supervised fine-tuning (SFT) checkpoint built on Meta's Meta-Llama-3-8B. Developed by OpenRLHF as a foundation for subsequent Reinforcement Learning from Human Feedback (RLHF) research, it was trained for one epoch on a comprehensive mixture of high-quality, open-source datasets.
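Below is a minimal inference sketch using Hugging Face `transformers`. The model ID matches this card, but the dtype, device placement, and sampling settings are assumptions to adjust for your hardware; the sketch also assumes the tokenizer ships a Llama 3 chat template (check `tokenizer.chat_template`, and format prompts manually if it is absent).

```python
# pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRLHF/Llama-3-8b-sft-mixture"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights (~16 GB) fit your GPU
    device_map="auto",
)

# Illustrative prompt; assumes the tokenizer provides a chat template.
messages = [{"role": "user", "content": "Explain RLHF in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```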
## Key Capabilities
- Strong SFT Baseline: Provides a robust starting point for RLHF experiments, with the supervised fine-tuning stage already completed.
- Diverse Data Training: Trained on a wide array of datasets, including ShareGPT, Evol-Instruct, SlimOrca, MathInstruct, Magicoder-Evol-Instruct, GPT4-LLM, OrcaMath, GPTeacher, and UltraInteract, which strengthens its general conversational and instruction-following abilities.
- Llama 3 Foundation: Benefits from the advanced architecture and pre-training of the Meta-Llama-3-8B model.
## Good For
- RLHF Research: Ideal for researchers and developers who want a solid SFT model to seed their RLHF training pipelines; see the sketch after this list.
- General Purpose Applications: Suitable for various language generation and understanding tasks due to its diverse training data.
- Instruction Following: Exhibits strong instruction-following capabilities thanks to its fine-tuning on instructional datasets.
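Because this checkpoint is meant to seed RLHF training, a common pattern is to load it twice: once as the trainable policy and once as a frozen reference model for the KL penalty. The sketch below illustrates that pattern with plain `transformers` and PyTorch; it is not OpenRLHF's own training code, and `token_logprobs` is a hypothetical helper written for this example.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRLHF/Llama-3-8b-sft-mixture"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Two copies of the SFT checkpoint (~16 GB each in bf16): one trainable
# policy and one frozen reference for the KL term in PPO-style RLHF.
policy = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
reference = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
reference.eval()
for p in reference.parameters():  # the reference never receives gradients
    p.requires_grad_(False)

def token_logprobs(model, input_ids):
    """Per-token log-probs of input_ids under the model (shifted by one)."""
    logits = model(input_ids).logits[:, :-1, :]
    logp = F.log_softmax(logits, dim=-1)
    return logp.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

# Illustrative response to score; in training this would be a sampled rollout.
ids = tokenizer("Example response to score", return_tensors="pt").input_ids
with torch.no_grad():
    kl_per_token = token_logprobs(policy, ids) - token_logprobs(reference, ids)
# In PPO-style RLHF, this per-token KL estimate feeds the penalty that keeps
# the policy close to the SFT checkpoint while it optimizes the reward.
```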