OpenRLHF/Llama-3-8b-sft-mixture

Public · 8B parameters · FP8 · 8192 context length · Jun 14, 2024 · Hugging Face

OpenRLHF/Llama-3-8b-sft-mixture is an 8-billion-parameter Llama 3-based language model fine-tuned by OpenRLHF on a mixture of high-quality open-source datasets. It is a supervised fine-tuning (SFT) checkpoint intended as a strong starting point for RLHF research and development, and its training on varied instructional and conversational data gives it a solid foundation for general language understanding and generation tasks.

Overview

OpenRLHF/Llama-3-8b-sft-mixture is an 8-billion-parameter language model built on Meta's Llama 3 architecture. Developed by OpenRLHF, it is a supervised fine-tuning (SFT) checkpoint designed as a foundation for subsequent Reinforcement Learning from Human Feedback (RLHF) research. It was trained for one epoch on a comprehensive mixture of high-quality, open-source datasets.

Key Capabilities

  • Strong SFT Baseline: Provides a robust starting point for RLHF experiments, having undergone extensive supervised fine-tuning.
  • Diverse Data Training: Trained on a wide array of datasets including ShareGPT, Evol-Instruct, SlimOrca, MathInstruct, Magicoder-Evol-Instruct, GPT4-LLM, OrcaMath, GPTeacher, and UltraInteract, enhancing its general conversational and instructional abilities.
  • Llama 3 Foundation: Benefits from the advanced architecture and pre-training of the Meta-Llama-3-8B model.
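As with any Llama 3-derived checkpoint on the Hugging Face Hub, the model can be loaded with the `transformers` library. A minimal sketch (the `apply_chat_template` call assumes the repo ships a chat template; dtype, device placement, and generation settings are illustrative):

```python
MODEL_ID = "OpenRLHF/Llama-3-8b-sft-mixture"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the SFT checkpoint and greedily generate a completion for one prompt."""
    # Imports are local so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # place weights across available devices
    )

    # Assumes the tokenizer provides a chat template for this SFT checkpoint.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Loading an 8B checkpoint requires roughly 16 GB of accelerator memory in 16-bit precision, so `device_map="auto"` is a convenient default on multi-GPU or CPU-offload setups.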

Good For

  • RLHF Research: Ideal for researchers and developers looking for a solid SFT model to begin their RLHF training pipelines.
  • General Purpose Applications: Suitable for various language generation and understanding tasks due to its diverse training data.
  • Instruction Following: Exhibits strong instruction-following capabilities from its fine-tuning on instructional datasets.
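To continue from this checkpoint in an RLHF pipeline, the SFT model initializes the actor and a separately trained reward model supplies the reward signal. A hedged launch sketch in the style of the OpenRLHF PPO examples (flag names follow the OpenRLHF README and may change between versions; the reward-model repo, dataset, and save path are illustrative):

```
deepspeed --module openrlhf.cli.train_ppo \
  --pretrain OpenRLHF/Llama-3-8b-sft-mixture \
  --reward_pretrain OpenRLHF/Llama-3-8b-rm-mixture \
  --prompt_data OpenRLHF/prompt-collection-v0.1 \
  --save_path ./checkpoint/llama-3-8b-rlhf \
  --bf16 \
  --zero_stage 2
```

Verify each flag against the OpenRLHF version you have installed before launching a run.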