RLHFlow/LLaMA3-SFT
RLHFlow/LLaMA3-SFT is an 8 billion parameter SFT (Supervised Fine-Tuning) checkpoint derived from Meta-Llama-3-8B, developed by a team including Hanze Dong and Wei Xiong. This model is specifically designed as a strong baseline for RLHF research, having been fine-tuned on a diverse mixture of high-quality open-source data. It serves as a foundational model for further reinforcement learning applications, offering solid performance across various benchmarks before any RLHF training.
Loading preview...
RLHFlow/LLaMA3-SFT: A Strong SFT Baseline for RLHF Research
This model is an 8 billion parameter Supervised Fine-Tuning (SFT) checkpoint, originating from meta-llama/Meta-Llama-3-8B. It was developed by a research team including Hanze Dong and Wei Xiong, as part of the RLHFlow/Online-RLHF project, detailed in their TMLR 2024 paper, "RLHF Workflow: From Reward Modeling to Online RLHF".
Key Capabilities & Characteristics
- Foundation for RLHF: Designed specifically as a robust starting point for Reinforcement Learning from Human Feedback (RLHF) research, without having undergone RLHF training itself.
- Diverse Data Training: Fine-tuned for one epoch on a mixture of diverse, high-quality open-source datasets, ensuring a broad understanding of various tasks.
- Solid Baseline Performance: Achieves competitive scores in a zero-shot setting across academic benchmarks, including:
- GSM-8K: 74.2
- HumanEval: 64.6
- TruthfulQA: 63.4
- ARC: 53.5
- MBPP: 58.6
Good For
- RLHF Experimentation: Ideal for researchers and developers looking for a strong, pre-trained SFT model to build upon for their RLHF pipelines and experiments.
- General Language Understanding: Its training on diverse datasets makes it suitable for a wide range of general language understanding and generation tasks.
- Benchmarking: Can be used as a reliable baseline to compare the performance improvements gained from subsequent RLHF stages or other fine-tuning methods.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.