RLHFlow/LLaMA3-SFT

Hosted on Hugging Face · Text generation · Model size: 8B · Quantization: FP8 · Context length: 8K · Concurrency cost: 1 · Architecture: Transformer · Published: May 17, 2024

RLHFlow/LLaMA3-SFT is an 8 billion parameter SFT (Supervised Fine-Tuning) checkpoint derived from Meta-Llama-3-8B, developed by a team including Hanze Dong and Wei Xiong. This model is specifically designed as a strong baseline for RLHF research, having been fine-tuned on a diverse mixture of high-quality open-source data. It serves as a foundational model for further reinforcement learning applications, offering solid performance across various benchmarks before any RLHF training.


RLHFlow/LLaMA3-SFT: A Strong SFT Baseline for RLHF Research

This model is an 8 billion parameter Supervised Fine-Tuning (SFT) checkpoint, originating from meta-llama/Meta-Llama-3-8B. It was developed by a research team including Hanze Dong and Wei Xiong, as part of the RLHFlow/Online-RLHF project, detailed in their TMLR 2024 paper, "RLHF Workflow: From Reward Modeling to Online RLHF".
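The quickest way to try the checkpoint is through Hugging Face transformers. Below is a minimal sketch, assuming a CUDA-capable machine and that the tokenizer ships a chat template; the prompt and generation settings are illustrative, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/LLaMA3-SFT"

# Load the tokenizer and the SFT checkpoint (bfloat16 keeps the 8B model within a single modern GPU).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The SFT mixture is conversational, so format the prompt with the chat template.
messages = [{"role": "user", "content": "Explain what an SFT checkpoint is in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short completion with mild sampling (values are illustrative).
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```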

Key Capabilities & Characteristics

  • Foundation for RLHF: Designed specifically as a robust starting point for Reinforcement Learning from Human Feedback (RLHF) research, without having undergone RLHF training itself.
  • Diverse Data Training: Fine-tuned for one epoch on a mixture of diverse, high-quality open-source datasets, ensuring a broad understanding of various tasks.
  • Solid Baseline Performance: Achieves competitive scores in a zero-shot setting across academic benchmarks, including:
    • GSM-8K: 74.2
    • HumanEval: 64.6
    • TruthfulQA: 63.4
    • ARC: 53.5
    • MBPP: 58.6

Good For

  • RLHF Experimentation: Ideal for researchers and developers looking for a strong, pre-trained SFT model to build on in their RLHF pipelines and experiments (see the sketch after this list).
  • General Language Understanding: Its training on diverse datasets makes it suitable for a wide range of general language understanding and generation tasks.
  • Benchmarking: Can be used as a reliable baseline to compare the performance improvements gained from subsequent RLHF stages or other fine-tuning methods.
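As one concrete illustration of the RLHF-experimentation use case, the sketch below wires the checkpoint into a direct preference optimization (DPO) run with Hugging Face TRL. This is a hedged outline, not the authors' pipeline: the preference dataset, the hyperparameters, and the exact TRL argument names (which vary between TRL releases) are assumptions.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "RLHFlow/LLaMA3-SFT"

# The SFT checkpoint becomes the starting policy; TRL keeps a frozen copy as the implicit reference model.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with "prompt"/"chosen"/"rejected" columns works; this one is only an example.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# Hyperparameters are placeholders, not values from the RLHFlow paper.
config = DPOConfig(
    output_dir="llama3-sft-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    beta=0.1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases use `tokenizer=` instead
)
trainer.train()
```

The same starting checkpoint can be swapped into other preference-optimization or online RLHF recipes; only the trainer and dataset change.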

Popular Sampler Settings

Featherless surfaces the three most popular sampler configurations used for this model, covering the following parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p. A hedged example of applying such settings locally follows below.
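For local experimentation, most of these knobs map directly onto `model.generate` keyword arguments in transformers; frequency_penalty and presence_penalty are OpenAI-style API fields and have no direct transformers equivalent. The values below are illustrative defaults, not the Featherless presets (which are not reproduced here). The snippet continues from the loading sketch above, so `model`, `tokenizer`, and `inputs` are assumed to exist.

```python
# Sampling configuration sketch; every value here is illustrative.
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,          # enable sampling so the knobs below take effect
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    min_p=0.05,              # requires a recent transformers release
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```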