Salesforce/LLaMA-3-8B-SFR-SFT-R
Salesforce/LLaMA-3-8B-SFR-SFT-R is an 8-billion-parameter, LLaMA-3-based language model developed by Salesforce. It is the Supervised Fine-Tuning (SFT) model in a Reinforcement Learning from Human Feedback (RLHF) workflow released for research purposes, and it belongs to a series that also includes a reward model and an RLHF model.
Model Overview
Salesforce/LLaMA-3-8B-SFR-SFT-R is an 8-billion-parameter, LLaMA-3-based language model developed by Salesforce. This release is the Supervised Fine-Tuning (SFT) stage of a broader Reinforcement Learning from Human Feedback (RLHF) workflow, and it is intended as a component for research into model alignment.
Key Characteristics
- Architecture: Based on the LLaMA-3 model family.
- Parameter Count: 8 billion parameters.
- Context Length: Supports an 8192-token context window.
- Role in Workflow: This is the SFT model, intended to be used in conjunction with its corresponding reward model and RLHF model (Salesforce/LLaMA-3-8B-SFR-RM-R and Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R, respectively).
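The characteristics above can be sketched in a few lines of Python. This is a minimal illustration, not an official usage recipe: it assumes the three model IDs are hosted as Hugging Face Hub repositories and that the SFT model loads through the `transformers` `AutoModelForCausalLM` and `AutoTokenizer` classes, neither of which this page confirms. The names `WORKFLOW_MODELS`, `fits_context`, and `load_sft_model` are hypothetical helpers introduced for the example.

```python
# Context window stated in the characteristics list above.
CONTEXT_WINDOW = 8192

# The three models named on this page, keyed by their stage in the workflow.
WORKFLOW_MODELS = {
    "sft": "Salesforce/LLaMA-3-8B-SFR-SFT-R",
    "reward": "Salesforce/LLaMA-3-8B-SFR-RM-R",
    "rlhf": "Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R",
}


def fits_context(num_prompt_tokens: int, max_new_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested generation fits in the window."""
    return num_prompt_tokens + max_new_tokens <= window


def load_sft_model():
    """Load the SFT tokenizer and model.

    Assumption: the repository is available on the Hugging Face Hub and is
    compatible with the Auto* classes. The import is kept inside the function
    so the helpers above work without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = WORKFLOW_MODELS["sft"]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model
```

Checking `fits_context` before calling `generate` is one simple way to avoid requesting more tokens than the 8192-token window allows; calling `load_sft_model()` downloads the full 8-billion-parameter weights, so it is best run on a machine with sufficient memory.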
Intended Use
This model is released for research purposes only, in support of an academic paper on RLHF workflows. It is not specifically designed or evaluated for all downstream production purposes. Users should evaluate potential concerns related to accuracy, safety, and fairness, and should take the common limitations of AI into account, especially in high-risk scenarios.