princeton-nlp/Mistral-7B-Instruct-RDPO
Mistral-7B-Instruct-RDPO is a 7-billion-parameter instruction-tuned language model released by princeton-nlp, built on the Mistral architecture with a 4096-token context length. The model is fine-tuned with R-DPO (length-regularized DPO), one of the preference optimization methods evaluated in the SimPO preprint. It is designed for general instruction-following tasks, using preference optimization to improve response quality while mitigating the length exploitation that vanilla DPO can exhibit.
Overview
princeton-nlp/Mistral-7B-Instruct-RDPO is a 7-billion-parameter instruction-tuned language model. It is one of the models released alongside the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward," where it serves as a baseline trained with R-DPO, a variant of Direct Preference Optimization that adds a length-regularization term to the DPO objective (Park et al., 2024) to discourage the policy from winning preference comparisons simply by producing longer responses.
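For intuition, the sketch below shows the shape of the R-DPO objective as summarized in the SimPO paper: the standard DPO reward margin with an added penalty on the length difference between the chosen and rejected responses. The function name, hyperparameter values, and tensor layout are illustrative assumptions, not the released training code.

```python
import torch
import torch.nn.functional as F

def rdpo_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              chosen_lengths, rejected_lengths,
              beta=0.1, alpha=0.005):
    """Length-regularized DPO (R-DPO) loss over a batch of preference pairs.

    The *_logps arguments are summed log-probabilities of each response
    under the policy and the frozen reference model; *_lengths are response
    token counts as float tensors. beta and alpha here are illustrative
    values, not the hyperparameters used to train this model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO reward margin, minus a penalty that grows when the chosen
    # response is longer than the rejected one.
    margin = beta * (chosen_logratio - rejected_logratio) \
             - alpha * (chosen_lengths - rejected_lengths)
    return -F.logsigmoid(margin).mean()
```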
Key Capabilities
- Instruction Following: Designed to follow a wide range of user instructions accurately (see the usage sketch after this list).
- Preference Optimization: Fine-tuned with R-DPO, a length-regularized variant of DPO, to align outputs with human preferences without rewarding unnecessarily long responses.
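A minimal loading-and-generation sketch using the Hugging Face transformers library is shown below. The prompt content and sampling parameters are illustrative defaults, not recommendations from the SimPO authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Mistral-7B-Instruct-RDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

# Format the prompt with the model's chat template.
messages = [{"role": "user", "content": "Explain R-DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)

# Sampling parameters here are illustrative, not tuned values.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True,
                         temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```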
Good for
- Researchers and developers studying preference optimization techniques, particularly as a length-regularized DPO baseline for comparison against reference-free methods such as SimPO.
- Applications requiring a 7B instruction-tuned model trained to avoid the verbose, length-biased responses that vanilla DPO can produce.
For in-depth technical details and the training implementation, refer to the SimPO repository and the associated preprint.