Name: princeton-nlp/Mistral-7B-Base-SFT-DPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: princeton-nlp

Overview

princeton-nlp/Mistral-7B-Base-SFT-DPO is a 7-billion-parameter language model built on the Mistral architecture, with an 8192-token context window. It was released alongside the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward". As the model name indicates, this checkpoint is a supervised fine-tuned (SFT) Mistral base model further trained with Direct Preference Optimization (DPO), one of the methods the preprint compares against SimPO. SimPO itself aims to align a model's outputs with human preferences without requiring a reference model.
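To make the "reference-free reward" idea concrete, here is a minimal sketch of the per-example SimPO loss next to the DPO loss, written in plain Python over precomputed sequence log-probabilities. The function names and the default values for beta and gamma are illustrative assumptions, not the settings used to train this checkpoint.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one preference pair.

    The implicit reward is the length-normalized sequence log-probability
    under the policy, scaled by beta. No reference model is involved.
    beta and gamma defaults here are placeholders (assumption).
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    # gamma is the target reward margin between chosen and rejected.
    return -log_sigmoid(reward_chosen - reward_rejected - gamma)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    The implicit reward is the log-ratio between the policy and a frozen
    reference model, so reference log-probabilities are required.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -log_sigmoid(margin)
```

The contrast is in the inputs: `dpo_loss` needs log-probabilities from a reference model, while `simpo_loss` needs only the policy's log-probabilities and the response lengths.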

Key Capabilities

Preference Optimization: Trained with preference optimization (DPO, per the model name) on top of supervised fine-tuning, in the setup studied by the SimPO preprint.
Mistral Architecture: Built on the Mistral 7B base model.
Extended Context: Supports an 8192-token context length, suitable for longer interactions and complex tasks.
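As a small illustration of what the 8192-token limit means in practice, the sketch below checks how many tokens remain for generation once a prompt has been tokenized. It assumes the prompt and the generated tokens share the same window, as is usual for decoder-only models; the helper names are hypothetical.

```python
CONTEXT_WINDOW = 8192  # context length stated in this listing


def max_new_tokens_allowed(prompt_tokens: int,
                           context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_window - prompt_tokens)


def fits_in_context(prompt_tokens: int, requested_new_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested completion fit in the window."""
    return requested_new_tokens <= max_new_tokens_allowed(
        prompt_tokens, context_window)
```

For example, a prompt of 8000 tokens leaves room for at most 192 generated tokens under this assumption.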

Good For

Research in Alignment: Useful as a baseline for researchers comparing preference optimization methods.
Fine-tuning: Provides a strong base for further fine-tuning on specific preference-aligned tasks.
Applications requiring nuanced responses: Suitable for use cases where output quality is judged by human preferences, such as dialogue systems or content generation.
