princeton-nlp/Mistral-7B-Base-SFT-SimPO
The princeton-nlp/Mistral-7B-Base-SFT-SimPO model is a 7 billion parameter language model based on the Mistral architecture, fine-tuned using the Simple Preference Optimization (SimPO) method. Developed by princeton-nlp, this model leverages a reference-free reward approach for preference optimization. It is designed for tasks benefiting from advanced alignment techniques, offering a context length of 8192 tokens.
Model Overview
The princeton-nlp/Mistral-7B-Base-SFT-SimPO model builds on the Mistral-7B architecture: the base model was first supervised fine-tuned (SFT), then aligned with Simple Preference Optimization (SimPO). SimPO's key differentiator is its reward formulation: it uses the length-normalized average log-probability of a response as an implicit reward, eliminating the separate reference model required by methods such as DPO. The technique is detailed in the paper "SimPO: Simple Preference Optimization with a Reference-Free Reward" (Meng et al., 2024) and aims to improve alignment quality while reducing the compute and memory overhead of preference learning.
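The loss described above can be sketched in a few lines: each response's implicit reward is its average per-token log-probability scaled by a coefficient β, and a Bradley-Terry style loss with a target reward margin γ is applied to the chosen/rejected pair. The function name, signature, and example values below are illustrative, not part of the model's release.

```python
import math

def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """Per-pair SimPO loss, using summed token log-probs and sequence lengths."""
    # Length-normalized implicit rewards -- no reference model involved.
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    # Bradley-Terry objective with a target reward margin gamma:
    # -log(sigmoid(r_chosen - r_rejected - gamma))
    margin = r_chosen - r_rejected - gamma
    return math.log1p(math.exp(-margin))

# The loss is small when the chosen response's average log-prob clearly
# exceeds the rejected one's, and large in the reversed case.
print(simpo_loss(-10.0, 10, -30.0, 10))  # margin = 3, loss ~ 0.049
print(simpo_loss(-30.0, 10, -10.0, 10))  # margin = -5, loss ~ 5.01
```

In practice the log-probabilities come from the policy model itself, which is what makes the method reference-free.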
Key Capabilities
- Preference Optimization: Fine-tuned with SimPO, which offers a distinct method for aligning language models with human preferences without requiring a reference model for reward generation.
- Mistral-7B Base: Inherits the strong foundational capabilities of the Mistral-7B architecture.
- Context Length: Supports a context window of 8192 tokens, suitable for processing moderately long inputs.
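For reference, a minimal sketch of loading the model with the Hugging Face `transformers` library. The prompt and generation settings are illustrative; the model may expect a specific chat template for best results, and the first load downloads roughly 14 GB of weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Mistral-7B-Base-SFT-SimPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support; use float32 on CPU
    device_map="auto",
)

prompt = "Briefly explain what preference optimization does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```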
Good For
- Research in Alignment: Ideal for researchers exploring alternative and efficient preference optimization techniques like SimPO.
- Aligned Mistral Deployments: Suitable for applications where a Mistral-7B model with preference-based alignment is beneficial.
- Experimentation with reference-free reward models: Provides a practical implementation of the SimPO method for evaluation and development.