princeton-nlp/Mistral-7B-Base-SFT-SimPO

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 8k | Published: May 17, 2024 | Architecture: Transformer | Status: Cold

The princeton-nlp/Mistral-7B-Base-SFT-SimPO model is a 7 billion parameter language model based on the Mistral architecture, fine-tuned using the Simple Preference Optimization (SimPO) method. Developed by princeton-nlp, this model leverages a reference-free reward approach for preference optimization. It is designed for tasks benefiting from advanced alignment techniques, offering a context length of 8192 tokens.


Model Overview

The princeton-nlp/Mistral-7B-Base-SFT-SimPO model is a 7 billion parameter language model built upon the Mistral architecture. Its key differentiator lies in its fine-tuning methodology: it utilizes Simple Preference Optimization (SimPO), a novel approach that operates with a reference-free reward mechanism. This technique is detailed in the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward" and aims to enhance model alignment and performance through efficient preference learning.
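The core idea of SimPO is that the implicit reward is the length-normalized log-likelihood of a response under the policy itself, so no frozen reference model is needed. A minimal sketch of the per-pair loss (variable names and the `beta`/`gamma` defaults are illustrative, not the exact training configuration):

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Per-pair SimPO loss: -log sigmoid(reward margin - gamma).

    logp_* are summed token log-probabilities of each response under the
    policy; len_* are response lengths in tokens. The implicit reward is
    the length-normalized log-likelihood scaled by beta -- note that no
    reference model appears anywhere in the computation.
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the chosen response's average per-token log-probability rises relative to the rejected one's, and the target margin `gamma` pushes the model to separate the two by more than zero.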

Key Capabilities

  • Preference Optimization: Fine-tuned with SimPO, which offers a distinct method for aligning language models with human preferences without requiring a reference model for reward generation.
  • Mistral-7B Base: Inherits the strong foundational capabilities of the Mistral-7B architecture.
  • Context Length: Supports a context window of 8192 tokens, suitable for processing moderately long inputs.

Good For

  • Research in Alignment: Ideal for researchers exploring alternative and efficient preference optimization techniques like SimPO.
  • Applications requiring fine-tuned Mistral models: Suitable for tasks where a Mistral-7B base model with advanced alignment is beneficial.
  • Experimentation with reference-free reward models: Provides a practical implementation of the SimPO method for evaluation and development.
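For the application scenarios above, the model can be run with the standard Hugging Face `transformers` API. The sketch below makes several assumptions not stated on this page: the dtype/device settings, the plain-text prompt format, and the sampling values are illustrative defaults, not tuned or recommended configurations.

```python
# Sketch: text generation with princeton-nlp/Mistral-7B-Base-SFT-SimPO.
# Assumes a GPU with enough memory for a 7B model and the `accelerate`
# package installed for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Mistral-7B-Base-SFT-SimPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Inputs beyond the 8192-token context window should be truncated.
inputs = tokenizer(
    "Explain preference optimization in one paragraph.",
    return_tensors="pt", truncation=True, max_length=8192,
).to(model.device)

output = model.generate(
    **inputs, max_new_tokens=256, do_sample=True,
    temperature=0.7, top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the tokenizer ships a chat template, `tokenizer.apply_chat_template` may give better-aligned outputs than a raw prompt; check the repository's usage notes for the expected format.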
