princeton-nlp/Mistral-7B-Base-SFT-DPO

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: May 17, 2024 | Architecture: Transformer | Cold

princeton-nlp/Mistral-7B-Base-SFT-DPO is a 7-billion-parameter language model from princeton-nlp, based on the Mistral architecture with an 8192-token context length. It was released with the research on SimPO (Simple Preference Optimization with a Reference-Free Reward). As the "SFT-DPO" suffix in its name indicates, this checkpoint is a supervised fine-tuned (SFT) model further trained with Direct Preference Optimization (DPO). It is intended for tasks where output quality is judged against human preferences.


Overview

princeton-nlp/Mistral-7B-Base-SFT-DPO is a 7-billion-parameter language model built on the Mistral architecture, with an 8192-token context window. It was released with the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward. The model name identifies it as the checkpoint trained with supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO), a method that aligns outputs with human preferences by comparing the policy against a frozen reference model. SimPO, the method the preprint proposes, removes that reference model and instead uses the length-normalized log-probability of a response as its reward.
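The model name points to DPO, while the preprint it accompanies introduces SimPO, so it may help to see the two objectives side by side. The following is a minimal sketch of the per-example losses as they are commonly written, not the authors' training code. The function names, argument names, and default values for `beta` and `gamma` are illustrative assumptions. Inputs are summed token log-probabilities of the chosen and rejected responses.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO: the implicit reward is the scaled log-ratio of policy to reference."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(chosen_reward - rejected_reward)


def simpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
               chosen_len: int, rejected_len: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO: the reward is the length-normalized policy log-probability.

    No reference model appears here; gamma is a target reward margin.
    """
    chosen_reward = beta * policy_chosen_logp / chosen_len
    rejected_reward = beta * policy_rejected_logp / rejected_len
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)


if __name__ == "__main__":
    # Hypothetical log-probabilities, chosen only to exercise the functions.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
    print(simpo_loss(-12.0, -15.0, chosen_len=8, rejected_len=9))
```

The practical difference is visible in the signatures: `dpo_loss` needs log-probabilities from a second, frozen model, whereas `simpo_loss` needs only the policy's own log-probabilities and the response lengths.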

Key Capabilities

  • Preference Optimization: Trained with DPO on top of an SFT checkpoint to align outputs with human preference data.
  • Mistral Architecture: Benefits from the efficient and performant base Mistral model.
  • Extended Context: Supports an 8192-token context length, suitable for longer interactions and complex tasks.

Good For

  • Research in Alignment: Useful as a comparison point for researchers studying preference optimization methods such as DPO and SimPO.
  • Fine-tuning: Provides a strong base for further fine-tuning on specific preference-aligned tasks.
  • Applications requiring nuanced responses: Suitable for use cases where output quality is judged by human preferences, such as dialogue systems or content generation.

Popular Sampler Settings

Featherless tracks the top 3 parameter combinations its users run with this model. Each configuration sets the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
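To make the role of these parameters concrete, here is a small sketch of how temperature, top_k, min_p, and top_p are commonly applied to a next-token distribution before sampling. It is an illustration under stated assumptions, not the Featherless implementation: the function name `filter_distribution` is hypothetical, the order in which the filters run varies between inference engines, and the three penalty parameters (which adjust logits based on previously generated tokens) are left out for brevity.

```python
import math


def filter_distribution(logits: dict, temperature: float = 1.0, top_k: int = 0,
                        top_p: float = 1.0, min_p: float = 0.0) -> dict:
    """Return {token: probability} after temperature scaling and filtering.

    Assumes temperature > 0. A value of 0 for top_k, 1.0 for top_p, or 0.0
    for min_p disables that filter.
    """
    # Temperature: divide logits before the softmax; lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda item: item[1], reverse=True)

    # top_k: keep only the k most probable tokens.
    if top_k > 0:
        probs = probs[:top_k]

    # min_p: drop tokens whose probability is below min_p times the top probability.
    if min_p > 0.0:
        threshold = min_p * probs[0][1]
        probs = [(tok, p) for tok, p in probs if p >= threshold]

    # top_p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept

    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(p for _, p in probs)
    return {tok: p / norm for tok, p in probs}


if __name__ == "__main__":
    # Hypothetical three-token vocabulary.
    example_logits = {"a": 2.0, "b": 1.0, "c": 0.0}
    print(filter_distribution(example_logits, temperature=0.7, top_k=2))
```

A sampler would then draw the next token from the returned distribution, for example with `random.choices`.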