princeton-nlp/Mistral-7B-Base-SFT-KTO

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: May 17, 2024 · Architecture: Transformer

The princeton-nlp/Mistral-7B-Base-SFT-KTO is a 7 billion parameter language model based on the Mistral architecture, fine-tuned using KTO (Kahneman-Tversky Optimization), a preference-optimization method that learns from unpaired binary (desirable/undesirable) feedback. Released by princeton-nlp as one of the baseline checkpoints accompanying the SimPO preprint (SimPO: Simple Preference Optimization with a Reference-Free Reward), it is designed for tasks that benefit from advanced alignment techniques, offering improved response quality and adherence to preferences.


Overview

The princeton-nlp/Mistral-7B-Base-SFT-KTO is a 7 billion parameter language model built upon the Mistral architecture. The model distinguishes itself through its fine-tuning process, which applies KTO (Kahneman-Tversky Optimization): a preference-optimization technique, grounded in prospect theory, that aligns a model using unpaired examples labeled simply as desirable or undesirable rather than paired preference comparisons. This checkpoint was released as a comparison baseline alongside the SimPO preprint, SimPO: Simple Preference Optimization with a Reference-Free Reward. The approach aims to enhance the model's ability to align with human preferences and generate high-quality, desirable outputs.
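A defining property of KTO-style training is that it consumes unpaired binary feedback: each record carries a single desirable/undesirable label instead of a chosen/rejected response pair. The hypothetical records below (field names and contents are illustrative, not the actual training data for this model) sketch what such a dataset might look like:

```python
# Hypothetical KTO-style feedback records (illustrative only).
# Each example is tagged desirable (True) or undesirable (False);
# no paired chosen/rejected responses are required.
feedback = [
    {"prompt": "Explain KTO in one sentence.",
     "completion": "KTO aligns a model using unpaired binary feedback.",
     "desirable": True},
    {"prompt": "Explain KTO in one sentence.",
     "completion": "KTO is a type of keyboard layout.",
     "desirable": False},
]

# During training, the two groups are weighted separately.
desirable = [r for r in feedback if r["desirable"]]
undesirable = [r for r in feedback if not r["desirable"]]
print(len(desirable), len(undesirable))
```

Because the two groups need not be balanced or aligned by prompt, this format is easier to collect at scale (e.g., from thumbs-up/thumbs-down signals) than paired preference data.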

Key Capabilities

  • Preference Optimization: Fine-tuned with KTO for improved alignment with desired output characteristics.
  • Unpaired Binary Feedback: Learns from examples labeled simply as desirable or undesirable, with no paired preference data required.
  • Mistral-7B Base: Benefits from the strong foundational capabilities of the Mistral-7B architecture.
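To make the optimization objective behind these capabilities concrete, here is a minimal pure-Python sketch of a per-example KTO-style loss. It assumes the per-sequence log-probabilities under the policy and a reference model are already computed, and it omits the batch-level KL estimate used in the full method (passed here as a fixed `kl_estimate` argument); names and defaults are illustrative, not the exact training configuration of this checkpoint.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, is_desirable,
             beta=0.1, lambda_d=1.0, lambda_u=1.0, kl_estimate=0.0):
    """Simplified per-example KTO loss sketch.

    policy_logp / ref_logp: log-prob of the completion under the policy
    and reference model; their difference is the implicit reward.
    is_desirable: binary label for this unpaired example.
    """
    reward = policy_logp - ref_logp
    if is_desirable:
        # Push the reward of desirable completions above the KL baseline.
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_estimate)))
    # Push the reward of undesirable completions below it.
    return lambda_u * (1.0 - sigmoid(beta * (kl_estimate - reward)))

# Raising a desirable completion's policy log-prob lowers its loss.
loss_up = kto_loss(-1.0, -2.0, True)    # policy prefers it more than ref
loss_down = kto_loss(-3.0, -2.0, True)  # policy prefers it less than ref
print(loss_up < loss_down)
```

The asymmetric `lambda_d`/`lambda_u` weights let training compensate for imbalanced pools of desirable and undesirable examples, one reason this formulation suits loosely labeled feedback.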

Good for

  • Applications requiring models with enhanced preference alignment.
  • Research and development in advanced fine-tuning and alignment techniques.
  • Tasks where generating high-quality, human-preferred responses is critical.