princeton-nlp/Mistral-7B-Base-SFT-CPO

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jul 6, 2024 · Architecture: Transformer · Concurrency cost: 1

The princeton-nlp/Mistral-7B-Base-SFT-CPO is a 7 billion parameter language model based on the Mistral architecture. As its name indicates, it was fine-tuned with Contrastive Preference Optimization (CPO) on top of a supervised fine-tuned (SFT) Mistral-7B base. Developed by princeton-nlp, it was released alongside the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward", where CPO is one of the preference optimization methods evaluated. Its primary use case is research on preference optimization for aligning language models.


Overview

princeton-nlp/Mistral-7B-Base-SFT-CPO is a 7 billion parameter language model developed by princeton-nlp and released as part of the artifacts accompanying the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward". This particular checkpoint was fine-tuned using CPO (Contrastive Preference Optimization), one of the preference optimization methods compared in that work. Like SimPO, CPO performs preference optimization without a separate reference reward model, which simplifies the alignment process.

Key Capabilities

  • Preference Optimization: Fine-tuned with Contrastive Preference Optimization (CPO) on an SFT checkpoint.
  • Reference-Free Reward: Achieves alignment without an explicit reference reward model.
  • Mistral-7B Base: Built on the Mistral-7B architecture, providing a strong foundation for language understanding and generation.
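To make the preference-optimization bullet concrete, here is a minimal sketch of a CPO-style loss on a single preference pair, as described in the CPO paper (Xu et al., 2024): a reference-free sigmoid preference term plus a negative log-likelihood regularizer on the chosen response. The scalar inputs, the `beta`/`nll_weight` values, and the exact weighting are illustrative assumptions, not the released training configuration.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def cpo_loss(logp_chosen: float, logp_rejected: float,
             beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """Sketch of a CPO-style loss for one preference pair.

    logp_chosen / logp_rejected: summed log-probabilities the policy assigns
    to the preferred and dispreferred responses. No reference model appears
    anywhere in the computation -- that is the "reference-free" property.
    """
    # Preference term: push the chosen response's log-prob above the rejected one's.
    preference = -math.log(sigmoid(beta * (logp_chosen - logp_rejected)))
    # Behavior-cloning regularizer: plain NLL on the chosen response.
    nll = -logp_chosen
    return preference + nll_weight * nll
```

A larger margin between the chosen and rejected log-probabilities shrinks the preference term; the NLL term keeps the policy anchored to the preferred data.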

Good For

  • Researchers and developers interested in novel preference optimization techniques.
  • Experimenting with reference-free alignment methods for large language models.
  • Applications that need a 7B instruction-following model whose alignment comes from preference optimization rather than a full RLHF pipeline.
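For the use cases above, the checkpoint can be loaded with the Hugging Face `transformers` library. The sketch below is a minimal, hedged example: the message helper assumes a single-turn chat with no system prompt, and whether the tokenizer ships a chat template should be verified before relying on `apply_chat_template`. The demo is gated behind a flag because it downloads a 7B model.

```python
def build_messages(instruction: str) -> list[dict]:
    # Minimal single-turn chat; system prompt deliberately omitted (assumption).
    return [{"role": "user", "content": instruction}]

RUN_DEMO = False  # set True to actually download and run the 7B model

if RUN_DEMO:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "princeton-nlp/Mistral-7B-Base-SFT-CPO"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Render the chat as a prompt string; assumes the tokenizer defines a chat template.
    prompt = tok.apply_chat_template(
        build_messages("Summarize CPO in one sentence."),
        tokenize=False, add_generation_prompt=True,
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens.
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```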

For more in-depth technical details, refer to the SimPO research preprint and the associated GitHub repository.