princeton-nlp/Mistral-7B-Instruct-SLiC-HF

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jul 6, 2024 · Architecture: Transformer

princeton-nlp/Mistral-7B-Instruct-SLiC-HF is a 7 billion parameter instruction-tuned language model developed by princeton-nlp, based on the Mistral architecture with a 4096-token context length. As the repository name indicates, it is fine-tuned with SLiC-HF (Sequence Likelihood Calibration from Human Feedback) and was released as a comparison baseline for the research preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward," making it suitable for tasks requiring robust instruction following and preference alignment.


Model Overview

Developed by princeton-nlp, this 7 billion parameter instruction-tuned model builds on the Mistral architecture. It is one of the preference-optimized checkpoints released alongside the SimPO research preprint: this particular checkpoint is trained with SLiC-HF, a calibration-based method that pushes the sequence likelihood of a preferred response above that of a rejected one. The preprint's own contribution, SimPO (Simple Preference Optimization), is a reference-free reward method that aligns model outputs without relying on an explicit reference model for reward signals; this SLiC-HF checkpoint serves as a baseline against which SimPO is compared.
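As a rough illustration (not the authors' training code), the two objectives discussed above can be sketched for a single preference pair. Here `logp_chosen` and `logp_rejected` stand for the policy's total log-likelihoods of the preferred and rejected responses; the hyperparameter names (`delta`, `beta`, `gamma`) follow common usage in the preference-optimization literature and are assumptions in this sketch.

```python
import math


def slic_hf_loss(logp_chosen: float, logp_rejected: float,
                 delta: float = 1.0) -> float:
    """SLiC-HF rank-calibration term: a hinge that pushes the preferred
    response's sequence log-likelihood at least `delta` above the rejected
    one (the full method also adds a regularization term, omitted here)."""
    return max(0.0, delta - logp_chosen + logp_rejected)


def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO's reference-free objective: the implicit reward is the
    length-normalized sequence log-likelihood scaled by `beta`; the loss
    is the negative log-sigmoid of the reward margin minus a target
    margin `gamma`. No reference model appears anywhere."""
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In real training these quantities are computed per batch from token-level log-probabilities and backpropagated through the policy; the sketch only shows the shape of each loss on a single pair.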

Key Capabilities

  • Instruction Following: Optimized for understanding and executing user instructions effectively.
  • Preference Alignment: Fine-tuned with SLiC-HF, a preference-calibration method designed to align model outputs with human preferences.
  • Research-backed: Trained and evaluated as a baseline in the SimPO: Simple Preference Optimization with a Reference-Free Reward preprint.

Good For

  • Applications requiring a 7B parameter model with strong instruction-following capabilities.
  • Use cases where preference optimization without a separate reward model is beneficial.
  • Researchers and developers interested in exploring models fine-tuned with novel preference optimization techniques.
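For hands-on use, Mistral-7B-Instruct-style checkpoints expect user turns wrapped in `[INST]` tags. A minimal sketch of that formatting is below; in practice, prefer the tokenizer's built-in `apply_chat_template` from Hugging Face `transformers` rather than hand-rolling the template, since the exact special tokens are defined by the checkpoint itself.

```python
def format_instruction(instruction: str) -> str:
    """Wrap a single user instruction in the Mistral-instruct chat format.
    `<s>` is the BOS token; the tokenizer normally adds it for you, so this
    hand-rolled version is only an illustration of the prompt shape."""
    return f"<s>[INST] {instruction.strip()} [/INST]"


prompt = format_instruction("Summarize the SimPO preprint in two sentences.")
```

The formatted `prompt` string is what the model sees before generating its reply after the closing `[/INST]` tag.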