princeton-nlp/Mistral-7B-Instruct-SLiC-HF
princeton-nlp/Mistral-7B-Instruct-SLiC-HF is a 7-billion-parameter instruction-tuned language model released by princeton-nlp, built on the Mistral architecture with a 4096-token context length. Despite the repository it belongs to, the model is not trained with SimPO itself: as its name indicates, it is fine-tuned with SLiC-HF (Sequence Likelihood Calibration with Human Feedback), one of the preference-optimization baselines trained and released alongside the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward." It is suitable for tasks requiring robust instruction following and preference alignment.
Model Overview
princeton-nlp/Mistral-7B-Instruct-SLiC-HF is a 7-billion-parameter instruction-tuned model built on the Mistral architecture. Developed by princeton-nlp, it was fine-tuned with SLiC-HF, a preference-optimization method that calibrates sequence likelihoods against human-preference rankings via a margin loss. The checkpoint was released as a baseline in the authors' study of SimPO (Simple Preference Optimization), a reference-free reward method detailed in their research preprint; like SimPO, the SLiC-HF ranking objective aligns model outputs with preferences without relying on an explicit reference model for reward signals.
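For intuition, the core SLiC-HF ranking objective is a margin loss on the policy's sequence log-likelihoods of the preferred and rejected responses. The sketch below illustrates just that calibration term (the full recipe also includes a regularization term toward the SFT model; variable names and the default margin `delta` are illustrative, not taken from the released training config):

```python
import torch

def slic_rank_loss(logp_chosen: torch.Tensor,
                   logp_rejected: torch.Tensor,
                   delta: float = 1.0) -> torch.Tensor:
    """Margin-based rank calibration loss on sequence log-likelihoods.

    logp_chosen / logp_rejected: summed log-probabilities of the
    preferred and dispreferred responses under the current policy.
    delta: calibration margin (illustrative default).
    """
    # Penalize pairs where the chosen response is not more likely
    # than the rejected one by at least the margin.
    return torch.clamp(delta - logp_chosen + logp_rejected, min=0.0).mean()

# Toy example: the loss vanishes once the chosen response beats the
# rejected one by at least the margin (here 3.0 > delta = 1.0).
chosen = torch.tensor([-5.0])
rejected = torch.tensor([-8.0])
print(slic_rank_loss(chosen, rejected).item())  # -> 0.0
```

Because the loss depends only on the current policy's likelihoods, no reference-model forward pass is needed during training.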
Key Capabilities
- Instruction Following: Optimized for understanding and executing user instructions effectively.
- Preference Alignment: Fine-tuned with the SLiC-HF method, which aligns model outputs with human-preference rankings.
- Research-backed: Released as a baseline in the experiments of the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward."
Good For
- Applications requiring a 7B parameter model with strong instruction-following capabilities.
- Use cases where preference optimization without complex reference models is beneficial.
- Researchers and developers interested in exploring models fine-tuned with novel preference optimization techniques.
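A minimal usage sketch with the Hugging Face `transformers` library (assuming the checkpoint is published under the id shown; generation settings are illustrative, and `format_prompt` hand-rolls the standard Mistral-Instruct `[INST] ... [/INST]` turn format that `tokenizer.apply_chat_template` would normally produce):

```python
MODEL_ID = "princeton-nlp/Mistral-7B-Instruct-SLiC-HF"

def format_prompt(user_message: str) -> str:
    # Mistral-Instruct chat format: wrap the user turn in [INST] tags.
    # In practice, prefer tokenizer.apply_chat_template for this.
    return f"[INST] {user_message} [/INST]"

def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    # transformers is imported lazily so the prompt helper above can be
    # used without the library installed; the first call downloads the
    # full 7B weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(format_prompt(user_message),
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs,
                            max_new_tokens=max_new_tokens,
                            do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is used here only to make the sketch deterministic; sampling parameters should be tuned per application.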