abideen/Mistral-v2-orpo
abideen/Mistral-v2-orpo is a 7 billion parameter causal language model fine-tuned from Mistral-7B-v0.2 using Odds Ratio Preference Optimization (ORPO). This model leverages the argilla/distilabel-capybara-dpo-7k-binarized preference dataset to combine supervised fine-tuning and alignment into a single objective. It targets strong results on alignment benchmarks, with the ORPO paper reporting gains over traditional SFT+DPO pipelines.
Model Overview
abideen/Mistral-v2-orpo is a 7 billion parameter language model derived from Mistral-7B-v0.2. It has been fine-tuned using Odds Ratio Preference Optimization (ORPO) on the argilla/distilabel-capybara-dpo-7k-binarized preference dataset. This training approach integrates both supervised fine-tuning (SFT) and alignment into a single, memory-efficient objective function.
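To make the "single objective" concrete, here is a minimal sketch of the ORPO loss for one preference pair: the standard SFT negative log-likelihood on the chosen response plus a log-sigmoid penalty on the log odds ratio between chosen and rejected responses. The function names, the `lam` weighting, and the use of length-normalized sequence probabilities are illustrative assumptions; see the ORPO paper for the exact formulation.

```python
import math

def odds(p: float) -> float:
    # Odds of generating a sequence whose (length-normalized)
    # probability under the model is p.
    return p / (1.0 - p)

def orpo_loss(p_chosen: float, p_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO objective for a single preference pair.

    p_chosen / p_rejected: the model's (length-normalized) probabilities
    of the chosen and rejected responses. Note there is no reference
    model anywhere in this loss, which is what makes ORPO more
    memory-friendly than DPO/PPO.
    """
    sft_loss = -math.log(p_chosen)  # ordinary SFT term on the chosen response
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # -log(sigmoid(log_odds_ratio)): small when chosen >> rejected
    or_penalty = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return sft_loss + lam * or_penalty
```

Because both terms depend only on the policy being trained, a single forward pass per response suffices, whereas DPO additionally needs forward passes through a frozen reference model.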
Key Capabilities & Features
- ORPO Training Method: Utilizes Odds Ratio Preference Optimization, a novel technique that combines SFT and alignment into one objective, eliminating the need for a separate reference model.
- Efficiency: The ORPO method is reference model-free, making it more memory-friendly compared to traditional DPO/PPO approaches.
- Performance: According to the ORPO paper, the method has been shown to outperform both SFT alone and SFT+DPO across several models, including Phi-2, Llama-2, and Mistral. In particular, Mistral-ORPO reached 12.20% on AlpacaEval 2.0, 66.19% on IFEval, and 7.32 on MT-Bench, surpassing Hugging Face's Zephyr Beta.
- Training: The model was trained for one epoch using the LazyORPO Colab notebook, taking approximately 8 hours on an A100 GPU.
When to Use This Model
This model is particularly well-suited for use cases requiring strong alignment and preference optimization, especially when computational resources or memory are a concern. Its ORPO-based training makes it a strong candidate for tasks where combining instruction-following and human preference alignment is critical, potentially offering improved performance over models trained with separate SFT and DPO/PPO stages.
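For inference, the model should load with the standard `transformers` AutoClasses. The sketch below is a minimal, hedged example: the `[INST] ... [/INST]` prompt format is an assumption carried over from the Mistral instruction models (check the tokenizer's chat template for the authoritative format), and the dtype/device settings are illustrative defaults.

```python
def build_prompt(user_message: str) -> str:
    # Mistral-style instruction format -- an assumption; prefer
    # tokenizer.apply_chat_template if the repo ships a chat template.
    return f"[INST] {user_message} [/INST]"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept inside the function so the sketch can be
    # read without transformers/torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "abideen/Mistral-v2-orpo"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Running `generate("Explain ORPO in one sentence.")` downloads the 7B checkpoint on first use, so a GPU with sufficient memory (or quantized loading) is advisable.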