abideen/Mistral-v2-orpo
abideen/Mistral-v2-orpo is a 7 billion parameter causal language model fine-tuned from Mistral-7B-v0.2 using Odds Ratio Preference Optimization (ORPO). This model leverages the argilla/distilabel-capybara-dpo-7k-binarized preference dataset to combine supervised fine-tuning and alignment into a single objective. It targets strong results on alignment benchmarks, with the ORPO paper reporting gains over traditional SFT+DPO pipelines.
Model Overview
abideen/Mistral-v2-orpo is a 7 billion parameter language model derived from Mistral-7B-v0.2. It has been fine-tuned using Odds Ratio Preference Optimization (ORPO) on the argilla/distilabel-capybara-dpo-7k-binarized preference dataset. This training approach integrates both supervised fine-tuning (SFT) and alignment into a single, memory-efficient objective function.
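To make the "single objective" concrete, here is a minimal sketch of the ORPO loss for one preference pair: the standard SFT negative log-likelihood on the chosen response plus a log-sigmoid penalty on the log odds ratio between chosen and rejected responses. The function names, the `lam` weighting, and the use of length-normalized sequence probabilities are illustrative assumptions; see the ORPO paper for the exact formulation.

```python
import math

def odds(p: float) -> float:
    # Odds of generating a sequence whose (length-normalized)
    # probability under the model is p.
    return p / (1.0 - p)

def orpo_loss(p_chosen: float, p_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO objective for a single preference pair.

    p_chosen / p_rejected: the model's (length-normalized) probabilities
    of the chosen and rejected responses. Note there is no reference
    model anywhere in this loss, which is what makes ORPO more
    memory-friendly than DPO/PPO.
    """
    sft_loss = -math.log(p_chosen)  # ordinary SFT term on the chosen response
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # -log(sigmoid(log_odds_ratio)): small when chosen >> rejected
    or_penalty = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return sft_loss + lam * or_penalty
```

Because both terms depend only on the policy being trained, a single forward pass per response suffices, whereas DPO additionally needs forward passes through a frozen reference model.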
Key Capabilities & Features
- ORPO Training Method: Utilizes Odds Ratio Preference Optimization, a novel technique that combines SFT and alignment into one objective, eliminating the need for a separate reference model.
- Efficiency: The ORPO method is reference model-free, making it more memory-friendly compared to traditional DPO/PPO approaches.
- Performance: According to the ORPO paper, the method has been shown to outperform both SFT alone and SFT+DPO across several models, including Phi-2, Llama-2, and Mistral. In particular, Mistral-ORPO reached 12.20% on AlpacaEval 2.0, 66.19% on IFEval, and 7.32 on MT-Bench, surpassing Hugging Face's Zephyr Beta.
- Training: The model was trained for one epoch using the LazyORPO Colab notebook, taking approximately 8 hours on an A100 GPU.
When to Use This Model
This model is particularly well-suited for use cases requiring strong alignment and preference optimization, especially when computational resources or memory are a concern. Its ORPO-based training makes it a strong candidate for tasks where combining instruction-following and human preference alignment is critical, potentially offering improved performance over models trained with separate SFT and DPO/PPO stages.
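For inference, the model should load with the standard `transformers` AutoClasses. The sketch below is a minimal, hedged example: the `[INST] ... [/INST]` prompt format is an assumption carried over from the Mistral instruction models (check the tokenizer's chat template for the authoritative format), and the dtype/device settings are illustrative defaults.

```python
def build_prompt(user_message: str) -> str:
    # Mistral-style instruction format -- an assumption; prefer
    # tokenizer.apply_chat_template if the repo ships a chat template.
    return f"[INST] {user_message} [/INST]"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept inside the function so the sketch can be
    # read without transformers/torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "abideen/Mistral-v2-orpo"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Running `generate("Explain ORPO in one sentence.")` downloads the 7B checkpoint on first use, so a GPU with sufficient memory (or quantized loading) is advisable.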