kaist-ai/mistral-orpo-beta

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Mar 12, 2024 · License: MIT · Architecture: Transformer · Open Weights

kaist-ai/mistral-orpo-beta is a 7 billion parameter language model developed by KAIST AI, fine-tuned from Mistral-7B-v0.1 using the Odds Ratio Preference Optimization (ORPO) method. This model directly learns preferences without a supervised fine-tuning warmup, distinguishing it from traditional alignment techniques. It is specifically optimized for conversational AI and instruction following, demonstrating strong performance on benchmarks like MT-Bench and AlpacaEval.


Overview

kaist-ai/mistral-orpo-beta is a 7 billion parameter language model based on Mistral-7B-v0.1, developed by KAIST AI. Its key differentiator is the use of Odds Ratio Preference Optimization (ORPO), a novel alignment technique that allows the model to learn preferences directly, bypassing the need for an initial supervised fine-tuning phase. This approach simplifies the alignment process and aims for more efficient preference learning.
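
To make the idea concrete, below is a minimal sketch of the ORPO objective as described in the original paper: a standard negative log-likelihood term on the chosen response plus a log-odds-ratio penalty that pushes the odds of the chosen response above those of the rejected one. The function name, tensor shapes, and the λ default here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    """Sketch of the ORPO objective (illustrative, not the authors' code).

    chosen_logps / rejected_logps: length-normalized mean token
        log-probabilities of the chosen / rejected responses, shape (batch,).
    nll_chosen: the usual SFT negative log-likelihood on the chosen response.
    lam: weight on the odds-ratio term (this default is illustrative).
    """
    # log odds(y|x) = log p(y|x) - log(1 - p(y|x)), computed from log-probs
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Odds-ratio term: reward the model when the chosen response is more
    # likely (in odds space) than the rejected one
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return (nll_chosen + lam * l_or).mean()
```

Because the odds-ratio penalty is applied alongside the ordinary language-modeling loss, a single training run both teaches the instruction format and aligns preferences, which is why no separate SFT warmup is needed.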

Key Capabilities & Performance

  • ORPO Alignment: Fine-tuned with the ORPO method exclusively on the 61k-instance argilla/ultrafeedback-binarized-preferences-cleaned dataset (see the loading sketch after this list).
  • Strong Conversational Performance: Achieves an MT-Bench score of 7.32, performing on par with Zephyr β (7.34) and ahead of TULU-2-DPO (7.00) in its size class, and significantly surpassing Llama-2-Chat models.
  • High Preference Alignment: Demonstrates strong alignment with human preferences, scoring 12.20% on AlpacaEval 2.0.
  • Instruction Following: Shows competitive performance on IFEval, with scores of 0.5287 (Prompt-Strict) and 0.6355 (Inst-Strict), suggesting good adherence to instructions.
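
To inspect the training data referenced above, the preference pairs can be pulled from the Hugging Face Hub with the datasets library. This is a quick exploratory sketch; the exact column layout should be verified against the dataset card.

```python
from datasets import load_dataset

# Preference pairs used for ORPO fine-tuning (~61k chosen/rejected examples)
ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
print(ds)            # inspect size and column names
print(ds[0].keys())  # e.g. prompt / chosen / rejected (verify on the dataset card)
```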

When to Use This Model

  • Conversational AI: Ideal for chatbots and dialogue systems where high-quality, aligned responses are crucial (see the inference sketch after this list).
  • Instruction Following: Suitable for tasks requiring the model to accurately follow complex instructions.
  • Preference Learning Research: A valuable model for researchers exploring alternative alignment methods like ORPO.
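
For the conversational use case above, a minimal inference sketch with the Hugging Face transformers API looks like the following; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/mistral-orpo-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens
output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```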