Name: kaist-ai/mistral-orpo-alpha API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kaist-ai

Overview

kaist-ai/mistral-orpo-alpha is a 7-billion-parameter language model built on Mistral-7B-v0.1. Developed by KAIST AI, it is trained with Odds Ratio Preference Optimization (ORPO), a method that learns preferences directly, without a separate supervised fine-tuning (SFT) phase beforehand. It was fine-tuned only on the HuggingFaceH4/ultrafeedback_binarized dataset.
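To make the ORPO idea concrete, here is a minimal sketch of the per-example objective as it is usually described: a standard negative log-likelihood term on the preferred response plus a weighted penalty on the log odds ratio between the preferred and rejected responses. This is an illustration only, not the training code used for this model. The function names, the use of length-averaged log-probabilities as inputs, and the weight `lam=0.1` are assumptions chosen for the example.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    avg_logprob is assumed to be the length-averaged log-probability of a
    response under the model, so it must be strictly negative (p < 1).
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def orpo_loss(chosen_avg_logprob: float,
              rejected_avg_logprob: float,
              lam: float = 0.1) -> float:
    """Sketch of a per-example ORPO loss (lam is an illustrative weight)."""
    # Language-modeling term: negative log-likelihood of the preferred response.
    nll_term = -chosen_avg_logprob

    # Preference term: -log(sigmoid(log odds ratio)), written as log(1 + e^-x).
    log_odds_ratio = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))

    return nll_term + lam * odds_ratio_term


if __name__ == "__main__":
    # Hypothetical log-probabilities: the preferred response is more likely.
    print(orpo_loss(chosen_avg_logprob=-0.5, rejected_avg_logprob=-2.0))
```

Because both terms are computed from the same forward passes over the preferred and rejected responses, the preference signal is folded into ordinary fine-tuning, which is why no separate SFT warmup is needed.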

Key Capabilities & Performance

Preference Optimization: Leverages ORPO for efficient alignment, bypassing traditional SFT warmups.
Competitive Alignment: Achieves an MT-Bench score of 7.23, AlpacaEval 1.0 score of 87.92, and AlpacaEval 2.0 score of 11.33.
Instruction Following: Demonstrates instruction-following capabilities with IFEval scores of 0.5009 (Prompt-Strict) and 0.5995 (Inst-Strict).

When to Use This Model

Preference-Aligned Tasks: Ideal for applications requiring models to adhere to specific user preferences or conversational styles.
Conversational AI: Suitable for building chatbots or dialogue systems where alignment with human feedback is crucial.
Research in Alignment: A valuable model for researchers exploring alternative preference optimization techniques like ORPO.
