kaist-ai/mistral-orpo-alpha

TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:8kLicense:mitArchitecture:Transformer0.0K Open Weights Cold

kaist-ai/mistral-orpo-alpha is a 7 billion parameter language model developed by KAIST AI, fine-tuned from Mistral-7B-v0.1 using the Odds Ratio Preference Optimization (ORPO) method. This model learns preferences directly without a supervised fine-tuning warmup phase, specifically trained on the HuggingFaceH4/ultrafeedback_binarized dataset. It demonstrates competitive performance on alignment benchmarks like AlpacaEval and MT-Bench, making it suitable for preference-aligned conversational AI tasks.

Loading preview...

Overview

kaist-ai/mistral-orpo-alpha is a 7 billion parameter language model based on the Mistral-7B-v0.1 architecture. Developed by KAIST AI, this model distinguishes itself by utilizing Odds Ratio Preference Optimization (ORPO), a method that enables direct preference learning without the need for an initial supervised fine-tuning phase. It was exclusively fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset.

Key Capabilities & Performance

  • Preference Optimization: Leverages ORPO for efficient alignment, bypassing traditional SFT warmups.
  • Competitive Alignment: Achieves an MT-Bench score of 7.23, AlpacaEval 1.0 score of 87.92, and AlpacaEval 2.0 score of 11.33.
  • Instruction Following: Demonstrates instruction-following capabilities with IFEval scores of 0.5009 (Prompt-Strict) and 0.5995 (Inst-Strict).

When to Use This Model

  • Preference-aligned tasks: Ideal for applications requiring models to adhere to specific user preferences or conversational styles.
  • Conversational AI: Suitable for building chatbots or dialogue systems where alignment with human feedback is crucial.
  • Research in Alignment: A valuable model for researchers exploring alternative preference optimization techniques like ORPO.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p