Overview
kaist-ai/mistral-orpo-capybara-7k is a 7-billion-parameter language model based on Mistral-7B-v0.1. It was fine-tuned by KAIST AI with Odds Ratio Preference Optimization (ORPO), a method that learns directly from preference data without a separate supervised fine-tuning phase. Training took roughly 2.5 hours on four A100 GPUs over 7,000 instances from the argilla/distilabel-capybara-dpo-7k-binarized dataset, which focuses on multi-turn conversations.
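A minimal inference sketch using the Hugging Face transformers library, assuming the checkpoint ships a chat template (as Mistral-based instruct models typically do); the `as_chat` helper and the sampling settings are illustrative choices, not part of the official model card:

```python
MODEL_ID = "kaist-ai/mistral-orpo-capybara-7k"

def as_chat(*turns):
    # Alternate user/assistant roles over plain strings to build the
    # message list expected by tokenizer.apply_chat_template.
    roles = ("user", "assistant")
    return [{"role": roles[i % 2], "content": t} for i, t in enumerate(turns)]

if __name__ == "__main__":
    # Heavy imports and the ~14 GB weight download are deferred to runtime.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = tokenizer.apply_chat_template(
        as_chat("Explain ORPO in one sentence."),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Because the model was tuned on multi-turn data, `as_chat` can also take alternating user/assistant strings to carry prior dialogue context into the prompt.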
Key Capabilities & Performance
- Preference Optimization: Utilizes ORPO for efficient alignment, directly learning from preference data.
- Conversational AI: Specifically fine-tuned on a distilled Capybara dataset for enhanced multi-turn dialogue capabilities.
- Strong Alignment Benchmarks: Achieves an MT-Bench score of 7.44 and an AlpacaEval 2.0 (LC) score of 15.9, outperforming similarly sized and some larger models such as Zephyr β (7B) and TULU-2-DPO (13B).
- Instruction Following: Shows competitive performance on IFEval, a benchmark that measures verifiable instruction following.
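To make the ORPO objective behind these results concrete, here is an illustrative computation of its odds-ratio penalty term. ORPO adds -log σ(log odds(y_w) - log odds(y_l)) to the usual NLL loss on the chosen response, where odds(y) = p(y)/(1 - p(y)) is computed from the length-normalized sequence probability; the helper names below are ours, and this is a sketch of the formula rather than the authors' training code:

```python
import math

def log_odds(avg_logp):
    # odds(y|x) = p / (1 - p), where p = exp(avg_logp) is the
    # length-normalized sequence probability (avg_logp < 0, so 0 < p < 1).
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def odds_ratio_penalty(avg_logp_chosen, avg_logp_rejected):
    # L_OR = -log sigmoid(log odds(chosen) - log odds(rejected)).
    # In ORPO this term is added to the NLL loss on the chosen response,
    # scaled by a weighting hyperparameter.
    z = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-z)))
```

When chosen and rejected responses are equally likely the penalty is -log 0.5 ≈ 0.693; as the model assigns higher probability to the chosen response relative to the rejected one, the penalty falls toward zero, so a single loss both fits the chosen response and pushes the two apart.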
When to Use This Model
This model is particularly well-suited for applications requiring robust conversational abilities and strong alignment with human preferences. Its performance on MT-Bench and AlpacaEval suggests it can generate high-quality, helpful, and harmless responses in interactive dialogue systems. Developers looking for a 7B model optimized for chat and instruction-following tasks, especially those valuing efficient preference learning, should consider kaist-ai/mistral-orpo-capybara-7k.