ContextualAI/Contextual_KTO_Mistral_PairRM
ContextualAI's Contextual_KTO_Mistral_PairRM is a 7 billion parameter instruction-tuned language model built on Mistral-7B-Instruct-v0.2. It is optimized with the KTO (Kahneman-Tversky Optimization) loss function over three iterative training passes on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. The model excels at instruction following and conversational tasks, achieving a verified score of 33.23 on the AlpacaEval 2.0 leaderboard.
Overview
ContextualAI's Contextual_KTO_Mistral_PairRM is a 7 billion parameter language model derived from mistralai/Mistral-7B-Instruct-v0.2. Its alignment methodology applies Kahneman-Tversky Optimization (KTO), a human-aware loss function grounded in prospect theory, iteratively over the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
Key Capabilities
- Enhanced Instruction Following: Optimized through KTO, the model is designed for improved adherence to user instructions and preferences.
- Conversational Proficiency: Training on the PairRM-ranked preference dataset contributes to its ability to engage in coherent, contextually relevant dialogue.
- Competitive Performance: Achieved a verified score of 33.23 on the AlpacaEval 2.0 leaderboard, ranking #2 at the time of its release.
Training Methodology
The model underwent three iterations of KTO training, with each iteration using the previously trained model as a reference. This process aims to refine alignment and performance. Further details on KTO can be found in ContextualAI's code repository and blog post.
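For intuition, here is a minimal PyTorch sketch of the KTO objective as described in the KTO paper. It is illustrative only: the function name, signature, and hyperparameter defaults are assumptions, and the actual implementation in ContextualAI's repository differs in details (for example, the KL reference point is estimated from mismatched prompt-completion pairs across a microbatch rather than from the same completions).

```python
import torch

def kto_loss(policy_logps, ref_logps, is_desirable,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Illustrative sketch of the KTO objective for one batch.

    policy_logps / ref_logps: summed log-probs of each completion under
    the current policy and the frozen reference model, shape (batch,).
    is_desirable: boolean tensor, True where a completion is labeled
    desirable, False where undesirable.
    """
    # Implied reward: log-ratio of policy to reference model.
    rewards = policy_logps - ref_logps

    # Reference point z0: a detached, batch-level estimate of
    # KL(policy || reference), clamped at zero.
    z0 = rewards.detach().mean().clamp(min=0)

    # Prospect-theoretic values: desirable completions are pushed
    # above the reference point, undesirable ones below it.
    value_desirable = torch.sigmoid(beta * (rewards - z0))
    value_undesirable = torch.sigmoid(beta * (z0 - rewards))

    losses = torch.where(
        is_desirable,
        lambda_d * (1 - value_desirable),
        lambda_u * (1 - value_undesirable),
    )
    return losses.mean()
```

Unlike DPO, which needs paired preferences, this loss only requires a binary desirable/undesirable label per example, which is why each of the three training passes can reuse the previous model as the frozen reference.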
Prompting Format
Prompts should follow the TuluV2 format, using <|user|> and <|assistant|> role tags with the human speaking first. The tokenizer adds the beginning-of-sequence (BOS) token automatically, so it should not be prepended manually.
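The snippet below is a usage sketch with the Hugging Face transformers library showing the expected prompt layout; the generation parameters are illustrative, not recommended settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ContextualAI/Contextual_KTO_Mistral_PairRM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# TuluV2-style format: the human turn comes first, and the tokenizer
# prepends the BOS token automatically, so the string starts at <|user|>.
prompt = "<|user|>\nWhat is Kahneman-Tversky Optimization?\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256,
                         do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

For multi-turn conversations, alternate <|user|> and <|assistant|> blocks in the same format, always ending the prompt with <|assistant|> followed by a newline so the model continues as the assistant.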