Name: ContextualAI/Contextual_KTO_Mistral_PairRM API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ContextualAI

Overview

ContextualAI's Contextual_KTO_Mistral_PairRM is a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-Instruct-v0.2. It is aligned with Kahneman-Tversky Optimization (KTO), a human-centered loss function, applied iteratively over the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.

Key Capabilities

Enhanced Instruction Following: Optimized through KTO, the model is designed for improved adherence to user instructions and preferences.
Conversational Proficiency: Alignment on the preference data in the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset contributes to its ability to hold coherent, contextually relevant dialogues.
Competitive Performance: The model achieved a verified score of 33.23 on the AlpacaEval 2.0 leaderboard, ranking #2 at the time of its release.

Training Methodology

The model underwent three iterations of KTO training, with each iteration using the model produced by the previous iteration as its reference model. Each round is therefore measured against the latest aligned checkpoint, so alignment is refined step by step. Further details on KTO can be found in ContextualAI's code repository and blog post.
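The iteration scheme described above can be sketched as a small loop. This is only an illustration of the control flow, not ContextualAI's training code: the function names iterative_kto and toy_round are hypothetical, the train_round callable stands in for a real KTO training run, and starting each round's policy from the previous checkpoint (rather than only using it as the reference) is an assumption.

```python
from typing import Callable, List, TypeVar

Model = TypeVar("Model")


def iterative_kto(
    base_model: Model,
    train_round: Callable[[Model, Model], Model],
    num_rounds: int = 3,
) -> List[Model]:
    """Run `num_rounds` alignment rounds, chaining the reference model.

    `train_round(init, reference)` is a hypothetical stand-in for one KTO
    training run; it returns the newly trained model.
    """
    checkpoints: List[Model] = []
    previous = base_model
    for _ in range(num_rounds):
        # Assumption: the previous checkpoint serves both as the starting
        # point and as the reference model for this round.
        trained = train_round(previous, previous)
        checkpoints.append(trained)
        previous = trained  # becomes the reference for the next round
    return checkpoints


def toy_round(init: str, reference: str) -> str:
    """Toy stand-in that only records which model was used as reference."""
    return f"kto(ref={reference})"


checkpoints = iterative_kto("base", toy_round)
```

With the toy stand-in, each checkpoint's label names the checkpoint before it as its reference, which is the chaining the section describes.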

Prompting Format

Format prompts in the TuluV2 style: mark each turn with a <|user|> or <|assistant|> role tag, and always begin with the human (user) turn. The tokenizer adds the beginning-of-sequence (BOS) token automatically, so do not add one to the prompt yourself.
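A minimal sketch of a prompt builder for this format follows. The helper name build_prompt is hypothetical, and the exact whitespace (each role tag on its own line, followed by the message and a newline) is an assumption based on the common TuluV2 layout; check the full model card before relying on it. No BOS token is added, since the tokenizer supplies it.

```python
from typing import List, Tuple

USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(turns: List[Tuple[str, str]]) -> str:
    """Render (role, text) turns as a TuluV2-style prompt string.

    Roles must be "user" or "assistant", and the first turn must be the
    user's. If the last turn is from the user, an open assistant tag is
    appended so the model continues with its reply.
    """
    if not turns or turns[0][0] != "user":
        raise ValueError("the conversation must start with a user turn")

    tags = {"user": USER_TAG, "assistant": ASSISTANT_TAG}
    parts = []
    for role, text in turns:
        if role not in tags:
            raise ValueError(f"unknown role: {role!r}")
        # Assumed layout: tag on its own line, then the message text.
        parts.append(f"{tags[role]}\n{text.strip()}\n")

    if turns[-1][0] == "user":
        # Leave the assistant turn open for generation.
        parts.append(f"{ASSISTANT_TAG}\n")
    return "".join(parts)


prompt = build_prompt([
    ("user", "Hi! I'm looking for a cake recipe."),
    ("assistant", "What kind of cake?"),
    ("user", "Chocolate cake."),
])
```

The resulting string can be passed to the tokenizer as-is.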
