ContextualAI/Contextual_KTO_Mistral_PairRM

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: Mar 5, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

ContextualAI's Contextual_KTO_Mistral_PairRM is a 7 billion parameter instruction-tuned language model built on Mistral-7B-Instruct-v0.2. It is optimized with the KTO (Kahneman-Tversky Optimization) loss function and aligned on the Snorkel-Mistral-PairRM-DPO-Dataset through three iterative KTO training passes. The model targets instruction following and conversational tasks, and achieved a verified score of 33.23 on the Alpaca Eval 2.0 Leaderboard.

Overview

ContextualAI's Contextual_KTO_Mistral_PairRM is a 7 billion parameter language model derived from mistralai/Mistral-7B-Instruct-v0.2. It is aligned with Kahneman-Tversky Optimization (KTO), a human-centered loss function, applied iteratively over the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.

Key Capabilities

  • Enhanced Instruction Following: Optimized through KTO, the model is designed for improved adherence to user instructions and preferences.
  • Conversational Proficiency: Alignment on the Snorkel-Mistral-PairRM-DPO-Dataset preference data contributes to its ability to hold coherent, contextually relevant dialogues.
  • Competitive Performance: Achieved a verified score of 33.23 on the Alpaca Eval 2.0 Leaderboard, ranking #2 at the time of its release.

Training Methodology

The model underwent three iterations of KTO training, with each iteration using the previously trained model as a reference. This process aims to refine alignment and performance. Further details on KTO can be found in ContextualAI's code repository and blog post.
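
The iteration scheme described above can be sketched as a short loop. This is only an illustration of the control flow, not ContextualAI's training code: `iterative_kto` and the `kto_round` callable are hypothetical names, and starting each round's policy from the previous checkpoint is an assumption (the description only says that the previous model serves as the reference).

```python
from typing import Callable, List, TypeVar

Model = TypeVar("Model")


def iterative_kto(
    base_model: Model,
    kto_round: Callable[[Model, Model], Model],
    iterations: int = 3,
) -> List[Model]:
    """Run several alignment rounds, re-using each result as the next reference.

    `kto_round(policy_init, reference)` is a hypothetical stand-in for one
    full KTO training pass; it must return the newly trained model.
    """
    checkpoints = [base_model]
    for _ in range(iterations):
        # The most recently trained model becomes the reference for this round.
        reference = checkpoints[-1]
        # Assumption: the policy is also initialised from that same checkpoint.
        trained = kto_round(reference, reference)
        checkpoints.append(trained)
    return checkpoints
```

With three iterations the returned list holds the starting model followed by three successive checkpoints, the last of which would correspond to the released model under these assumptions.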

Prompting Format

Format prompts in the TuluV2 style, using the <|user|> and <|assistant|> role tags, with the human speaking first. The tokenizer automatically adds a beginning-of-sequence (BOS) token, so do not add one manually.
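
A minimal sketch of building such a prompt by hand is shown here. The helper name `format_tulu_prompt` is hypothetical, and the exact newline placement (role tag on its own line, then the message, then a newline) is an assumption based on the usual TuluV2 layout; the tokenizer's bundled chat template, if available, should take precedence.

```python
from typing import Dict, List


def format_tulu_prompt(messages: List[Dict[str, str]]) -> str:
    """Render a conversation with <|user|> / <|assistant|> role tags.

    No BOS token is written here because the tokenizer adds it automatically.
    The newline layout is an assumption, not confirmed by the model card text.
    """
    if not messages or messages[0]["role"] != "user":
        raise ValueError("the conversation must start with a user turn")

    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content'].strip()}\n")

    # Leave an open assistant tag so the model writes the next reply.
    if messages[-1]["role"] == "user":
        parts.append("<|assistant|>\n")
    return "".join(parts)
```

Calling the helper with a single user message yields the user tag, the message text, and a trailing open assistant tag; the resulting string can then be passed to the tokenizer as-is.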

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration sets the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p