snorkelai/Snorkel-Mistral-PairRM-DPO
Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 8k · Published: Jan 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Snorkel-Mistral-PairRM-DPO is a 7-billion-parameter instruction-tuned causal language model developed by Snorkel AI, built on Mistral-7B-Instruct-v0.2. It is aligned via an iterative Direct Preference Optimization (DPO) procedure that uses PairRM as the reward model, achieving a score of 30.22 on Alpaca-Eval 2.0 and ranking highest among open-source base models at publication. The model is optimized for chat and general instruction following, and demonstrates a programmatic approach to LLM alignment.
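The iterative DPO loop described above can be sketched in miniature: sample several candidate responses per prompt, rank them with a pairwise reward model, and keep the best and worst as the (chosen, rejected) pair for a DPO update. The snippet below is a simplified illustration under assumed names; `score` is a hypothetical stand-in for PairRM's ranker, not its real API.

```python
def build_dpo_pair(prompt, candidates, score):
    """Rank sampled candidates with a reward function and return the
    (chosen, rejected) pair consumed by one DPO training step.
    `score(prompt, candidate)` stands in for a pairwise reward model
    such as PairRM; higher scores mean more preferred responses."""
    ranked = sorted(candidates, key=lambda c: score(prompt, c), reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

# Toy reward for illustration only: prefer the longer answer.
pair = build_dpo_pair(
    "What is DPO?",
    ["DPO.", "Direct Preference Optimization aligns LLMs from preference pairs."],
    lambda p, c: len(c),
)
```

In the actual pipeline this selection step would be repeated over fresh generations each round, with the policy retrained on the new pairs before the next iteration.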
