UCLA-AGI/Mistral7B-PairRM-SPPO
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: May 4, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

UCLA-AGI/Mistral7B-PairRM-SPPO is a 7-billion-parameter decoder-only transformer language model developed by UCLA-AGI, fine-tuned from Mistral-7B-Instruct-v0.2 using Self-Play Preference Optimization (SPPO) on synthetic preference data derived from UltraFeedback. During training, win probabilities for preference pairs are estimated as soft labels from three samples (ranked by the PairRM preference model), which improves performance on AlpacaEval 2.0. The model is well suited to tasks that require well-aligned conversational responses.
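Since the model is fine-tuned from Mistral-7B-Instruct-v0.2, it can be loaded and prompted like any Mistral-Instruct-style checkpoint. Below is a minimal usage sketch with Hugging Face `transformers`; the dtype, device placement, and sampling settings are illustrative assumptions, not prescribed by the model authors.

```python
# Minimal inference sketch (assumes `transformers` and `torch` are installed
# and enough GPU/CPU memory is available for a 7B model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Mistral7B-PairRM-SPPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative choice; FP8 above refers to the hosted serving quant
    device_map="auto",
)

# Format the prompt with the tokenizer's built-in chat template
# (Mistral-Instruct style), then generate a response.
messages = [
    {"role": "user", "content": "Explain self-play preference optimization in one paragraph."}
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,  # example sampling settings, not tuned values
)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```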
