UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: May 4, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3 is a 7-billion-parameter decoder-only transformer developed by UCLA-AGI, fine-tuned from Mistral-7B-Instruct-v0.2. It is the third iteration of Self-Play Preference Optimization (SPPO), trained on synthetic preference data derived from UltraFeedback, with PairRM serving as the preference (ranking) model, as the name indicates. The model is optimized for alignment and demonstrates improved win rates on benchmarks such as AlpacaEval and Arena-Hard through iterative self-play.
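A minimal inference sketch using the Hugging Face `transformers` library. The model id comes from this card; the prompt, generation parameters, and the `build_messages` helper are illustrative assumptions, not part of the official model card:

```python
# Sketch: chat inference with UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3.
# Generation settings below are illustrative assumptions.

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format consumed by
    tokenizer.apply_chat_template (Mistral-instruct style turns)."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Render the chat turns into the model's prompt format.
    prompt = tokenizer.apply_chat_template(
        build_messages("Summarize self-play preference optimization in one sentence."),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:],
                           skip_special_tokens=True))
```

Heavy model loading is kept under the `__main__` guard so the helper can be imported without downloading weights.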