UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2 is a 7-billion-parameter, decoder-only language model developed by UCLA-AGI and fine-tuned from Mistral-7B-Instruct-v0.2. It is the second-iteration checkpoint of Self-Play Preference Optimization (SPPO), aligned on synthetic preference data built from UltraFeedback prompts with PairRM serving as the preference (ranking) model. The training targets improved alignment and response quality, as reflected in its results on benchmarks such as AlpacaEval and MT-Bench.
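Below is a minimal usage sketch, assuming the checkpoint is hosted on the Hugging Face Hub under the identifier above and follows the standard Mistral-Instruct chat template; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the checkpoint and generate a response.
# Assumes the model is on the Hugging Face Hub under this identifier and
# ships a standard chat template; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt (example prompt is hypothetical).
messages = [{"role": "user", "content": "Summarize self-play preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Greedy decoding; sampling parameters can be tuned as needed.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```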