UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2

Hugging Face
Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: May 4, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2 is a 7-billion-parameter decoder-only transformer developed by UCLA-AGI and fine-tuned from Mistral-7B-Instruct-v0.2. It is the second iteration of Self-Play Preference Optimization (SPPO), aligned on synthetic preference data derived from UltraFeedback. The model targets improved alignment and response quality, as reflected in its results on benchmarks such as AlpacaEval and MT-Bench.


Overview

UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2 is a 7 billion parameter language model developed by UCLA-AGI, building upon the mistralai/Mistral-7B-Instruct-v0.2 architecture. This model is the second iteration in a series that employs Self-Play Preference Optimization (SPPO) for alignment, as detailed in the paper "Self-Play Preference Optimization for Language Model Alignment." It was fine-tuned using synthetic responses generated from the openbmb/UltraFeedback dataset, specifically a split from snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
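The card gives no quickstart, so here is a minimal prompt-formatting sketch, assuming the model inherits the `[INST]` chat template of its base model, Mistral-7B-Instruct-v0.2 (in practice, prefer the tokenizer's own `apply_chat_template`):

```python
def format_mistral_prompt(messages):
    """Render a chat into the Mistral-Instruct [INST] template.

    Assumes the v0.2-style format: <s>[INST] user [/INST]assistant</s>...
    """
    parts = ["<s>"]
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"[INST] {msg['content']} [/INST]")
        elif msg["role"] == "assistant":
            parts.append(f"{msg['content']}</s>")
    return "".join(parts)

# With Hugging Face transformers, the same prompt comes from the tokenizer
# itself (generation settings below are illustrative, not from the card):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2")
#   model = AutoModelForCausalLM.from_pretrained(
#       "UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2", device_map="auto")
#   prompt = tok.apply_chat_template(messages, tokenize=False,
#                                    add_generation_prompt=True)
```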

Key Capabilities & Differentiators

  • Self-Play Preference Optimization (SPPO): Uses an iterative self-play mechanism to improve alignment and response quality, sampling five responses per prompt (K=5) in each iteration and ranking them with the PairRM preference model.
  • Synthetic Data Training: Aligned exclusively on synthetic datasets, demonstrating the effectiveness of this approach for preference optimization.
  • Improved Alignment: Shows progressive improvements in alignment metrics across iterations, with Iteration 2 achieving a 27.62% Win Rate on AlpacaEval and an average MT-Bench score of 7.49.
  • Mistral-7B Base: Benefits from the strong foundational capabilities of the Mistral-7B-Instruct-v0.2 model.
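The self-play mechanism above can be made concrete with a small sketch of the per-response SPPO squared objective from the cited paper; the win-probability estimator and `eta` handling here are illustrative assumptions, not the authors' training code:

```python
def estimate_win_prob(pairwise_prefs):
    """Estimate P(y beats the current policy | x) by averaging y's pairwise
    preference scores against the other sampled responses (K=5 responses
    per prompt in this model; PairRM would supply the scores)."""
    return sum(pairwise_prefs) / len(pairwise_prefs)

def sppo_loss(logp, logp_ref, win_prob, eta):
    """SPPO squared loss for one response:
    (log pi(y|x) - log pi_ref(y|x) - eta * (win_prob - 1/2)) ** 2,
    which pushes the policy log-ratio toward eta * (win_prob - 1/2)."""
    return (logp - logp_ref - eta * (win_prob - 0.5)) ** 2
```

A response estimated to beat the current policy more than half the time has its probability pushed up; one estimated below one half is pushed down, with no paired "chosen/rejected" construction needed.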

Evaluation Highlights

  • AlpacaEval: Achieved a 27.62% Win Rate (32.12% with best-of-16 sampling) on AlpacaEval, indicating strong performance in instruction following and helpfulness.
  • MT-Bench: Scored an average of 7.49, reflecting good conversational abilities.
  • Open LLM Leaderboard: Maintained competitive performance across various academic benchmarks, with an average score of 66.75.
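The best-of-16 figure above means sampling sixteen candidate responses and keeping the one a reward or preference model scores highest. A minimal sketch, where the scoring function is a stand-in for a real scorer such as PairRM:

```python
def best_of_n(candidates, score_fn):
    """Return the candidate with the highest score under score_fn
    (e.g. a reward model or a PairRM-derived scalar score)."""
    return max(candidates, key=score_fn)

# Toy usage with a stand-in scorer (longest answer wins):
# best_of_n(["a", "bbb", "cc"], len)  # -> "bbb"
```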

When to Use This Model

This model is well suited to applications that need a 7B-parameter model with enhanced alignment and high-quality instruction-following responses, and to research on alignment driven by synthetic preference data. As the second iteration in the SPPO series, it also serves as a reference point for how alignment quality progresses across iterations.