UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2

Status: Warm
Visibility: Public
Parameters: 7B
Quantization: FP8
Context length: 4096 tokens
Released: May 4, 2024
License: apache-2.0
Source: Hugging Face
Overview

UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2 is a 7-billion-parameter language model developed by UCLA-AGI, fine-tuned from mistralai/Mistral-7B-Instruct-v0.2. It is the second iteration in a series aligned with Self-Play Preference Optimization (SPPO), as detailed in the paper "Self-Play Preference Optimization for Language Model Alignment." The model was fine-tuned on synthetic responses generated from prompts in the openbmb/UltraFeedback dataset, using a split from snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
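
Since the model inherits the Mistral-Instruct chat format, it can be loaded and prompted with the standard Hugging Face transformers workflow. Below is a minimal sketch; the generation settings (temperature, max_new_tokens) are illustrative defaults, not values published by UCLA-AGI.

```python
# Minimal sketch: load the model and generate a response with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The tokenizer ships with the Mistral-Instruct chat template.
messages = [{"role": "user", "content": "Summarize self-play preference optimization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```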

Key Capabilities & Differentiators

  • Self-Play Preference Optimization (SPPO): Leverages an iterative self-play mechanism in which the model generates K=5 responses per prompt, scores them with the PairRM preference model, and optimizes toward the resulting win probabilities, improving alignment and response quality each iteration (a sketch of the objective follows this list).
  • Synthetic Data Training: Aligned exclusively on synthetic datasets, demonstrating the effectiveness of this approach for preference optimization.
  • Improved Alignment: Shows progressive improvements in alignment metrics across iterations, with Iteration 2 achieving a 27.62% Win Rate on AlpacaEval and an average MT-Bench score of 7.49.
  • Mistral-7B Base: Benefits from the strong foundational capabilities of the Mistral-7B-Instruct-v0.2 model.
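
To make the SPPO mechanism concrete, the sketch below implements the paper's core objective in PyTorch: for each sampled response y, the log-ratio log(π(y|x)/π_ref(y|x)) is regressed toward η·(P(y wins) − 1/2), where the win probability comes from ranking the K self-play samples with PairRM. The function name, the value of eta, and the toy inputs are illustrative assumptions, not the authors' training code.

```python
# Schematic SPPO objective (not the authors' implementation).
import torch

def sppo_loss(logp_policy, logp_ref, win_prob, eta=1e3):
    """logp_policy / logp_ref: summed token log-probs of each response under the
    current policy and the frozen reference policy; win_prob: estimated
    probability (e.g. from PairRM) that the response beats the policy's own
    samples. Regresses the log-ratio toward eta * (win_prob - 1/2)."""
    target = eta * (win_prob - 0.5)
    return ((logp_policy - logp_ref) - target).pow(2).mean()

# Toy usage with fake log-probs for K=5 responses to one prompt:
logp_policy = torch.tensor([-42.0, -40.5, -45.1, -39.8, -44.0], requires_grad=True)
logp_ref = torch.tensor([-41.0, -41.0, -44.0, -40.0, -43.5])
win_prob = torch.tensor([0.45, 0.70, 0.20, 0.80, 0.35])  # from pairwise ranking
loss = sppo_loss(logp_policy, logp_ref, win_prob)
loss.backward()
```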

Evaluation Highlights

  • AlpacaEval: Achieved a 27.62% Win Rate (32.12% with best-of-16 sampling; see the sketch after this list), indicating strong performance in instruction following and helpfulness.
  • MT-Bench: Scored an average of 7.49, reflecting good conversational abilities.
  • Open LLM Leaderboard: Maintained competitive performance across various academic benchmarks, with an average score of 66.75.
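
The best-of-16 figure comes from sampling multiple responses and keeping the one PairRM ranks highest. A sketch of that selection step is below; the llm-blender calls shown (Blender, loadranker, rank) follow the PairRM model card's documented interface, but treat them as an assumption and check that card for the current API.

```python
# Best-of-n selection with PairRM as the ranker (assumed llm-blender API).
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")

def best_of_n(prompt, candidates):
    """Rank n sampled responses to one prompt; rank 1 is best under PairRM."""
    ranks = blender.rank([prompt], [candidates])  # shape (1, n)
    return candidates[ranks[0].argmin()]

# `candidates` would be 16 independent samples from the model, e.g. via
# model.generate(..., do_sample=True, num_return_sequences=16).
```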

When to Use This Model

This model suits applications that need a 7B-parameter model with enhanced alignment and high-quality instruction following, particularly when synthetic-data-driven alignment is of interest. As a specific iteration in the SPPO research series, it also offers insight into how alignment quality progresses across iterations.