Overview
UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1 is a 9-billion-parameter language model: the first iteration of google/gemma-2-9b-it fine-tuned with Self-Play Preference Optimization (SPPO). Developed by UCLA-AGI, the model is aligned on synthetic preference data derived from the UltraFeedback prompt set. Training ran for a single epoch with a learning rate of 5e-07 and the RMSProp optimizer.
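The SPPO update mentioned above can be sketched as a per-response square loss that regresses the policy's log-probability ratio toward a scaled preference estimate. This is one reading of the objective in arXiv:2405.00675; the function name, the scale `eta`, and all numbers below are purely illustrative, not the authors' exact training code.

```python
def sppo_loss(logp_theta: float, logp_ref: float, p_win: float, eta: float = 1.0) -> float:
    """Illustrative SPPO square loss for one (prompt, response) pair.

    Pushes log(pi_theta / pi_ref) toward eta * (p_win - 1/2), where p_win
    estimates the probability that this response beats the current policy's
    other responses for the same prompt.
    """
    log_ratio = logp_theta - logp_ref
    target = eta * (p_win - 0.5)
    return (log_ratio - target) ** 2

# Made-up numbers: both policies assign the same log-probability, so the
# loss gradient would raise a likely winner and lower a likely loser.
win_loss = sppo_loss(-12.0, -12.0, p_win=0.9)   # p_win > 0.5: push probability up
lose_loss = sppo_loss(-12.0, -12.0, p_win=0.1)  # p_win < 0.5: push probability down
```

A response whose estimated win probability is exactly 0.5 incurs zero loss, which is what makes repeated self-play iterations (Iter1, Iter2, Iter3) converge toward responses that are not beaten by the policy's own samples.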
Key Characteristics
- Base Model: Fine-tuned from google/gemma-2-9b-it.
- Alignment Method: Self-Play Preference Optimization (SPPO), as described in the paper "Self-Play Preference Optimization for Language Model Alignment" (arXiv:2405.00675).
- Training Data: Synthetic responses generated from prompt sets in the openbmb/UltraFeedback dataset, split into three parts for iterative training.
- Language: Primarily English.
- License: Apache-2.0.
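The checkpoint can be used through the standard Hugging Face transformers API. The sketch below is a hypothetical usage example, not official instructions from the model card: the dtype and device settings are assumptions, and the call that actually loads the model is left commented out because it downloads the full 9B-parameter weights.

```python
MODEL_ID = "UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1"

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the SPPO checkpoint and answer a single chat prompt.

    Imports are local so that merely defining this sketch does not
    require transformers/torch to be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: Gemma-2 is commonly run in bf16
        device_map="auto",
    )
    # Gemma-2-It models expect their chat template, not a raw string prompt.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# generate_reply("Explain SPPO in one sentence.")  # downloads ~9B weights
```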
Performance Insights
While this is the first iteration, the subsequent iterations (Iter2 and Iter3) of the SPPO process show progressive improvements in length-controlled (LC) win rate and raw win rate on the AlpacaEval leaderboard, suggesting the effectiveness of the SPPO method. For instance, the Iter3 model achieves a 53.27% LC win rate and a 47.74% win rate on AlpacaEval, indicating the potential for further improvement in this model series.