Overview
UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2 is a 9-billion-parameter instruction-tuned language model developed by UCLA-AGI. It is based on google/gemma-2-9b-it and has been further fine-tuned for alignment with Self-Play Preference Optimization (SPPO); this checkpoint is the model at SPPO's second iteration. Training used synthetic responses generated for prompts from the openbmb/UltraFeedback dataset, with the prompts split across the SPPO iterations.
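At each iteration, SPPO fits the policy's log-probability ratio against a reference model to a target derived from an estimated win probability. A minimal sketch of the per-response squared loss, assuming the form reported in the SPPO paper (the names `log_ratio`, `win_prob`, and `eta` are illustrative, and the default `eta` here is arbitrary):

```python
def sppo_loss(log_ratio: float, win_prob: float, eta: float = 1.0) -> float:
    """Illustrative SPPO-style squared loss for one response y to prompt x.

    log_ratio: log(pi_theta(y|x) / pi_ref(y|x)), the policy-vs-reference
               log-probability ratio for the response.
    win_prob:  estimated probability that y beats a sample from the
               current policy, P(y wins | x).
    eta:       scaling hyperparameter (value here is arbitrary).

    The loss pushes the log ratio toward eta * (win_prob - 1/2), so
    responses that win more than half the time are up-weighted and
    losing responses are down-weighted.
    """
    target = eta * (win_prob - 0.5)
    return (log_ratio - target) ** 2
```

A response with `win_prob == 0.5` (a coin flip against the current policy) has a target log ratio of zero, so the policy is not moved for it.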
Key Characteristics
- Architecture: 9-billion-parameter decoder-only transformer (Gemma-2).
- Fine-tuning Method: Self-Play Preference Optimization (SPPO) for improved alignment.
- Training Data: Leverages synthetic datasets derived from UltraFeedback prompts.
- Language: Primarily English.
- Context Length: Supports a context length of 16384 tokens.
- License: Apache-2.0.
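As a Gemma-2 derivative, the model expects prompts in Gemma's chat format. A minimal single-turn formatter is sketched below for illustration; in practice you would load the model's tokenizer with the transformers library and call `tokenizer.apply_chat_template`, which produces this format for you:

```python
def format_gemma_chat(user_message: str) -> str:
    """Build a single-turn prompt in Gemma's chat format.

    Illustrative sketch only: the canonical way to do this is
    tokenizer.apply_chat_template from the transformers library,
    which also handles multi-turn conversations.
    """
    return (
        "<bos><start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```

The trailing `<start_of_turn>model\n` cues the model to begin its reply.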
Performance Insights
Benchmark results specific to this Gemma-2 Iteration 2 checkpoint are not provided for direct comparison with other models. However, the companion SPPO runs on Llama-3-8B show progressive improvements in AlpacaEval win rates across iterations, suggesting the SPPO method enhances instruction-following performance. For instance, Llama-3-8B-SPPO Iter2 achieved a 50.93% length-controlled (LC) win rate and a 44.64% raw win rate on AlpacaEval.
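For reference, the raw win rate cited above is simply the fraction of head-to-head comparisons a model wins against a baseline, with ties conventionally counted as half a win; the length-controlled variant additionally applies a regression adjustment for response length, which is not shown in this sketch:

```python
def win_rate(outcomes: list[str]) -> float:
    """Raw AlpacaEval-style win rate as a percentage.

    outcomes: one of "win", "tie", or "lose" per head-to-head
    comparison against the baseline model. Ties count as half
    a win. (The length-controlled win rate further adjusts for
    response length via a regression model; not shown here.)
    """
    score = sum(
        1.0 if o == "win" else 0.5 if o == "tie" else 0.0
        for o in outcomes
    )
    return 100.0 * score / len(outcomes)
```

So a model that wins half of its comparisons outright and loses the rest scores 50%.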
Use Cases
This model is suitable for general instruction-following tasks where a well-aligned model is beneficial. The SPPO fine-tuning aims to produce more helpful and harmless outputs, making the model a strong candidate for conversational AI and prompt-driven content generation.