UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 9B · Quant: FP8 · Ctx Length: 16k · Published: Jun 29, 2024 · License: Gemma · Architecture: Transformer

UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2 is a 9 billion parameter decoder-only Transformer, fine-tuned from google/gemma-2-9b-it using Self-Play Preference Optimization (SPPO) at its second iteration. The model is trained on synthetic preference data derived from UltraFeedback prompts to improve alignment. It is primarily English-focused, is designed for general instruction-following tasks, and supports a 16,384-token context length.


Overview

UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2 is a 9 billion parameter instruction-tuned language model developed by UCLA-AGI. It is fine-tuned from google/gemma-2-9b-it with a specialized alignment procedure, Self-Play Preference Optimization (SPPO), and represents the second iteration of that process. Training used synthetic responses generated from prompts in the openbmb/UltraFeedback dataset, with the prompt set split across the SPPO iterations.
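At a high level, each SPPO iteration regresses the policy's log-probability ratio against the reference model toward a target derived from each response's estimated win probability. The sketch below illustrates that squared-loss objective in plain Python; the function name, argument names, and `eta` value are illustrative assumptions, not taken from the UCLA-AGI training code.

```python
import math

def sppo_square_loss(logp_theta, logp_ref, win_prob, eta=10.0):
    """Illustrative per-example SPPO-style loss: push the log-probability
    ratio log(pi_theta / pi_ref) toward eta * (win_prob - 1/2).
    Responses judged better than average (win_prob > 0.5) get a positive
    target, so minimizing the loss raises their probability."""
    log_ratio = logp_theta - logp_ref
    target = eta * (win_prob - 0.5)
    return (log_ratio - target) ** 2

# An average response (win_prob = 0.5) with an unchanged policy incurs no loss;
# a preferred response (win_prob = 0.6) is pulled toward a positive log ratio.
print(sppo_square_loss(-12.0, -12.0, win_prob=0.5))
print(sppo_square_loss(-12.0, -12.0, win_prob=0.6))
```

In the actual method the win probabilities come from a preference model scoring self-generated responses, and this loss is averaged over sampled responses per prompt at each iteration.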

Key Characteristics

  • Architecture: 9 billion parameter decoder-only Transformer based on Gemma-2.
  • Fine-tuning Method: Self-Play Preference Optimization (SPPO) for improved alignment.
  • Training Data: Leverages synthetic datasets derived from UltraFeedback prompts.
  • Language: Primarily English.
  • Context Length: Supports a context length of 16384 tokens.
  • License: Gemma.
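Given these characteristics, the model should work with the standard Hugging Face `transformers` chat workflow. The sketch below is a minimal usage example under that assumption; it requires a recent `transformers` install and enough memory for a 9B model, and the prompt text is purely illustrative.

```python
MODEL_ID = "UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2"

def build_messages(user_prompt):
    # Gemma-2 chat models take a plain list of role/content messages;
    # the tokenizer's chat template adds the <start_of_turn> markup.
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    # Heavy imports and the model download are kept out of module scope;
    # a 9B model needs roughly 18 GB in bf16, so adjust device_map as needed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer.apply_chat_template(
        build_messages("Summarize Self-Play Preference Optimization in one paragraph."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model is English-focused with a 16k context, long multi-turn conversations fit comfortably, but prompts in other languages may underperform.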

Performance Insights

While benchmark results specific to this Gemma iteration are not reported here, the corresponding SPPO iterations on Llama-3-8B show progressive gains in AlpacaEval win rates, suggesting the SPPO method improves instruction-following performance. For instance, Llama-3-8B-SPPO Iter2 achieved a 50.93% length-controlled (LC) win rate and a 44.64% raw win rate on AlpacaEval.

Use Cases

This model is suitable for general instruction-following tasks where a robustly aligned model is beneficial. Its fine-tuning approach aims to produce more helpful and harmless outputs, making it a strong candidate for applications requiring high-quality conversational AI or content generation based on user prompts.