Model Overview
BAAI/Gemma2-9B-IT-Simpo-Infinity-Preference is a 9-billion-parameter instruction-tuned language model fine-tuned from Google's Gemma-2-9B-IT. It was trained on the Infinity-Preference dataset with SimPO (Simple Preference Optimization) to improve its conversational quality and alignment with human preferences.
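To make the method concrete, here is a minimal sketch of the SimPO objective for a single preference pair. SimPO scores each response by its length-normalized average log-probability under the policy (no reference model, unlike DPO) and applies a logistic loss with a target margin. The `beta` and `gamma` values below are illustrative defaults, not the values used to train this model.

```python
import math

def simpo_loss(logprobs_chosen, logprobs_rejected, beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair (illustrative sketch).

    Each argument is a list of per-token log-probabilities the policy
    assigns to that response. beta scales the implicit reward; gamma is
    the target reward margin between chosen and rejected responses.
    """
    # Length-normalized implicit reward for each response
    reward_chosen = beta * sum(logprobs_chosen) / len(logprobs_chosen)
    reward_rejected = beta * sum(logprobs_rejected) / len(logprobs_rejected)
    # Logistic (Bradley-Terry style) loss with margin: -log sigmoid(margin)
    margin = reward_chosen - reward_rejected - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy assigns the chosen response a higher average log-probability than the rejected one (by more than the margin), the loss is small; training pushes the model toward that regime.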
Key Capabilities & Performance
- Preference Alignment: Optimized via SimPO fine-tuning to generate responses that align with human preferences.
- Strong Conversational Performance: Achieves a 73.4% length-controlled (LC) win rate on AlpacaEval 2.0.
- Competitive Benchmarking: Demonstrates a 58.1% win rate on Arena-Hard against GPT-4, indicating robust performance in challenging conversational scenarios.
- Context Length: Supports a context length of 16384 tokens, enabling longer multi-turn interactions.
Training Details
The model was trained for 1 epoch with a learning rate of 8.0e-7, a batch size of 128, and a maximum sequence length of 2048 tokens.
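The reported hyperparameters can be collected into a plain config dict for reference. The key names below are illustrative, not the exact schema of the training framework used.

```python
# Hyperparameters reported for the SimPO fine-tuning run.
# Key names are illustrative placeholders, not the original config schema.
training_config = {
    "learning_rate": 8.0e-7,
    "batch_size": 128,        # effective (global) batch size
    "max_seq_length": 2048,   # maximum training sequence length in tokens
    "num_train_epochs": 1,
}
```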
Good For
- Chatbots and Conversational AI: Its strong preference alignment and conversational win-rates make it well-suited for developing interactive chat applications.
- Instruction Following: Designed to accurately follow user instructions due to its instruction-tuned base and preference optimization.
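For chat use, prompts should follow the base model's turn-based chat format. In practice you would call `tokenizer.apply_chat_template` from `transformers`; the hand-rolled sketch below reproduces the published Gemma turn markers (`<start_of_turn>` / `<end_of_turn>`) for illustration only.

```python
def build_gemma_prompt(messages):
    """Format a list of chat messages into a Gemma-style prompt (sketch).

    messages: list of {"role": "user" | "assistant", "content": str}.
    Gemma's chat format uses the roles "user" and "model"; prefer
    tokenizer.apply_chat_template in real code.
    """
    parts = []
    for m in messages:
        role = "model" if m["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    # Open a final model turn to cue the model to respond
    parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

The resulting string can then be tokenized and passed to the model for generation.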
Limitations
This model is restricted to academic research and may not be used for commercial purposes. Output accuracy is not guaranteed, and the project disclaims legal liability for model outputs or any losses arising from their use.