Overview
Model Overview
AALF/gemma-2-27b-it-SimPO-37K-100steps is an instruction-tuned variant of Google's 27-billion-parameter Gemma-2 model. It is a checkpoint taken after 100 training steps of a longer SimPO fine-tuning run, optimized to generate high-quality, human-preferred responses.
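As a quick orientation, the snippet below is a minimal usage sketch assuming the standard transformers chat-template workflow for Gemma-2 instruction-tuned models; the prompt and generation settings are illustrative, not the authors' evaluation setup.

```python
# Minimal usage sketch (generation settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AALF/gemma-2-27b-it-SimPO-37K-100steps"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"  # device_map="auto" requires accelerate
)

messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```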
Key Capabilities & Training
- Preference Optimization: The model was fine-tuned with SimPO (Simple Preference Optimization), a reference-free preference optimization method, using on-policy preference data generated by the model itself (see the loss sketch after this list).
- Data Curation: Training data was generated from prompts in the HuggingFaceH4/ultrafeedback_binarized dataset, with responses scored by the RLHFlow/ArmoRM-Llama3-8B-v0.1 reward model. Only prompts where the chosen response's reward was significantly higher than the rejected response's reward were kept, yielding 37,040 training examples (a curation sketch follows this list).
- Performance: Achieves a 77.09% win rate and a 79.16% length-controlled (LC) win rate on the AlpacaEval 2.0 benchmark, indicating strong preference alignment.
- Technical Implementation: Training used 8x 80GB A800 GPUs with `deepspeed_zero_stage3` and optimizer offloading to the CPU, leveraging the `alignment-handbook` library (a config sketch follows this list).
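For readers unfamiliar with SimPO, the sketch below shows its reference-free objective as described in the SimPO paper: a length-normalized log-likelihood margin between chosen and rejected responses, pushed past a target margin gamma. The `beta` and `gamma` defaults here are illustrative, not necessarily the settings used for this checkpoint.

```python
# Sketch of the SimPO objective: length-normalized implicit rewards with a
# target margin gamma, and no reference model. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps: torch.Tensor,    # summed log-probs of chosen responses
               rejected_logps: torch.Tensor,  # summed log-probs of rejected responses
               chosen_lens: torch.Tensor,     # token counts of chosen responses (float)
               rejected_lens: torch.Tensor,   # token counts of rejected responses (float)
               beta: float = 10.0,
               gamma: float = 5.0) -> torch.Tensor:
    # Length-normalized implicit rewards for each response.
    chosen_rewards = beta * chosen_logps / chosen_lens
    rejected_rewards = beta * rejected_logps / rejected_lens
    # Encourage the chosen reward to exceed the rejected reward by at least gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```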
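The reward-gap filtering described in the Data Curation item might look roughly like the sketch below. The candidate sampler, scoring function, and margin threshold are hypothetical placeholders; the card only states that the chosen response's ArmoRM reward had to be significantly higher than the rejected one's.

```python
# Illustrative sketch of on-policy pair construction plus reward-gap filtering.
# `sample` and `score` stand in for the model's sampler and the ArmoRM reward
# model; `margin` is a hypothetical threshold for "significantly higher".
from typing import Callable, Dict, List

def curate_pairs(prompts: List[str],
                 sample: Callable[[str], List[str]],        # on-policy candidate responses
                 score: Callable[[str, str], float],        # reward model: (prompt, response) -> reward
                 margin: float = 0.01) -> List[Dict[str, str]]:
    pairs = []
    for prompt in prompts:  # prompts drawn from HuggingFaceH4/ultrafeedback_binarized
        candidates = sorted(sample(prompt), key=lambda r: score(prompt, r))
        chosen, rejected = candidates[-1], candidates[0]    # best vs. worst candidate
        gap = score(prompt, chosen) - score(prompt, rejected)
        if gap > margin:                                    # keep only wide-margin pairs
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```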
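For reference, a DeepSpeed ZeRO stage-3 configuration with CPU optimizer offloading, expressed as a Python dict, could look like the sketch below. This is an assumed configuration consistent with the setup above, not the authors' exact file; `"auto"` values are resolved by the Hugging Face Trainer.

```python
# Assumed DeepSpeed ZeRO-3 config with CPU optimizer offload, matching the
# hardware setup described above; not the authors' exact configuration.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
# Typically passed as transformers.TrainingArguments(deepspeed=ds_config, ...)
# when training with the alignment-handbook / TRL stack.
```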
Good for
- Applications requiring models optimized for generating high-quality, human-preferred responses.
- Tasks where robust instruction following and preference alignment are critical.
- Researchers and developers interested in models fine-tuned with the SimPO method.