princeton-nlp/gemma-2-9b-it-SimPO

Hugging Face
Task: Text Generation · Model Size: 9B · Quantization: FP8 · Context Length: 16k · Concurrency Cost: 1 · Published: Jul 16, 2024 · License: MIT · Architecture: Transformer · Open Weights

The princeton-nlp/gemma-2-9b-it-SimPO model is a 9-billion-parameter causal language model developed by Yu Meng, Mengzhou Xia, and Danqi Chen, fine-tuned from Google's Gemma-2-9B-IT. It uses the SimPO (Simple Preference Optimization) algorithm, an offline preference optimization method that aligns the implicit reward with the model's generation likelihood and does not require a reference model. Trained on a human-preference dataset, the model is well suited to tasks that call for nuanced, preference-aligned response generation.

Model Overview

The princeton-nlp/gemma-2-9b-it-SimPO is a 9-billion-parameter causal language model developed by Yu Meng, Mengzhou Xia, and Danqi Chen. It is fine-tuned from the google/gemma-2-9b-it base model using the SimPO (Simple Preference Optimization) algorithm. SimPO is an offline preference optimization technique that uses the length-normalized log-likelihood of a generated sequence as the implicit reward, so training aligns the reward directly with generation likelihood, eliminates the need for a separate reference model, and incorporates a target reward margin.
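For reference, the SimPO objective can be written as follows. This is a sketch following the SimPO paper rather than text from the model card: π_θ is the policy being trained, (x, y_w, y_l) a prompt with preferred and dispreferred responses, β the reward scaling constant, and γ the target reward margin.

```latex
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[
      \log \sigma\!\left(
        \frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x)
        \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x)
        \;-\; \gamma
      \right)
    \right]
```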

Key Capabilities & Differentiators

  • Preference Optimization: Leverages the novel SimPO algorithm for effective preference-based fine-tuning.
  • Reference-Free Training: SimPO's design removes the dependency on a reference model, simplifying the training process (a toy sketch of the loss follows this list).
  • Enhanced Performance: Evaluation results show improvements over the base gemma-2-9b-it model and competitive performance against a DPO-tuned variant on various benchmarks, particularly the AlpacaEval 2 length-controlled win rate (AE2 LC) and Arena-Hard (AH).
  • Efficient Fine-tuning: The model was fine-tuned on 8xH100 GPUs in approximately 100 minutes using the princeton-nlp/gemma2-ultrafeedback-armorm dataset.
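
The toy Python sketch below illustrates the reference-free, length-normalized loss for a single preference pair. It is not the authors' training code; the function name and the β and γ values are assumptions for illustration only.

```python
# Toy illustration of the SimPO loss for one preference pair; NOT the authors'
# training code. Function name and beta/gamma values are assumptions.
import torch
import torch.nn.functional as F

def simpo_pair_loss(chosen_token_logps: torch.Tensor,
                    rejected_token_logps: torch.Tensor,
                    beta: float = 10.0,   # assumed reward scaling constant
                    gamma: float = 5.0) -> torch.Tensor:
    # Length-normalized sequence log-likelihoods serve as the implicit,
    # reference-free rewards.
    chosen_reward = beta * chosen_token_logps.mean()
    rejected_reward = beta * rejected_token_logps.mean()
    # Bradley-Terry-style loss with a target reward margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma)

# Fake per-token log-probabilities for responses of different lengths.
chosen = -torch.rand(12)
rejected = -torch.rand(20)
print(simpo_pair_loss(chosen, rejected).item())
```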

Ideal Use Cases

This model is well-suited for applications requiring language generation that is highly aligned with human preferences, especially in scenarios where preference optimization is critical. Its SimPO-based training makes it a strong candidate for tasks demanding nuanced and contextually appropriate responses, offering an alternative to traditional DPO methods.
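
As a minimal usage sketch (not part of the original model card), the checkpoint can be loaded with the Hugging Face transformers library like any other Gemma-2 instruction-tuned model; the dtype, device placement, and generation settings below are illustrative assumptions.

```python
# Minimal inference sketch using Hugging Face transformers (and accelerate for
# device_map). Generation settings here are illustrative, not prescribed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/gemma-2-9b-it-SimPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits a 9B model on one modern GPU
    device_map="auto",
)

# Gemma-2 instruction-tuned checkpoints expect the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```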