AALF/gemma-2-27b-it-SimPO-37K
Text Generation

Concurrency Cost: 2
Model Size: 27B
Quant: FP8
Ctx Length: 32k
Published: Aug 13, 2024
License: gemma
Architecture: Transformer

AALF/gemma-2-27b-it-SimPO-37K is a fine-tuned version of Google's Gemma 2 27B instruction-tuned model. It was trained with the SimPO (Simple Preference Optimization) framework on on-policy preference data generated from the HuggingFaceH4/ultrafeedback_binarized dataset, with responses ranked by reward model feedback. The result is improved response quality, making the model well suited to conversational AI and instruction-following tasks.
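To make the training objective concrete, here is a minimal sketch of the SimPO loss for a single preference pair. SimPO uses the length-normalized average log-probability as an implicit reward and enforces a target margin between the chosen and rejected responses. The hyperparameter values (`beta`, `gamma`) and the toy log-probabilities below are illustrative assumptions, not the values used to train this checkpoint.

```python
import math

def simpo_loss(logp_chosen, logp_rejected, beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair.

    logp_chosen / logp_rejected: per-token log-probabilities of the
    chosen and rejected responses under the policy model.
    beta and gamma are illustrative hyperparameters.
    """
    # Length-normalized implicit rewards (average log-prob per token).
    r_chosen = beta * sum(logp_chosen) / len(logp_chosen)
    r_rejected = beta * sum(logp_rejected) / len(logp_rejected)
    # Reward margin, shifted by the target margin gamma.
    margin = r_chosen - r_rejected - gamma
    # Negative log-sigmoid of the margin: small when the chosen
    # response is clearly preferred, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the policy assigns higher per-token probability to the chosen response relative to the rejected one; unlike DPO, no frozen reference model is needed.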


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model.

Each configuration specifies: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, min_p.
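As a sketch of how these sampler settings might be passed to an OpenAI-compatible chat-completions endpoint, the request body below includes all seven parameters. The numeric values are placeholders, not the actual top configs from this page, and `top_k`, `repetition_penalty`, and `min_p` are extensions beyond the core OpenAI schema that many OpenAI-compatible servers accept.

```python
# Placeholder values for illustration only; substitute a real
# config from the table above.
payload = {
    "model": "AALF/gemma-2-27b-it-SimPO-37K",
    "messages": [{"role": "user", "content": "Hello!"}],
    # Standard OpenAI sampling parameters.
    "temperature": 0.7,
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    # Common extensions supported by many OpenAI-compatible servers.
    "top_k": 40,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}
```

Such a payload would be sent as the JSON body of a POST to the server's chat-completions route.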