wxzhang/selective-pairrm-33045197-mt0
wxzhang/selective-pairrm-33045197-mt0 is a 7-billion-parameter instruction-tuned language model, fine-tuned by wxzhang from mistralai/Mistral-7B-Instruct-v0.2. It was trained with Direct Preference Optimization (DPO) on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset and reaches a rewards accuracy of 0.6055 on the evaluation set. The model is intended to generate responses aligned with human preferences, making it suitable for tasks that require nuanced, preference-aware output.
Model Overview
wxzhang/selective-pairrm-33045197-mt0 is a 7-billion-parameter language model derived from mistralai/Mistral-7B-Instruct-v0.2. It has been fine-tuned using Direct Preference Optimization (DPO) on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
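A minimal usage sketch with the transformers library, assuming the weights are published on the Hugging Face Hub under this model id and follow the standard Mistral-Instruct chat template:

```python
# Minimal sketch: load the model and generate a preference-aligned reply.
# Assumes the repo exists on the Hub and ships a standard chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wxzhang/selective-pairrm-33045197-mt0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral-Instruct models expect the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```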
Key Characteristics
- Base Model: Mistral-7B-Instruct-v0.2
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Training Dataset: snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
- Evaluation Performance: Achieved a rewards accuracy of 0.6055 on the evaluation set, with a final loss of 0.6825.
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07, a total training batch size of 64, and the Adam optimizer. Training ran on 4 devices with gradient accumulation over 4 steps, which implies a per-device batch size of 4 (64 / 4 devices / 4 accumulation steps).
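For illustration, a DPO run with these hyperparameters might look roughly like the TRL sketch below. The per-device batch size of 4 is inferred from the totals above rather than stated on this card, and the exact DPOTrainer/DPOConfig signatures vary across TRL versions:

```python
# Hedged sketch of the reported setup using Hugging Face TRL.
# Assumptions: per-device batch size 4 (inferred) and default DPO beta.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset(
    "snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset", split="train"
)

args = DPOConfig(
    output_dir="selective-pairrm-dpo",
    num_train_epochs=1,             # reported: 1 epoch
    learning_rate=5e-7,             # reported: 5e-07
    per_device_train_batch_size=4,  # inferred from the totals above
    gradient_accumulation_steps=4,  # reported: 4 steps
)

# Older TRL versions take tokenizer= instead of processing_class=.
trainer = DPOTrainer(
    model=model, args=args, train_dataset=dataset, processing_class=tokenizer
)
trainer.train()
```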
Potential Use Cases
This model is particularly well suited to applications where generating responses aligned with specific preferences or human feedback is crucial. Its DPO fine-tuning suggests an ability to distinguish preferred from rejected outputs, making it valuable for tasks such as:
- Preference-aligned text generation
- Response ranking and selection (see the scoring sketch after this list)
- Dialogue systems requiring nuanced output
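As a concrete, hypothetical illustration of response ranking, one simple approach is to score each candidate reply by its average token log-likelihood under the model and pick the highest-scoring one. The helper below is not part of this model's tooling, and it reuses the `model` and `tokenizer` loaded in the earlier snippet:

```python
# Hypothetical ranking sketch: score candidates by their average
# log-probability under the model, then select the best one.
import torch
import torch.nn.functional as F

def score_response(prompt: str, response: str) -> float:
    """Average log-probability of `response` tokens given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token given its prefix (predictions are shifted by one).
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the response portion (approximate: tokenization of the
    # concatenated string may shift the boundary by a token).
    resp_lp = token_lp[:, prompt_ids.shape[-1] - 1:]
    return resp_lp.mean().item()

candidates = ["Paris is the capital of France.", "I think it might be Lyon."]
best = max(candidates, key=lambda r: score_response("What is the capital of France? ", r))
print(best)
```

Length-normalizing by the token count keeps the score from systematically favoring shorter candidates; whether raw or normalized likelihood works better is an empirical choice.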