Name: jack009064/Affine-mmh2-5EptJ5DkkearraPC65QFsPbkHkB1BZnNfoeJ5iLKeNXJGUR2 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: jack009064

Model Overview

jack009064/Affine-mmh2-5EptJ5DkkearraPC65QFsPbkHkB1BZnNfoeJ5iLKeNXJGUR2 is a 32-billion-parameter language model fine-tuned with Direct Preference Optimization (DPO). This method, introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," trains directly on pairs of preferred and rejected responses, aligning the model's outputs with human preferences without the need for a separate reward model.
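As a rough illustration of the DPO objective from that paper, the sketch below computes the loss for a single preference pair from summed log-probabilities. It is not this model's training code: the function name dpo_loss, its argument names, and the default beta=0.1 are assumptions chosen for the example, and the model card does not state which beta was used.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative value, not one taken from the model card.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), written so that
    # exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree on both responses the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the chosen response relative to the reference.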

Key Characteristics

DPO Fine-tuning: Utilizes the Direct Preference Optimization technique for enhanced response quality and alignment.
TRL Framework: Trained using the TRL (Transformer Reinforcement Learning) library, a Hugging Face framework for post-training large language models.
General Purpose: Suitable for a variety of text generation tasks, particularly those benefiting from preference-aligned outputs.

Training Details

The model was trained with DPO using the following framework versions:

TRL: 0.29.1
Transformers: 5.3.0
PyTorch: 2.6.0+cu124
Datasets: 4.8.4
Tokenizers: 0.22.2
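
To compare a local environment against the versions listed above, a small check along these lines may help. It is a minimal sketch: the helper names installed_version and find_mismatches are hypothetical, and the package names assume the usual PyPI distribution names (PyTorch is distributed as "torch").

```python
from importlib import metadata

# Versions as listed in the model card; keys are assumed PyPI distribution names.
EXPECTED = {
    "trl": "0.29.1",
    "transformers": "5.3.0",
    "torch": "2.6.0+cu124",
    "datasets": "4.8.4",
    "tokenizers": "0.22.2",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions describe the training environment, not a requirement for inference.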

This model is a good candidate for applications that depend on coherent, human-preferred text, where DPO's preference alignment can improve conversational flow and relevance.
