Kyleyee/DPO_hh-seed4
Kyleyee/DPO_hh-seed4 is a 1.5-billion-parameter causal language model fine-tuned by Kyleyee using Direct Preference Optimization (DPO). Based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, it specializes in generating helpful responses. It is suited to conversational AI tasks that require helpful, preference-aligned outputs within its 32,768-token context window.
Model Overview
Kyleyee/DPO_hh-seed4 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model. Its primary distinction is its training methodology: Direct Preference Optimization (DPO), a technique that aligns language models with human preferences without training a separate reward model.
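Concretely, DPO replaces an explicit reward model with a classification-style loss over preference pairs. The objective from the DPO paper is:

```math
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Here x is a prompt, y_w and y_l are the chosen and rejected responses, σ is the logistic function, π_ref is the frozen SFT reference model, and β controls how far the policy π_θ may drift from the reference. The reward signal is implicit in the log-probability ratios, which is why no separately trained reward model is needed.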
Key Capabilities
- Preference-aligned generation: Trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, the model is optimized to produce helpful, human-preferred responses.
- Conversational AI: Generates coherent, contextually relevant text for interactive applications, as shown in the quick-start example below.
- Efficient fine-tuning: Uses the TRL library for DPO training, an effective and scalable alignment workflow.
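A minimal quick-start sketch using the Transformers chat pipeline, assuming a recent Transformers version with chat-template support; the prompt and generation parameters are illustrative, not values published for this model:

```python
from transformers import pipeline

# Load the model as a chat-style text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="Kyleyee/DPO_hh-seed4",
    device_map="auto",  # place the model on a GPU if one is available
)

# Chat-formatted input: a list of {role, content} messages.
messages = [
    {"role": "user", "content": "What are some tips for staying focused while studying?"},
]

# Generate a response (sampling settings here are illustrative).
output = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)

# The pipeline returns the full conversation; the last message is the reply.
print(output[0]["generated_text"][-1]["content"])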
Training Details
The model was fine-tuned using the DPO method, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model." This approach directly optimizes a policy to maximize the likelihood of preferred responses over dispreferred ones, making it effective for aligning model behavior with human preferences. The training used the TRL, Transformers, PyTorch, Datasets, and Tokenizers libraries.
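A sketch of how such a run might be reproduced with TRL's DPOTrainer, assuming a recent TRL version and that the dataset exposes the prompt/chosen/rejected columns DPOTrainer expects; the hyperparameters below are assumptions, not Kyleyee's actual training recipe:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint that DPO_hh-seed4 was fine-tuned from.
model = AutoModelForCausalLM.from_pretrained("Kyleyee/Qwen2.5-1.5B-sft-hh-3e")
tokenizer = AutoTokenizer.from_pretrained("Kyleyee/Qwen2.5-1.5B-sft-hh-3e")

# Preference pairs: each row holds a prompt plus chosen/rejected responses.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

# Hyperparameters below are illustrative placeholders.
config = DPOConfig(
    output_dir="DPO_hh-seed4",
    beta=0.1,                        # strength of the KL penalty toward the reference model
    learning_rate=5e-7,
    per_device_train_batch_size=4,
)

# DPOTrainer builds the frozen reference model internally when none is passed.
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because DPO needs only these offline preference pairs and a frozen copy of the starting policy, the whole alignment step runs as a single supervised-style training loop.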