Kyleyee/rDPO_hh-seed3

Text Generation · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/rDPO_hh-seed3 is a 1.5-billion-parameter language model fine-tuned by Kyleyee. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and optimized with Direct Preference Optimization (DPO) on the Kyleyee/train_data_Helpful_drdpo_preference dataset, with the aim of producing helpful, preference-aligned responses. A 32,768-token context length makes it suitable for tasks requiring longer interactions.


Model Overview

Kyleyee/rDPO_hh-seed3 is a 1.5-billion-parameter language model developed by Kyleyee, building on the base Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It has been fine-tuned with Direct Preference Optimization (DPO), a method that aligns language models with human preferences by using the policy itself as an implicit reward model, so no separately trained reward model is required.

Key Capabilities

  • Preference-Aligned Responses: Optimized with DPO on a helpfulness preference dataset, steering generations toward the responses annotators preferred (see the usage sketch after this list).
  • Efficient Fine-tuning: Trained with the TRL library, demonstrating that modern alignment techniques can be applied to smaller (1.5B-parameter) models.
  • Extended Context Window: A 32,768-token context length supports long multi-turn conversations and document-scale inputs.
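
The model can be loaded through the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the checkpoint inherits the Qwen2.5 chat template from its SFT base; the prompt and generation settings are illustrative, not recommended defaults.

```python
# Minimal inference sketch; assumes the tokenizer ships a chat template
# inherited from the Qwen2.5 base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/rDPO_hh-seed3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "How do I politely decline a meeting?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```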

Training Details

The model was trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset. The DPO method, introduced in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," was used to improve the model's ability to produce helpful outputs. DPO learns a reward function implicitly from preference pairs, guiding generation toward preferred responses without an explicit reward-modeling stage.
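
Concretely, the DPO objective from that paper folds the reward into the policy. For a prompt $x$ with preferred response $y_w$ and dispreferred response $y_l$ drawn from the preference dataset $\mathcal{D}$, it minimizes:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $\pi_{\mathrm{ref}}$ is the frozen SFT model (the Qwen2.5-1.5B SFT base in this case), $\sigma$ is the logistic function, and $\beta$ controls how far the fine-tuned policy may drift from the reference.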
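In TRL, this objective is implemented by DPOTrainer. The following is a hedged sketch of how such a run could be set up, assuming a recent TRL release (where the tokenizer is passed as processing_class); the hyperparameters shown are illustrative placeholders, not the card's actual training configuration.

```python
# Illustrative DPO fine-tuning setup with TRL; hyperparameters are
# placeholders, not the values used to produce this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"  # SFT base named in the card

# Assumes a 'train' split with prompt/chosen/rejected preference columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

config = DPOConfig(
    output_dir="rDPO_hh-seed3",
    beta=0.1,                      # strength of the KL constraint; illustrative
    per_device_train_batch_size=4,
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

When no ref_model is supplied, DPOTrainer creates a frozen copy of the policy to serve as the reference, which matches the role of the SFT base in the objective above.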