Kyleyee/VRPO_hh-seed3
Kyleyee/VRPO_hh-seed3 is a 1.5 billion parameter causal language model fine-tuned by Kyleyee. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and optimized with DRDPO, a variant of Direct Preference Optimization (DPO), on a helpfulness preference dataset. The model is designed to generate helpful, aligned responses, using preference-based training to improve conversational quality.
Model Overview
Kyleyee/VRPO_hh-seed3 is a 1.5 billion parameter language model developed by Kyleyee, fine-tuned from the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model, with a context length of 32,768 tokens. Its main distinction is its training methodology: DRDPO, a variant of Direct Preference Optimization (DPO), applied to a dedicated helpfulness preference dataset (Kyleyee/train_data_Helpful_drdpo_preference). This approach aims to align the model's outputs with human preferences for helpfulness.
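A minimal quick-start sketch using the transformers library, assuming the checkpoint loads like any other Qwen2.5-family causal LM on the Hub (the model ID comes from this card; the prompt and generation settings are illustrative):

```python
# Minimal sketch: load the model with the standard transformers API.
# Assumes the checkpoint follows the usual Qwen2.5 layout on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/VRPO_hh-seed3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # requires the accelerate package
)

prompt = "How can I politely decline a meeting invitation?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```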
Key Capabilities
- Preference-aligned generation: Optimized to produce responses that are perceived as more helpful based on direct preference feedback.
- Conversational AI: Suitable for multi-turn tasks requiring engaging and helpful dialogue (see the chat-template sketch after this list).
- Efficient inference: With 1.5 billion parameters, it offers a balance between performance and computational efficiency.
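For conversational use, a sketch of multi-turn prompting via the tokenizer's chat template. This assumes the tokenizer inherits a chat template from its Qwen2.5 base, which is worth verifying on the Hub; the message content is illustrative:

```python
# Sketch: multi-turn generation through the tokenizer's chat template
# (assumes a Qwen2.5-style chat template ships with the tokenizer).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/VRPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Can you suggest a simple weekly meal plan?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```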
Training Details
The model was trained with the TRL framework using the DRDPO method. DRDPO builds on DPO (Direct Preference Optimization), introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (arXiv:2305.18290), which optimizes a language model directly on human preference data without training a separate reward model. This training paradigm steers the model toward responses that human raters prefer.
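A hedged sketch of what the preference-optimization stage could look like with TRL's stock DPOTrainer. The dataset name and base model come from this card, but the hyperparameters are illustrative, and any DRDPO-specific modifications are not part of stock TRL and are not shown here:

```python
# Sketch: standard DPO training with TRL on the card's preference dataset.
# DRDPO-specific changes (if any) would live in custom code on top of this.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference dataset named in this card; assumed to follow TRL's
# preference format with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

training_args = DPOConfig(
    output_dir="VRPO_hh-seed3",
    beta=0.1,                        # illustrative KL-penalty strength, not the card's value
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
)
trainer.train()
```

With no explicit ref_model, DPOTrainer clones the policy as the frozen reference, which matches the common setup of running DPO directly on the SFT checkpoint.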