Kyleyee/DPO_hh-seed1

Text Generation · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

Kyleyee/DPO_hh-seed1 is a 1.5 billion parameter language model fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It was trained with Direct Preference Optimization (DPO) on the Kyleyee/train_data_Helpful_drdpo_preference dataset, aligning its outputs with human preferences so that it favors helpful responses. The model supports a context length of 32768 tokens, making it suitable for tasks that require extensive contextual understanding.
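A minimal inference sketch using the Hugging Face transformers pipeline. The prompt, the generation settings, and the assumption that the model inherits the Qwen2.5 chat template from its base checkpoint are illustrative, not confirmed by this model card:

```python
import torch
from transformers import pipeline

# Load the model for chat-style generation. bfloat16 matches the
# published BF16 weights; all generation settings here are illustrative.
generator = pipeline(
    "text-generation",
    model="Kyleyee/DPO_hh-seed1",
    torch_dtype=torch.bfloat16,
)

# Assumes the Qwen2.5 chat template survived fine-tuning (unverified).
messages = [{"role": "user", "content": "How do I write a polite follow-up email?"}]
output = generator(messages, max_new_tokens=256)

# For chat input, the pipeline returns the conversation with the new
# assistant turn appended at the end.
print(output[0]["generated_text"][-1]["content"])
```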


Model Overview

Kyleyee/DPO_hh-seed1 is a 1.5 billion parameter language model, building upon the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model. Its primary distinction lies in its training methodology: it has been fine-tuned using Direct Preference Optimization (DPO). This technique, detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," aims to align the model's outputs more closely with human preferences by directly optimizing a policy against a reference model.
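For concreteness, here is a minimal sketch of the DPO objective described above, written as a standalone PyTorch function. The tensor names and the beta value are illustrative defaults, not values taken from this model's training run:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss from per-sequence log-probabilities.

    Each argument is a tensor of summed token log-probs for the chosen
    (preferred) or rejected (dispreferred) completion, under either the
    policy being trained or the frozen reference model. beta=0.1 is a
    common default, not a published hyperparameter for this model.
    """
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Push the policy to widen the chosen-vs-rejected margin relative
    # to the reference model.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```

Widening this log-probability margin relative to the frozen reference model is what "directly optimizing a policy against a reference model" means in practice.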

Key Capabilities

  • Preference Alignment: Optimized to generate responses that are preferred by humans, based on the training data.
  • Contextual Understanding: Supports a substantial context length of 32768 tokens, allowing it to process and generate text based on extensive input.
  • Instruction Following: As a fine-tuned model, it is capable of following instructions to generate relevant and coherent text.

Training Details

This model was trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset using the TRL library. The DPO method directly optimizes the policy to maximize the likelihood of preferred responses over dispreferred ones, without requiring a separate reward model.
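A hedged sketch of what the training setup may have looked like with TRL's DPOTrainer. The hyperparameters, the output directory, and the exact keyword names (which vary across TRL releases) are assumptions; only the model and dataset identifiers come from this card:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint named in the model card.
model = AutoModelForCausalLM.from_pretrained("Kyleyee/Qwen2.5-1.5B-sft-hh-3e")
tokenizer = AutoTokenizer.from_pretrained("Kyleyee/Qwen2.5-1.5B-sft-hh-3e")

# Preference dataset named in the model card (chosen/rejected pairs).
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

# beta and output_dir are illustrative defaults, not published values.
config = DPOConfig(output_dir="DPO_hh-seed1", beta=0.1)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```

When no explicit reference model is passed, DPOTrainer clones the initial policy and freezes it as the reference, matching the "without requiring a separate reward model" property described above.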

Good For

  • Applications requiring models that generate helpful and human-aligned responses.
  • Tasks where preference-based fine-tuning is crucial for output quality.
  • Developers who want a small, DPO-tuned model with a large context window that can be deployed efficiently.