Name: Kyleyee/DPO_hh-seed2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Kyleyee

Model Overview

Kyleyee/DPO_hh-seed2 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned iteration of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e model, specifically optimized using Direct Preference Optimization (DPO).

Key Capabilities

Preference Alignment: The model has been trained with DPO, a method that leverages human preference data to improve the quality and helpfulness of generated text. This training was conducted on the Kyleyee/train_data_Helpful_drdpo_preference dataset.
Extended Context Window: It supports a substantial context length of 32768 tokens, allowing for processing and generating longer, more coherent responses.
Instruction Following: As a fine-tuned model, it is designed to follow instructions effectively, as demonstrated by its quick start example for text generation.

Training Details

This model's training procedure utilized the TRL library and implemented the DPO method, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". The training environment included TRL 0.16.0.dev0, Transformers 4.49.0, Pytorch 2.6.0+cu126, Datasets 3.3.2, and Tokenizers 0.21.0.

Good For

Helpful Response Generation: Ideal for applications requiring models to produce responses that are aligned with human preferences for helpfulness.
Instruction-based Tasks: Suitable for tasks where the model needs to generate text based on specific user prompts or instructions.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)