Kyleyee/DPO_hh-seed2

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 22, 2026Architecture:Transformer Cold

Kyleyee/DPO_hh-seed2 is a 1.5 billion parameter language model fine-tuned by Kyleyee using Direct Preference Optimization (DPO). It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, offering a 32768-token context length. This model is optimized for generating helpful responses, leveraging preference data to align its outputs with desired human feedback.

Loading preview...

Model Overview

Kyleyee/DPO_hh-seed2 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned iteration of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e model, specifically optimized using Direct Preference Optimization (DPO).

Key Capabilities

  • Preference Alignment: The model has been trained with DPO, a method that leverages human preference data to improve the quality and helpfulness of generated text. This training was conducted on the Kyleyee/train_data_Helpful_drdpo_preference dataset.
  • Extended Context Window: It supports a substantial context length of 32768 tokens, allowing for processing and generating longer, more coherent responses.
  • Instruction Following: As a fine-tuned model, it is designed to follow instructions effectively, as demonstrated by its quick start example for text generation.

Training Details

This model's training procedure utilized the TRL library and implemented the DPO method, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". The training environment included TRL 0.16.0.dev0, Transformers 4.49.0, Pytorch 2.6.0+cu126, Datasets 3.3.2, and Tokenizers 0.21.0.

Good For

  • Helpful Response Generation: Ideal for applications requiring models to produce responses that are aligned with human preferences for helpfulness.
  • Instruction-based Tasks: Suitable for tasks where the model needs to generate text based on specific user prompts or instructions.