Kyleyee/rDPO_hh-seed5

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/rDPO_hh-seed5 is a 1.5 billion parameter language model fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e with a 32768 token context length. This model was trained using Direct Preference Optimization (DPO) on a helpfulness preference dataset. It is designed to generate more helpful and aligned responses, making it suitable for conversational AI and instruction-following tasks.


Model Overview

Kyleyee/rDPO_hh-seed5 is a 1.5-billion-parameter language model built on Kyleyee/Qwen2.5-1.5B-sft-hh-3e. Its 32768-token context length lets it process long inputs and generate coherent, extended responses.
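In practice the 32768-token window still has to be budgeted by the application. As a minimal sketch (the 4-characters-per-token heuristic and the budget numbers are illustrative assumptions, not values from this model card; a real application would count tokens with the model's own tokenizer), a chat frontend might trim the oldest turns so the conversation fits the window:

```python
# Illustrative sketch: trim the oldest dialogue turns to fit a context budget.
# The 4-chars-per-token estimate is a rough assumption, not the model's tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(turns: list[str], budget: int = 32768, reserve: int = 1024) -> list[str]:
    """Keep the most recent turns whose estimated token count fits within
    budget - reserve, dropping the oldest turns first."""
    available = budget - reserve  # leave room for the model's reply
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk newest-first
        cost = estimate_tokens(turn)
        if used + cost > available:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))   # restore chronological order

history = ["user: hi", "assistant: hello", "user: explain DPO"]
print(trim_history(history, budget=12, reserve=4))  # drops the oldest turn
```

The same shape works at full scale by swapping `estimate_tokens` for a real tokenizer call and keeping `budget` at the model's 32768-token limit.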

Training Methodology

This model was fine-tuned using Direct Preference Optimization (DPO), a method introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model." The training utilized the Kyleyee/train_data_Helpful_drdpo_preference dataset, specifically designed to enhance the model's helpfulness and alignment. The training process was implemented using the TRL library.
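The core of DPO is its per-pair loss: the policy is pushed to widen its log-probability margin for the chosen response over the rejected one, relative to a frozen reference model. TRL's `DPOTrainer` computes this over batched tensors; the scalar sketch below is only for intuition, and the log-probabilities and β are made-up illustrative numbers, not values from this training run:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Made-up log-probs: the policy prefers the chosen response a bit more
# than the reference does, so the margin is positive and loss < log(2).
loss = dpo_loss(-1.0, -2.0, -1.2, -1.5, beta=0.1)
```

When the policy and reference agree exactly, the margin is zero and the loss sits at log(2); training decreases it by growing the margin, scaled by β.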

Key Characteristics

  • Parameter Count: 1.5 billion parameters.
  • Context Length: 32768 tokens.
  • Optimization: Fine-tuned with DPO for improved helpfulness and preference alignment.
  • Base Model: Derived from Kyleyee/Qwen2.5-1.5B-sft-hh-3e.

Use Cases

This model is particularly well-suited for applications requiring:

  • Helpful and aligned responses: Its DPO training on a helpfulness dataset makes it effective for generating user-friendly and constructive outputs.
  • Conversational AI: Capable of engaging in extended dialogues due to its large context window.
  • Instruction following: Designed to better understand and execute user instructions.