Kyleyee/cDPO_hh-seed2

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/cDPO_hh-seed2 is a 1.5 billion parameter language model fine-tuned by Kyleyee from the Qwen2.5-1.5B-sft-hh-3e base model. It specializes in generating helpful and harmless responses, having been trained with Direct Preference Optimization (DPO) on a preference dataset. With a context length of 32,768 tokens, it is suited to conversational AI applications that require nuanced, aligned outputs.


Model Overview

Kyleyee/cDPO_hh-seed2 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned variant of the Qwen2.5-1.5B-sft-hh-3e base model, specifically optimized for generating helpful and harmless text.

Key Capabilities

  • Preference-based Alignment: The model has been trained using Direct Preference Optimization (DPO), a method that leverages human preferences to align model outputs with desired behaviors (helpfulness and harmlessness).
  • Conversational AI: Its training on a helpfulness and harmlessness preference dataset makes it suitable for dialogue systems and chatbots where aligned and safe responses are critical.
  • Extended Context Window: Supports a context length of 32768 tokens, allowing for more extensive and coherent conversations or document processing.
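Assuming the model is published on the Hugging Face Hub under the ID shown above and ships with a chat template (both assumptions, not stated on this page), a minimal inference sketch with the `transformers` library might look like this. The system prompt and generation settings are illustrative only.

```python
def build_messages(user_prompt: str) -> list[dict]:
    # Chat-style message list; the system prompt below is an illustrative
    # assumption, not part of the published model card.
    return [
        {"role": "system", "content": "You are a helpful and harmless assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Heavy imports are kept inside the function so the sketch can be read
    # (and the helper above tested) without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Kyleyee/cDPO_hh-seed2"  # Hub ID assumed from the page title
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

With BF16 weights, a 1.5B-parameter model needs roughly 3 GB of memory for inference, so it fits comfortably on a single consumer GPU.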

Training Details

The model was fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset using the TRL (Transformer Reinforcement Learning) library. The DPO method, introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," was central to its training process.
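The training setup described above can be sketched with TRL's `DPOTrainer`. The dataset name comes from this page; the base-model Hub ID, hyperparameters, and output directory are assumptions for illustration, not the published recipe.

```python
# Illustrative DPO hyperparameters; these are assumptions, not the values
# actually used to train cDPO_hh-seed2.
DPO_HYPERPARAMS = {
    "beta": 0.1,                       # KL-penalty strength vs. the frozen reference model
    "max_length": 1024,                # truncation length for prompt + completion
    "per_device_train_batch_size": 2,
    "num_train_epochs": 1,
}


def make_trainer():
    # Heavy imports inside the function; requires trl, transformers, datasets.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base_id = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"  # Hub ID assumed for the base model
    model = AutoModelForCausalLM.from_pretrained(base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

    config = DPOConfig(output_dir="cDPO_hh", **DPO_HYPERPARAMS)
    return DPOTrainer(
        model=model,                 # the SFT model; TRL clones it as the reference
        args=config,
        train_dataset=dataset,       # expects prompt / chosen / rejected columns
        processing_class=tokenizer,
    )
```

DPO optimizes the policy directly on (chosen, rejected) response pairs, so no separate reward model or RL rollout loop is needed.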

Use Cases

This model is particularly well-suited for applications requiring:

  • Safe and Aligned Chatbots: Generating responses that adhere to helpful and harmless guidelines.
  • Content Moderation Assistance: Aiding in the creation of appropriate and non-toxic content.
  • General Purpose Text Generation: Producing coherent and contextually relevant text with an emphasis on beneficial outputs.