Kyleyee/rDPO_hh-seed2

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/rDPO_hh-seed2 is a 1.5 billion parameter language model fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e with a 32768-token context length. This model was trained using Direct Preference Optimization (DPO) on a helpfulness-focused dataset. It is designed to generate more helpful and aligned responses, building upon its Qwen2.5 base architecture.

Model Overview

Kyleyee/rDPO_hh-seed2 is a 1.5 billion parameter language model, fine-tuned from the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model. It leverages a substantial 32768-token context window, making it suitable for processing longer inputs and generating extended responses.

Key Characteristics

  • Direct Preference Optimization (DPO): The model was trained using the DPO method, which directly optimizes a language model to align with human preferences without requiring a separate reward model. This training approach aims to enhance the model's ability to generate helpful and preferred outputs.
  • Helpfulness Alignment: Fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset, this model is specifically optimized for generating helpful responses.
  • TRL Framework: The training process utilized the TRL (Transformer Reinforcement Learning) library, a framework for training language models with reinforcement learning techniques.
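The DPO objective described above can be illustrated with a minimal numerical sketch. This is not the model's training code; the sequence log-probabilities below are made-up toy values, and `beta` is a typical (assumed) value for the DPO temperature parameter:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each margin is the policy's log-probability gain over the frozen reference
    model on that response. A positive overall margin (policy favors the chosen
    response more strongly than the reference does) drives the loss below log 2.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - \
             (policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy values: the policy already prefers the chosen response relative to the
# reference, so the loss falls below the neutral value of log 2.
loss = dpo_loss(policy_chosen_logp=-1.0, policy_rejected_logp=-2.0,
                ref_chosen_logp=-1.2, ref_rejected_logp=-1.5)
```

Note that a policy identical to the reference gives a loss of exactly `log 2`, which is why DPO training pushes the loss below that baseline.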

Potential Use Cases

  • Helpful Assistant: Ideal for applications requiring a model that can provide informative and constructive answers.
  • Preference-Aligned Generation: Suitable for tasks where output quality is judged by human preferences, such as dialogue systems or content creation requiring specific helpfulness criteria.
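For the use cases above, the model can be loaded with the Hugging Face `transformers` library like any causal LM. The sketch below is a hypothetical usage example, not official documentation for this repository; the sampling settings (`temperature`, `max_new_tokens`) are illustrative assumptions, and the function imports lazily so the heavyweight download happens only when called:

```python
def generate_reply(prompt: str,
                   model_name: str = "Kyleyee/rDPO_hh-seed2",
                   max_new_tokens: int = 256) -> str:
    """Generate a single reply from the model (requires `transformers` and network access)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    # The 32768-token context window permits long prompts; only new tokens are capped here.
    output_ids = model.generate(**inputs,
                                max_new_tokens=max_new_tokens,
                                do_sample=True,
                                temperature=0.7)
    # Strip the prompt tokens and decode only the generated continuation.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

In a dialogue system, this function would typically be wrapped with whatever prompt template the base SFT model was trained on.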