Kyleyee/cDPO_hh-seed5

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/cDPO_hh-seed5 is a 1.5 billion parameter causal language model fine-tuned by Kyleyee using Direct Preference Optimization (DPO). It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and trained on a helpfulness preference dataset, making it suitable for generating helpful and aligned responses. The model has a context length of 32768 tokens, enabling it to process extensive inputs for various text generation tasks.


Model Overview

Kyleyee/cDPO_hh-seed5 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned variant of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model, specifically optimized using Direct Preference Optimization (DPO).

Key Characteristics

  • Base Model: Built upon Kyleyee/Qwen2.5-1.5B-sft-hh-3e.
  • Training Method: Utilizes Direct Preference Optimization (DPO), which aligns a language model with human preferences by optimizing the policy directly on preference pairs, without training a separate reward model.
  • Dataset: Fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset, indicating a focus on generating helpful and preferred responses.
  • Framework: Trained using the TRL (Transformer Reinforcement Learning) library.
  • Context Length: Supports a substantial context window of 32768 tokens.
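To make the training method concrete: DPO scores each preference pair by the policy's log-probability margin over a frozen reference model. The sketch below implements that per-example loss in plain Python; the `label_smoothing` term corresponds to the conservative-DPO (cDPO) variant, which the "cDPO" in the model name may refer to — an assumption, since the card does not spell this out.

```python
import math


def dpo_loss(pi_chosen_lp: float, pi_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1, label_smoothing: float = 0.0) -> float:
    """Per-example DPO loss from the summed token log-probs of the chosen
    and rejected responses under the policy (pi_*) and the frozen
    reference model (ref_*).

    label_smoothing > 0 gives the conservative-DPO (cDPO) variant, which
    assumes a small fraction of preference labels are noisy/flipped;
    label_smoothing == 0 recovers standard DPO.
    """
    def log_sigmoid(x: float) -> float:
        # log(1 / (1 + e^-x)); fine for a sketch, not overflow-safe for large |x|
        return -math.log1p(math.exp(-x))

    # Implicit reward margin: beta * (chosen log-ratio - rejected log-ratio)
    logits = beta * ((pi_chosen_lp - ref_chosen_lp)
                     - (pi_rejected_lp - ref_rejected_lp))
    return (-(1 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
```

At initialization the policy equals the reference, so the margin is zero and the loss is `log 2 ≈ 0.693`; as the policy learns to favor chosen responses, the loss falls below that. In practice this computation is handled by TRL's `DPOTrainer`, which the card says was used for training.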

Potential Use Cases

This model is particularly well-suited for applications requiring:

  • Helpful Response Generation: Its DPO training on a helpfulness dataset suggests strong performance in producing informative and user-preferred answers.
  • Instruction Following: Its supervised fine-tuning followed by preference alignment suggests it can follow instructions reliably across common text-based tasks.
  • General Text Generation: Capable of generating coherent and contextually relevant text across a wide range of prompts.
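For trying the model, a minimal generation sketch with the standard `transformers` API is shown below. It assumes the repository ships the base Qwen2.5 chat template (so `apply_chat_template` works) and that `accelerate` is installed for `device_map="auto"`; the prompt text is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Kyleyee/cDPO_hh-seed5"


def build_prompt(tokenizer, user_message: str) -> str:
    """Render a single-turn chat prompt with the model's chat template."""
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",   # matches the BF16 quantization listed above
        device_map="auto",        # requires accelerate
    )
    prompt = build_prompt(tokenizer, "How do I brew a good cup of coffee?")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256,
                         do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True))
```

The 32768-token context window means long documents or multi-turn histories can be passed in the same way, subject to available memory.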