Kyleyee/cDPO_hh-seed5
Text Generation · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer
Kyleyee/cDPO_hh-seed5 is a 1.5 billion parameter causal language model fine-tuned by Kyleyee using Direct Preference Optimization (DPO). It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and trained on a helpfulness preference dataset, making it suitable for generating helpful and aligned responses. The model has a context length of 32768 tokens, enabling it to process extensive inputs for various text generation tasks.
Model Overview
Kyleyee/cDPO_hh-seed5 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned variant of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model, specifically optimized using Direct Preference Optimization (DPO).
Key Characteristics
- Base Model: Built upon `Kyleyee/Qwen2.5-1.5B-sft-hh-3e`.
- Training Method: Direct Preference Optimization (DPO), a method that aligns language models with human preferences by optimizing the policy directly on preference pairs, without training a separate reward model.
- Dataset: Fine-tuned on the `Kyleyee/train_data_Helpful_drdpo_preference` dataset, indicating a focus on generating helpful, user-preferred responses.
- Framework: Trained using the TRL (Transformer Reinforcement Learning) library.
- Context Length: Supports a substantial context window of 32768 tokens.
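The DPO objective described above can be sketched for a single preference pair. This is a minimal illustration with made-up log-probabilities, not the TRL implementation; `beta` is the usual preference-strength hyperparameter:

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Single-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    *_w are log-probs of the chosen (winning) response, *_l of the rejected one,
    under the trained policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss drops below log(2).
loss_aligned = dpo_loss(-5.0, -9.0, -6.0, -8.0)  # margin = +2
loss_neutral = dpo_loss(-6.0, -8.0, -6.0, -8.0)  # margin = 0, loss = log(2)
```

Minimizing this loss pushes the policy to widen the chosen-vs-rejected gap relative to the reference model, which is how DPO encodes the helpfulness preferences without a separate reward model.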
Potential Use Cases
This model is particularly well-suited for applications requiring:
- Helpful Response Generation: Its DPO training on a helpfulness dataset suggests strong performance in producing informative and user-preferred answers.
- Instruction Following: Given its fine-tuning, it can be expected to follow instructions effectively for various text-based tasks.
- General Text Generation: Capable of generating coherent and contextually relevant text across a wide range of prompts.
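A minimal inference sketch using the Hugging Face `transformers` library (assumptions: the model is available on the Hub under this ID and inherits the Qwen2.5 chat template from its base model; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/cDPO_hh-seed5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Build a chat-formatted prompt (assumes the base model's chat template).
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters such as `temperature` can be passed to `generate` for more varied outputs.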