Kyleyee/CPO_hh-seed3

Text Generation · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/CPO_hh-seed3 is a 1.5-billion-parameter language model fine-tuned by Kyleyee with Contrastive Preference Optimization (CPO). Built on Kyleyee/Qwen2.5-1.5B-sft-hh-3e, it is trained on a helpfulness preference dataset and is intended for tasks that require nuanced, helpful text generation on the Qwen2.5 architecture.


Model Overview

Kyleyee/CPO_hh-seed3 is a 1.5-billion-parameter language model developed by Kyleyee and fine-tuned from the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model. Training was performed with the TRL library.
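A minimal loading and generation sketch with Hugging Face transformers follows; the model id and BF16 precision come from this card, while the prompt and sampling settings are illustrative assumptions:

```python
# Minimal inference sketch; the checkpoint id comes from this card,
# the prompt and sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)  # BF16 per the card

prompt = "How can I politely decline a meeting invitation?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```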

Key Capabilities

  • Helpful Response Generation: The model is specifically optimized for generating helpful and nuanced text, having been fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset.
  • Contrastive Preference Optimization (CPO): It is trained with the CPO method introduced in "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation" (ICML 2024). CPO optimizes a contrastive loss over chosen/rejected response pairs, teaching the model to prefer the chosen response over the rejected one without requiring a separate reference model; a minimal training sketch follows this list.
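CPO is implemented in TRL as CPOTrainer. The sketch below shows how such a fine-tune could be reproduced; the base model and dataset ids come from this card, but the hyperparameters (batch size, beta, output directory) are illustrative assumptions, not the author's actual configuration:

```python
# CPO training sketch with TRL's CPOTrainer. Base model and dataset ids come
# from this card; all hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base_id = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# CPOTrainer expects a preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

args = CPOConfig(output_dir="CPO_hh-seed3", per_device_train_batch_size=2, beta=0.1)
trainer = CPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```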

Intended Use Cases

This model is well-suited to applications that need high-quality, helpful, and contextually appropriate responses, particularly conversational AI or assistant-style roles where helpfulness is the key metric; see the usage sketch below.
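For assistant-style use, prompts are typically formatted with the tokenizer's chat template. This sketch assumes the tokenizer ships a chat template (Qwen2.5 tokenizers usually do); the message content is illustrative:

```python
# Assistant-style usage sketch; assumes the tokenizer ships a chat template
# (Qwen2.5 tokenizers usually do). The user message is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Suggest three ways to make my weekly status emails clearer."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```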