Kyleyee/CPO_hh-seed3

Text Generation · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/CPO_hh-seed3 is a 1.5-billion-parameter language model fine-tuned by Kyleyee with Contrastive Preference Optimization (CPO). Built on Kyleyee/Qwen2.5-1.5B-sft-hh-3e, it is trained on a helpfulness preference dataset and is intended for tasks that require nuanced, helpful text generation on the Qwen2.5 architecture.


Model Overview

Kyleyee/CPO_hh-seed3 is a 1.5-billion-parameter language model developed by Kyleyee and fine-tuned from the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model. Training was performed with the TRL library.
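A minimal loading and generation sketch with Hugging Face transformers follows; the model id and BF16 precision come from this card, while the prompt and sampling settings are illustrative assumptions:

```python
# Minimal inference sketch; the checkpoint id comes from this card,
# the prompt and sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)  # BF16 per the card

prompt = "How can I politely decline a meeting invitation?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```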

Key Capabilities

  • Helpful Response Generation: The model is specifically optimized for generating helpful and nuanced text, having been fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset.
  • Contrastive Preference Optimization (CPO): It is trained with the CPO method introduced in "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation" (ICML 2024). CPO optimizes a contrastive loss over chosen/rejected response pairs, teaching the model to prefer the chosen response over the rejected one without requiring a separate reference model; a minimal training sketch follows this list.
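CPO is implemented in TRL as CPOTrainer. The sketch below shows how such a fine-tune could be reproduced; the base model and dataset ids come from this card, but the hyperparameters (batch size, beta, output directory) are illustrative assumptions, not the author's actual configuration:

```python
# CPO training sketch with TRL's CPOTrainer. Base model and dataset ids come
# from this card; all hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base_id = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# CPOTrainer expects a preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

args = CPOConfig(output_dir="CPO_hh-seed3", per_device_train_batch_size=2, beta=0.1)
trainer = CPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```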

Intended Use Cases

This model is well-suited to applications that need high-quality, helpful, and contextually appropriate responses, particularly conversational AI or assistant-style roles where helpfulness is the key metric; see the usage sketch below.
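For assistant-style use, prompts are typically formatted with the tokenizer's chat template. This sketch assumes the tokenizer ships a chat template (Qwen2.5 tokenizers usually do); the message content is illustrative:

```python
# Assistant-style usage sketch; assumes the tokenizer ships a chat template
# (Qwen2.5 tokenizers usually do). The user message is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Suggest three ways to make my weekly status emails clearer."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```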