Kyleyee/CPO_hh-seed4
Kyleyee/CPO_hh-seed4 is a 1.5-billion-parameter language model fine-tuned by Kyleyee from Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It was trained with Contrastive Preference Optimization (CPO) on a helpfulness preference dataset, tuning it toward helpful, aligned responses. The model targets tasks that require nuanced, preference-based text generation and supports a 32,768-token context length.
Model Overview
Kyleyee/CPO_hh-seed4 is a 1.5-billion-parameter language model developed by Kyleyee on top of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model. It was fine-tuned with the Contrastive Preference Optimization (CPO) method, described in the paper "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation". Training used the Kyleyee/train_data_Helpful_drdpo_preference dataset and the TRL framework.
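A minimal quick-start sketch, assuming the weights and tokenizer are published on the Hugging Face Hub under the model id above and that the tokenizer inherits Qwen2.5's chat template; the prompt and generation parameters below are illustrative, not taken from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; drop it for CPU-only use.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How do I safely update my system's packages?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```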
Key Capabilities
- Preference-aligned Generation: Optimized for generating responses that align with human preferences, particularly for helpfulness, due to its CPO training on a preference dataset.
- Foundation Model: Based on the Qwen2.5 architecture, providing a robust base for language understanding and generation tasks.
- Extended Context Window: Supports a context length of 32,768 tokens, allowing it to process and generate longer, more coherent texts (see the check after this list).
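One way to verify the advertised context window, assuming the configuration published on the Hub exposes Qwen2.5's standard max_position_embeddings field:

```python
from transformers import AutoConfig

# Fetch only the model configuration, not the weights.
config = AutoConfig.from_pretrained("Kyleyee/CPO_hh-seed4")
print(config.max_position_embeddings)  # expected: 32768
```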
Training Details
The model's distinguishing characteristic is its training with CPO, a method that applies contrastive learning to preference data. By teaching the model to separate preferred from non-preferred outputs, and, unlike DPO, dispensing with a frozen reference model while adding a log-likelihood term on the preferred responses, CPO aims at more aligned generations at lower memory cost. Training was conducted with the TRL (Transformer Reinforcement Learning) framework alongside the Transformers, PyTorch, Datasets, and Tokenizers libraries.
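An illustrative sketch of this setup using TRL's CPOTrainer; only the base model, dataset, and trainer names come from the card, and all hyperparameters below are placeholders:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# TRL expects a preference dataset with "prompt", "chosen", and
# "rejected" columns; the card's dataset is assumed to follow this format.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

training_args = CPOConfig(
    output_dir="CPO_hh-seed4",
    per_device_train_batch_size=2,  # placeholder value
    num_train_epochs=1,             # placeholder value
    beta=0.1,                       # CPO temperature; placeholder value
)

trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # named tokenizer= in older TRL releases
)
trainer.train()
```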
Use Cases
This model suits applications where generating helpful, preference-aligned, and contextually rich text is crucial, for example assistant-style dialogue in the spirit of its helpfulness training data. Its CPO training makes it a strong candidate for tasks that demand a nuanced reading of user preferences and responses that reflect those preferences.