Kyleyee/CPO_hh-seed2
Kyleyee/CPO_hh-seed2 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from Qwen2.5-1.5B-sft-hh-3e using Contrastive Preference Optimization (CPO) on a helpfulness preference dataset, with the goal of producing more helpful, preference-aligned responses. With a context length of 32768 tokens, it is suited to tasks that require preference-aligned generation over long inputs.
Model Overview
Kyleyee/CPO_hh-seed2 builds on the Qwen2.5-1.5B-sft-hh-3e base model and was fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset, a preference dataset focused on helpfulness, to improve the quality and alignment of its responses.
Key Capabilities
- Preference Optimization: This model was trained using Contrastive Preference Optimization (CPO), a method designed to enhance LLM performance by leveraging preference data. This approach, detailed in the paper "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation," helps the model better align with desired output characteristics.
- Helpfulness Alignment: The fine-tuning on a dedicated helpfulness preference dataset aims to improve the model's ability to provide useful and relevant answers to user queries.
- Extended Context Window: With a context length of 32768 tokens, the model can process and generate longer, more coherent texts, maintaining context over extensive interactions.
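To make the CPO objective mentioned above concrete, here is a minimal per-pair sketch in plain Python, assuming the simplified form from the CPO paper (a sigmoid preference-margin term plus a negative log-likelihood regularizer on the preferred response). The function name and default weights are illustrative, not the values used to train this model.

```python
import math

def cpo_loss(logp_chosen, logp_rejected, beta=0.1, nll_weight=1.0):
    """Illustrative CPO objective for a single preference pair.

    logp_chosen / logp_rejected are the policy's total log-probabilities
    of the preferred and dispreferred responses. The first term widens
    the margin between them; the NLL term on the chosen response keeps
    the policy anchored to high-quality outputs.
    """
    margin = beta * (logp_chosen - logp_rejected)
    preference_term = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    nll_term = -logp_chosen
    return preference_term + nll_weight * nll_term
```

With equal log-probabilities the preference term reduces to `log 2`, and the loss shrinks as the model assigns a larger margin to the chosen response.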
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) library. Unlike standard supervised fine-tuning, CPO optimizes the model directly on pairs of preferred and dispreferred responses, and unlike reference-model-based methods such as DPO it does so without a separate frozen reference model, reducing the memory cost of preference training.
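A training run of this kind can be sketched with TRL's `CPOTrainer`. The Hub paths below are assumptions inferred from the names in this card, and hyperparameters such as `beta` are illustrative defaults rather than the actual training configuration.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

# Base SFT checkpoint as named in this card; the exact Hub path is an assumption.
base = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference data with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

# beta and batch size are illustrative, not the values used for CPO_hh-seed2.
args = CPOConfig(output_dir="CPO_hh-seed2", beta=0.1,
                 per_device_train_batch_size=2)

trainer = CPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer)
trainer.train()
```

This sketch requires downloading the base checkpoint and dataset, so it is meant as a template rather than a drop-in script.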
Use Cases
This model is well-suited for applications requiring a small, efficient language model that can generate helpful and preference-aligned text, particularly in conversational AI or question-answering systems where response quality and alignment are critical.