Kyleyee/CPO_hh-seed5
Kyleyee/CPO_hh-seed5 is a 1.5-billion-parameter causal language model fine-tuned by Kyleyee. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and further trained with Contrastive Preference Optimization (CPO) on a helpfulness preference dataset. The model is optimized for generating helpful, preference-aligned responses and supports a 32K-token context window.
Overview
Kyleyee/CPO_hh-seed5 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned iteration of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model, specifically enhanced through a training process known as Contrastive Preference Optimization (CPO). This method, detailed in the paper "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation," aims to improve the model's ability to generate preferred responses.
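To make the training objective concrete, the sketch below implements a toy version of the CPO loss as described in the cited paper: a sigmoid preference term contrasting chosen and rejected responses, plus a negative log-likelihood term on the chosen response. This is an illustrative reimplementation on precomputed sequence log-probabilities, not the actual training code used for this model; the function name and `beta` value are assumptions.

```python
import torch
import torch.nn.functional as F

def cpo_loss(logp_chosen: torch.Tensor,
             logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Toy CPO objective over precomputed sequence log-probs (illustrative)."""
    # Preference term: push the chosen log-prob above the rejected one.
    preference = -F.logsigmoid(beta * (logp_chosen - logp_rejected))
    # NLL (behavior-cloning) term: keep the model close to the chosen responses.
    nll = -logp_chosen
    return (preference + nll).mean()

# The loss is lower when the model already assigns higher probability
# to the chosen response than to the rejected one.
good = cpo_loss(torch.tensor([-1.0]), torch.tensor([-4.0]))
bad = cpo_loss(torch.tensor([-4.0]), torch.tensor([-1.0]))
```

Unlike DPO, this objective needs no frozen reference model, which is part of CPO's efficiency argument.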
Key Capabilities
- Preference-Optimized Responses: Trained with CPO on a helpfulness preference dataset, optimizing the model to produce more helpful, aligned outputs.
- Foundation Model: Built upon the Qwen2.5-1.5B architecture, providing a solid base for general language understanding and generation tasks.
- Extended Context Window: Features a context length of 32,768 tokens, allowing for processing and generating longer, more complex interactions.
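The capabilities above can be exercised with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the repository ships a tokenizer with a chat template (common for Qwen2.5-based models); the example prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt; any helpfulness-style request works.
messages = [{"role": "user", "content": "How do I politely decline a meeting invitation?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```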
Training Details
The model was trained with the TRL (Transformer Reinforcement Learning) library on the Kyleyee/train_data_Helpful_drdpo_preference dataset. This training setup aligns the model's outputs with human preferences for helpfulness.