Kyleyee/CPO_hh-seed5
Kyleyee/CPO_hh-seed5 is a 1.5-billion-parameter causal language model fine-tuned by Kyleyee. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and further trained with Contrastive Preference Optimization (CPO) on a helpfulness preference dataset. The model is optimized for generating helpful, preference-aligned responses and supports a 32K-token context window.
Overview
Kyleyee/CPO_hh-seed5 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned iteration of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model, specifically enhanced through a training process known as Contrastive Preference Optimization (CPO). This method, detailed in the paper "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation," aims to improve the model's ability to generate preferred responses.
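To make the training objective concrete, the sketch below implements a toy version of the CPO loss as described in the cited paper: a sigmoid preference term contrasting chosen and rejected responses, plus a negative log-likelihood term on the chosen response. This is an illustrative reimplementation on precomputed sequence log-probabilities, not the actual training code used for this model; the function name and `beta` value are assumptions.

```python
import torch
import torch.nn.functional as F

def cpo_loss(logp_chosen: torch.Tensor,
             logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Toy CPO objective over precomputed sequence log-probs (illustrative)."""
    # Preference term: push the chosen log-prob above the rejected one.
    preference = -F.logsigmoid(beta * (logp_chosen - logp_rejected))
    # NLL (behavior-cloning) term: keep the model close to the chosen responses.
    nll = -logp_chosen
    return (preference + nll).mean()

# The loss is lower when the model already assigns higher probability
# to the chosen response than to the rejected one.
good = cpo_loss(torch.tensor([-1.0]), torch.tensor([-4.0]))
bad = cpo_loss(torch.tensor([-4.0]), torch.tensor([-1.0]))
```

Unlike DPO, this objective needs no frozen reference model, which is part of CPO's efficiency argument.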
Key Capabilities
- Preference-Optimized Responses: Trained with CPO on a helpfulness preference dataset, optimizing the model to produce more helpful, aligned outputs.
- Foundation Model: Built upon the Qwen2.5-1.5B architecture, providing a solid base for general language understanding and generation tasks.
- Extended Context Window: Features a context length of 32,768 tokens, allowing for processing and generating longer, more complex interactions.
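The capabilities above can be exercised with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the repository ships a tokenizer with a chat template (common for Qwen2.5-based models); the example prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/CPO_hh-seed5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt; any helpfulness-style request works.
messages = [{"role": "user", "content": "How do I politely decline a meeting invitation?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```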
Training Details
The model was trained with the TRL (Transformer Reinforcement Learning) library on the Kyleyee/train_data_Helpful_drdpo_preference dataset. This training setup aligns the model's outputs with human preferences for helpfulness.