Kyleyee/CPO_hh-seed2
Kyleyee/CPO_hh-seed2 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from Qwen2.5-1.5B-sft-hh-3e using Contrastive Preference Optimization (CPO) on a helpfulness preference dataset, with the goal of producing more helpful, preference-aligned responses. With a context length of 32768 tokens, it is suited to tasks that require preference-aligned generation over long inputs.
Model Overview
Kyleyee/CPO_hh-seed2 builds on the Qwen2.5-1.5B-sft-hh-3e base model and was fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset, a preference dataset focused on helpfulness, to improve the quality and alignment of its responses.
Key Capabilities
- Preference Optimization: This model was trained using Contrastive Preference Optimization (CPO), a method designed to enhance LLM performance by leveraging preference data. This approach, detailed in the paper "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation," helps the model better align with desired output characteristics.
- Helpfulness Alignment: The fine-tuning on a dedicated helpfulness preference dataset aims to improve the model's ability to provide useful and relevant answers to user queries.
- Extended Context Window: With a context length of 32768 tokens, the model can process and generate longer, more coherent texts, maintaining context over extensive interactions.
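To make the CPO objective mentioned above concrete, here is a minimal per-pair sketch in plain Python, assuming the simplified form from the CPO paper (a sigmoid preference-margin term plus a negative log-likelihood regularizer on the preferred response). The function name and default weights are illustrative, not the values used to train this model.

```python
import math

def cpo_loss(logp_chosen, logp_rejected, beta=0.1, nll_weight=1.0):
    """Illustrative CPO objective for a single preference pair.

    logp_chosen / logp_rejected are the policy's total log-probabilities
    of the preferred and dispreferred responses. The first term widens
    the margin between them; the NLL term on the chosen response keeps
    the policy anchored to high-quality outputs.
    """
    margin = beta * (logp_chosen - logp_rejected)
    preference_term = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    nll_term = -logp_chosen
    return preference_term + nll_weight * nll_term
```

With equal log-probabilities the preference term reduces to `log 2`, and the loss shrinks as the model assigns a larger margin to the chosen response.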
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) library. Unlike standard supervised fine-tuning, CPO optimizes the model directly on pairs of preferred and dispreferred responses, and unlike reference-model-based methods such as DPO it does so without a separate frozen reference model, reducing the memory cost of preference training.
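A training run of this kind can be sketched with TRL's `CPOTrainer`. The Hub paths below are assumptions inferred from the names in this card, and hyperparameters such as `beta` are illustrative defaults rather than the actual training configuration.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

# Base SFT checkpoint as named in this card; the exact Hub path is an assumption.
base = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference data with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

# beta and batch size are illustrative, not the values used for CPO_hh-seed2.
args = CPOConfig(output_dir="CPO_hh-seed2", beta=0.1,
                 per_device_train_batch_size=2)

trainer = CPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer)
trainer.train()
```

This sketch requires downloading the base checkpoint and dataset, so it is meant as a template rather than a drop-in script.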
Use Cases
This model is well-suited for applications requiring a small, efficient language model that can generate helpful and preference-aligned text, particularly in conversational AI or question-answering systems where response quality and alignment are critical.