Kyleyee/ORPO_hh-seed3
Kyleyee/ORPO_hh-seed3 is a 1.5-billion-parameter language model fine-tuned by Kyleyee using the ORPO method. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and was trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, specializing in generating helpful responses. The model supports a 32,768-token context length and is optimized for preference alignment without a reference model.
Model Overview
Kyleyee/ORPO_hh-seed3 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from the base model Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It supports a context length of 32,768 tokens, making it suitable for processing long inputs and generating comprehensive outputs.
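A minimal loading sketch using the Hugging Face Transformers library is shown below. The repository id is taken from this card; the `torch_dtype="auto"` setting is an assumed convenience choice, not a documented requirement.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/ORPO_hh-seed3"  # repository id from this card

# Load the tokenizer and model weights; dtype is inferred from the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```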
Key Capabilities & Training
This model's primary distinction lies in its training methodology: it was fine-tuned using ORPO (Odds Ratio Preference Optimization), a monolithic method introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model" (arXiv:2403.07691). ORPO performs preference alignment directly during fine-tuning, without a separate reference model, which simplifies the optimization process. Training was conducted with the TRL framework on the dataset Kyleyee/train_data_Helpful_drdpo_preference, optimizing the model for helpful, aligned responses; a training sketch follows.
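The sketch below reconstructs the training setup from the facts on this card (base model, dataset, ORPO via TRL) using TRL's `ORPOTrainer`. Hyperparameters such as `beta`, batch size, and epoch count are illustrative assumptions, not the actual recipe used for this checkpoint.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Base model and dataset named on this card.
base_model = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# ORPO expects a preference dataset with prompt / chosen / rejected examples.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

args = ORPOConfig(
    output_dir="ORPO_hh-seed3",
    beta=0.1,                       # weight of the odds-ratio term (lambda in the paper); assumed
    per_device_train_batch_size=2,  # assumed
    num_train_epochs=1,             # assumed
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL releases
)
trainer.train()
```

Because ORPO folds the preference penalty into the supervised fine-tuning loss, no frozen reference copy of the model is kept in memory, which roughly halves the memory footprint compared with DPO-style training.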
Use Cases
Given its training on a helpful preference dataset and the ORPO method, this model is particularly well-suited for applications requiring:
- Generating helpful and aligned text responses.
- Tasks where preference optimization is crucial for output quality.
- Scenarios benefiting from a model with a large context window for detailed interactions (an inference sketch follows this list).
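Below is an illustrative inference example for these use cases. The prompt and sampling settings are assumptions, and a chat template is presumed to be present since the base model is a Qwen2.5 SFT checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/ORPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Format a single-turn request with the tokenizer's chat template.
messages = [{"role": "user", "content": "Can you suggest a polite way to decline a meeting?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Sample a response; settings here are illustrative, not tuned values.
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```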