Kyleyee/ORPO_hh-seed2
Kyleyee/ORPO_hh-seed2 is a 1.5-billion-parameter causal language model, fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e using the ORPO preference optimization method. Trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, it specializes in generating helpful, preference-aligned responses, and it supports a context length of 32,768 tokens. It is designed for tasks requiring nuanced, preference-aligned text generation.
Overview
Kyleyee/ORPO_hh-seed2 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from the Qwen2.5-1.5B-sft-hh-3e base model. It leverages ORPO (Odds Ratio Preference Optimization), a monolithic preference-optimization technique that aligns models with human preferences without requiring a separate reference model. Training used the Kyleyee/train_data_Helpful_drdpo_preference dataset and the TRL framework.
Key Capabilities
- Preference-aligned generation: Optimized to produce responses that are helpful and aligned with specified preferences.
- Efficient fine-tuning: Employs the ORPO method, which folds preference alignment into a single training stage by adding an odds-ratio penalty to the supervised fine-tuning loss, removing the need for a separate reference model.
- Causal language modeling: Capable of generating coherent and contextually relevant text based on prompts.
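As a minimal inference sketch, the model can be loaded with the transformers library. The chat-template usage and generation settings below are assumptions (presumed inherited from the Qwen2.5 base), not details documented on this card:

```python
MODEL_ID = "Kyleyee/ORPO_hh-seed2"


def build_chat(prompt: str) -> list:
    # Single-turn conversation in the message format expected by
    # tokenizer.apply_chat_template.
    return [{"role": "user", "content": prompt}]


def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported here so build_chat stays usable without transformers installed;
    # the first call downloads the model weights (~3 GB).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    input_ids = tokenizer.apply_chat_template(
        build_chat(prompt), add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated reply is returned.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate_reply("How do I write a polite follow-up email?")` returns a single assistant turn as plain text.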
Good for
- Instruction following: Generating responses that adhere to user instructions and preferences.
- Dialogue systems: Creating more helpful and aligned conversational AI outputs.
- Research in preference optimization: Exploring the application and effectiveness of the ORPO method in smaller models.
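For preference-optimization research, a run like this one can be sketched with TRL's ORPOTrainer. The base model and dataset names come from this card; the hyperparameters are illustrative assumptions, not the values used to train this model:

```python
def orpo_training_args() -> dict:
    # beta weights the odds-ratio penalty relative to the SFT loss in ORPO.
    # All values here are assumed; they are not reported on this card.
    return {
        "output_dir": "orpo-hh",
        "beta": 0.1,
        "learning_rate": 8e-6,
        "num_train_epochs": 1,
    }


def train():
    # Imported here so orpo_training_args stays usable without trl installed.
    from datasets import load_dataset
    from trl import ORPOConfig, ORPOTrainer

    # ORPOTrainer expects a preference dataset with prompt/chosen/rejected
    # columns; this card's dataset is assumed to follow that format.
    dataset = load_dataset(
        "Kyleyee/train_data_Helpful_drdpo_preference", split="train"
    )
    trainer = ORPOTrainer(
        model="Kyleyee/Qwen2.5-1.5B-sft-hh-3e",  # base model from this card
        args=ORPOConfig(**orpo_training_args()),
        train_dataset=dataset,
    )
    trainer.train()
```

Because ORPO needs no reference model, this single trainer replaces the usual SFT-then-DPO two-stage pipeline.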