Kyleyee/ORPO_hh-seed2

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/ORPO_hh-seed2 is a 1.5-billion-parameter causal language model, fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e with the ORPO preference-optimization method on the Kyleyee/train_data_Helpful_drdpo_preference dataset. It targets helpful, preference-aligned text generation and supports a context length of 32768 tokens.


Overview

Kyleyee/ORPO_hh-seed2 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from the Qwen2.5-1.5B-sft-hh-3e base model. It uses ORPO (odds ratio preference optimization), introduced as "Monolithic Preference Optimization without Reference Model": a technique that aligns a model with human preferences in a single training stage, without the separate reference model that methods like DPO require. Training used the Kyleyee/train_data_Helpful_drdpo_preference dataset and the TRL framework.
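To make the ORPO objective concrete, the sketch below computes its per-pair loss from the (average per-token) log-probabilities the policy assigns to the chosen and rejected responses: a standard SFT negative log-likelihood term plus a weighted odds-ratio penalty. The function names and the default weight `lam` are illustrative, not taken from this model's training configuration.

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO objective for a single preference pair.

    logp_chosen / logp_rejected: average per-token log-probabilities of the
    chosen and rejected responses under the policy model (values in (-inf, 0)).
    Returns L_SFT + lam * L_OR, where L_OR = -log sigmoid(log-odds ratio).
    """
    def log_odds(logp: float) -> float:
        # log odds(y) = log p(y) - log(1 - p(y)), with p(y) = exp(logp)
        return logp - math.log1p(-math.exp(logp))

    # Log odds ratio of chosen vs. rejected response
    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    # Numerically stable -log sigmoid(x) = log(1 + exp(-x))
    l_or = math.log1p(math.exp(-log_or))
    # SFT term: negative log-likelihood of the chosen response
    l_sft = -logp_chosen
    return l_sft + lam * l_or
```

Because both terms use only the policy's own probabilities, no frozen reference model is needed, which is what makes the method "monolithic".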

Key Capabilities

  • Preference-aligned generation: Optimized to produce responses that are helpful and aligned with specified preferences.
  • Efficient fine-tuning: Employs the ORPO method, which simplifies the preference optimization process.
  • Causal language modeling: Capable of generating coherent and contextually relevant text based on prompts.

Good for

  • Instruction following: Generating responses that adhere to user instructions and preferences.
  • Dialogue systems: Creating more helpful and aligned conversational AI outputs.
  • Research in preference optimization: Exploring the application and effectiveness of the ORPO method in smaller models.
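For the dialogue and instruction-following uses above, a minimal inference sketch follows. The `build_chat_prompt` helper is an illustrative stand-in: in practice, prefer the tokenizer's `apply_chat_template` so the prompt matches the format the model was trained on. `generate_reply` assumes the `transformers` library and network access to download the checkpoint.

```python
def build_chat_prompt(messages: list[dict]) -> str:
    """Illustrative chat-style prompt builder (an assumption, not the
    model's actual template; use tokenizer.apply_chat_template instead)."""
    parts = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    parts.append("Assistant:")  # cue the model to respond
    return "\n".join(parts)

def generate_reply(prompt: str, model_id: str = "Kyleyee/ORPO_hh-seed2") -> str:
    """Load the model in BF16 and generate a completion for `prompt`."""
    # Imported inside the function so the prompt helper stays usable offline.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example:
# prompt = build_chat_prompt([{"role": "user", "content": "How do I brew good coffee?"}])
# print(generate_reply(prompt))
```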