Kyleyee/ORPO_hh-seed3

Text Generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/ORPO_hh-seed3 is a 1.5 billion parameter language model fine-tuned by Kyleyee using the ORPO method. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, specializing in generating helpful responses. The model supports a 32,768-token context length and performs preference alignment without requiring a separate reference model.


Model Overview

Kyleyee/ORPO_hh-seed3 is a 1.5 billion parameter language model developed by Kyleyee, fine-tuned from the base model Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It utilizes a substantial context length of 32768 tokens, making it suitable for processing longer inputs and generating comprehensive outputs.

Key Capabilities & Training

This model's primary distinction lies in its training methodology: it was fine-tuned using ORPO (Monolithic Preference Optimization without Reference Model). This technique, detailed in the paper "ORPO: Monolithic Preference Optimization without Reference Model" (arXiv:2403.07691), adds an odds-ratio penalty on disfavored responses to the standard supervised fine-tuning loss, aligning the model to preferences in a single stage with no separate reference model. Training was conducted with the TRL framework on the Kyleyee/train_data_Helpful_drdpo_preference dataset, indicating an optimization for generating helpful, aligned responses.
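To make the objective concrete, here is a minimal numerical sketch of the ORPO loss, assuming length-normalized sequence log-probabilities for a chosen and a rejected response; the function name and the default weight `lam` are illustrative, not values from this model's training run:

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO objective (arXiv:2403.07691):
    NLL on the chosen response plus a weighted odds-ratio penalty.

    logp_chosen / logp_rejected: length-normalized log-probabilities
    of the chosen and rejected responses (must be < 0)."""
    def log_odds(logp: float) -> float:
        p = math.exp(logp)
        return math.log(p / (1.0 - p))

    # Log odds ratio between the chosen and rejected responses.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log sigmoid(ratio): small when the chosen response is far more likely.
    l_or = math.log(1.0 + math.exp(-ratio))
    # Standard supervised fine-tuning (NLL) term on the chosen response.
    l_sft = -logp_chosen
    return l_sft + lam * l_or
```

When the model already prefers the chosen response, the penalty term is near zero and the loss reduces to ordinary supervised fine-tuning; when it prefers the rejected response, the penalty grows, pushing probability mass toward the chosen one.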

Use Cases

Given its ORPO training on a helpfulness preference dataset, this model is particularly well-suited for applications requiring:

  • Generating helpful and aligned text responses.
  • Tasks where preference optimization is crucial for output quality.
  • Scenarios benefiting from a model with a large context window for detailed interactions.
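For the use cases above, a minimal inference sketch with the transformers library follows; the generation settings are illustrative, and the checkpoint is assumed to be downloadable from the Hugging Face Hub under the id shown:

```python
def generate_reply(prompt: str,
                   model_id: str = "Kyleyee/ORPO_hh-seed3",
                   max_new_tokens: int = 256) -> str:
    """Generate a single assistant reply. Imports are deferred so the
    sketch can be read (and the function defined) without transformers
    installed; calling it downloads the checkpoint."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    # Format the prompt with the tokenizer's chat template.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt")

    output = model.generate(input_ids, max_new_tokens=max_new_tokens,
                            do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)
```

With the full 32k context window, long multi-turn histories can be passed in the `messages` list before the final user turn.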