Kyleyee/HINGE_hh-seed3

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Architecture: Transformer · Published: Apr 28, 2026

Kyleyee/HINGE_hh-seed3 is a 1.5 billion parameter language model fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It was trained using Direct Preference Optimization (DPO) on a helpfulness preference dataset, making it suitable for generating helpful and aligned responses. With a context length of 32768 tokens, this model is designed for conversational AI applications requiring preference-aligned text generation.


Overview

Kyleyee/HINGE_hh-seed3 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned variant of the Qwen2.5-1.5B-sft-hh-3e base model, specifically optimized for generating helpful responses. The model leverages a substantial 32768 token context length, enabling it to process and generate longer, more coherent texts.
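Below is a minimal usage sketch with the Hugging Face transformers library. The repo id comes from this card; the chat-template call and generation settings are illustrative assumptions, not settings published by the author.

```python
# Minimal usage sketch, assuming a standard transformers setup.
# The repo id is from this card; generation settings are illustrative guesses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/HINGE_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Qwen2.5-based checkpoints typically ship a chat template; we assume this one does too.
messages = [{"role": "user", "content": "How do I politely decline a meeting invitation?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```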

Key Capabilities

  • Preference-Aligned Generation: The model was trained with Direct Preference Optimization (DPO), a method that aligns language model outputs with human preferences, here specifically for helpfulness (a standard statement of the DPO loss follows this list).
  • Foundation Model: Fine-tuned from the Kyleyee/Qwen2.5-1.5B-sft-hh-3e checkpoint, which provides a robust base for language understanding and generation.
  • Extended Context Window: Supports a context length of 32768 tokens, beneficial for tasks requiring extensive conversational history or detailed input.
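For reference, DPO optimizes the policy directly on preference pairs without training a separate reward model. A standard statement of the loss from the DPO paper, with $\pi_{\mathrm{ref}}$ the frozen SFT reference model and $\beta$ the KL-strength hyperparameter:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
      -\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
    \right)\right]
```

Here $y_w$ and $y_l$ are the preferred and dispreferred responses for prompt $x$; minimizing the loss raises the policy's relative log-probability of preferred responses while the reference model anchors it to the SFT distribution.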

Training Details

The model was fine-tuned with the TRL (Transformer Reinforcement Learning) library on the Kyleyee/train_data_Helpful_drdpo_preference dataset, which provides the chosen/rejected preference pairs that DPO learns from. The DPO method, introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023), was central to the training process.
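The card does not publish the exact training script or hyperparameters. The following is a minimal sketch of how such a run is typically set up with TRL's DPOTrainer, assuming the dataset follows TRL's prompt/chosen/rejected preference format; the beta value and other settings are illustrative, not the author's.

```python
# Illustrative DPO fine-tuning sketch with TRL; hyperparameters are assumptions,
# not the settings actually used for Kyleyee/HINGE_hh-seed3.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"  # SFT base named on this card
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference dataset named on this card; assumed to expose prompt/chosen/rejected columns.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

training_args = DPOConfig(
    output_dir="HINGE_hh-seed3",
    beta=0.1,                       # KL-strength; illustrative default
    per_device_train_batch_size=2,  # illustrative
    learning_rate=5e-7,             # illustrative
)

trainer = DPOTrainer(
    model=model,                 # ref_model left as None; TRL creates a frozen copy
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # recent TRL versions' name for the tokenizer argument
)
trainer.train()
```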

Good For

  • Developing conversational agents that prioritize helpful and aligned responses.
  • Applications requiring text generation where human preferences for helpfulness are critical.
  • Research into preference-based fine-tuning methods for smaller language models.