Kyleyee/HINGE_hh-seed2

Text Generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Apr 28, 2026 · Architecture: Transformer

Kyleyee/HINGE_hh-seed2 is a 1.5-billion-parameter language model fine-tuned by Kyleyee from Qwen2.5-1.5B-sft-hh-3e. It was trained with Direct Preference Optimization (DPO) on a preference dataset to specialize in generating helpful responses, making it a compact yet capable option for conversational AI tasks that require helpful, aligned outputs.


Overview

Kyleyee/HINGE_hh-seed2 is a 1.5-billion-parameter language model developed by Kyleyee. It is a fine-tuned variant of Qwen2.5-1.5B-sft-hh-3e, optimized for generating helpful, aligned responses. Training used the Kyleyee/train_data_Helpful_drdpo_preference dataset with Direct Preference Optimization (DPO), the method introduced in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (arXiv:2305.18290), which aligns a model's outputs with human preferences for helpfulness.
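A minimal inference sketch follows, assuming the checkpoint is published on the Hugging Face Hub under this ID and loads with the standard transformers causal-LM API (the prompt and generation settings are illustrative, not recommendations from this card):

```python
# Minimal sketch: load the model with Hugging Face transformers, assuming
# "Kyleyee/HINGE_hh-seed2" is a standard Qwen2.5-compatible causal LM on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/HINGE_hh-seed2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

prompt = "How do I politely decline a meeting invitation?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```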

Key Capabilities

  • Helpful Response Generation: Excels at producing answers aligned with user preferences for helpfulness.
  • DPO Fine-tuning: Benefits from Direct Preference Optimization, which trains the model directly on human preference data without fitting an explicit reward model (see the training sketch after this list).
  • Compact Size: At 1.5 billion parameters, it balances capability with computational efficiency.
  • Extended Context Window: Supports a 32,768-token context, allowing it to process long inputs and generate extended responses.
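A DPO run of this shape can be approximated with the TRL library. This is a minimal sketch, assuming the Kyleyee/train_data_Helpful_drdpo_preference dataset exposes the standard prompt/chosen/rejected columns and that the base checkpoint lives under the Hub ID below; both are assumptions, and the hyperparameters are illustrative, not the ones used to train this model:

```python
# Minimal DPO fine-tuning sketch with TRL, assuming a preference dataset
# with "prompt", "chosen", and "rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"  # assumed Hub ID for the SFT base

model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

args = DPOConfig(
    output_dir="hinge-hh-dpo",
    beta=0.1,                        # strength of the KL penalty to the reference model
    per_device_train_batch_size=2,   # illustrative, not the card's actual setting
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference copy is created automatically
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
)
trainer.train()
```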

Good For

  • Conversational AI: Ideal for chatbots and virtual assistants where helpful, coherent dialogue is crucial (see the chat sketch after this list).
  • Instruction Following: Suited to tasks that require adhering to specific instructions and producing relevant outputs.
  • Preference-Aligned Generation: Useful where model outputs must align with human feedback or preferences.
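For multi-turn chat, the tokenizer's chat template can be applied before generation. A minimal sketch, assuming the fine-tune retains the Qwen2.5 chat template in its tokenizer config (the conversation is illustrative):

```python
# Multi-turn chat sketch, assuming a Qwen2.5-style chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/HINGE_hh-seed2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Suggest three ways to make my emails more concise."},
]

# Render the conversation into the model's expected prompt format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```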