Kyleyee/ORPO_hh-seed5
Kyleyee/ORPO_hh-seed5 is a 1.5 billion parameter language model fine-tuned by Kyleyee from Qwen2.5-1.5B-sft-hh-3e. It was trained with the ORPO method on a helpfulness preference dataset, optimizing it for helpful, aligned responses. This makes it particularly suited to conversational AI and instruction-following tasks where helpfulness is a key requirement.
Model Overview
Kyleyee/ORPO_hh-seed5 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned variant of the Qwen2.5-1.5B-sft-hh-3e model, specifically optimized for generating helpful responses.
Key Capabilities
- Preference Optimization: This model has been trained using ORPO (Odds Ratio Preference Optimization), a method that aligns the model's outputs with human preferences for helpfulness; a simplified sketch of the objective follows this list.
- Instruction Following: By leveraging a helpfulness preference dataset, the model is designed to better understand and respond to user instructions in a helpful manner.
- Efficient Fine-tuning: The ORPO method allows for preference optimization without requiring a separate reference model, potentially simplifying the training pipeline.
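For intuition, ORPO adds an odds-ratio term on top of the standard SFT loss, computed from the log-probabilities the model assigns to the preferred and dispreferred responses. The sketch below is a simplified, self-contained illustration of that term; the `beta` weight and variable names are illustrative assumptions, not values documented for this model's training run.

```python
import torch
import torch.nn.functional as F

def orpo_odds_ratio_loss(chosen_logps, rejected_logps, beta=0.1):
    """Simplified ORPO odds-ratio term (illustrative sketch).

    chosen_logps / rejected_logps: per-sequence mean token log-probabilities
    of the preferred and dispreferred responses, shape (batch,).
    """
    # log odds(y|x) = log p - log(1 - p), computed stably from log p
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Push the chosen response toward higher odds than the rejected one
    log_odds_ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return -beta * log_odds_ratio.mean()

# The full ORPO objective adds this term to the usual NLL loss on the chosen response:
# loss = nll_loss_chosen + orpo_odds_ratio_loss(chosen_logps, rejected_logps)
```

Because the comparison uses the model's own log-probabilities rather than those of a frozen copy, no reference model is needed, which is the "monolithic" property highlighted above.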
Training Details
The model was fine-tuned using the TRL framework on the Kyleyee/train_data_Helpful_drdpo_preference dataset. The ORPO method, introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model," was central to its training procedure.
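A minimal TRL training sketch along these lines is shown below. The base-model repo id, hyperparameters (beta, learning rate, batch size, epochs), and the column layout of the preference dataset are assumptions for illustration; they are not documented for this specific run.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"  # assumed repo id for the SFT base model
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Preference dataset with "prompt", "chosen", "rejected" columns (assumed format)
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

args = ORPOConfig(
    output_dir="ORPO_hh-seed5",
    beta=0.1,                       # weight of the odds-ratio term (illustrative)
    per_device_train_batch_size=4,  # illustrative
    learning_rate=8e-6,             # illustrative
    num_train_epochs=1,             # illustrative
    seed=5,                         # matches the "seed5" suffix in the model name
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL versions
)
trainer.train()
```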
Good For
- Applications requiring models to generate helpful and aligned text.
- Conversational AI systems where response quality and user satisfaction are paramount.
- Instruction-following tasks where the model needs to provide constructive and useful information.
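As a quick start, a hedged inference sketch with transformers is shown below. It assumes the tokenizer ships a chat template (typical for Qwen2.5-derived models); if it does not, plain-text prompting would be needed instead.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/ORPO_hh-seed5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How do I write a polite follow-up email?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```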