Kyleyee/ORPO_hh-seed3
Kyleyee/ORPO_hh-seed3 is a 1.5-billion-parameter language model fine-tuned by Kyleyee using the ORPO method. It is based on Kyleyee/Qwen2.5-1.5B-sft-hh-3e and was trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, specializing in generating helpful responses. The model supports a 32,768-token context length and is optimized for preference alignment without a reference model.
Model Overview
Kyleyee/ORPO_hh-seed3 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from the base model Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It supports a context length of 32,768 tokens, making it suitable for processing long inputs and generating comprehensive outputs.
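A minimal loading sketch using the Hugging Face Transformers library is shown below. The repository id is taken from this card; the `torch_dtype="auto"` setting is an assumed convenience choice, not a documented requirement.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/ORPO_hh-seed3"  # repository id from this card

# Load the tokenizer and model weights; dtype is inferred from the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```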
Key Capabilities & Training
This model's primary distinction lies in its training methodology: it was fine-tuned using ORPO (Odds Ratio Preference Optimization), a monolithic method introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model" (arXiv:2403.07691). ORPO performs preference alignment directly during fine-tuning, without a separate reference model, which simplifies the optimization process. Training was conducted with the TRL framework on the dataset Kyleyee/train_data_Helpful_drdpo_preference, optimizing the model for helpful, aligned responses; a training sketch follows.
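The sketch below reconstructs the training setup from the facts on this card (base model, dataset, ORPO via TRL) using TRL's `ORPOTrainer`. Hyperparameters such as `beta`, batch size, and epoch count are illustrative assumptions, not the actual recipe used for this checkpoint.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Base model and dataset named on this card.
base_model = "Kyleyee/Qwen2.5-1.5B-sft-hh-3e"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# ORPO expects a preference dataset with prompt / chosen / rejected examples.
dataset = load_dataset("Kyleyee/train_data_Helpful_drdpo_preference", split="train")

args = ORPOConfig(
    output_dir="ORPO_hh-seed3",
    beta=0.1,                       # weight of the odds-ratio term (lambda in the paper); assumed
    per_device_train_batch_size=2,  # assumed
    num_train_epochs=1,             # assumed
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,     # `tokenizer=` in older TRL releases
)
trainer.train()
```

Because ORPO folds the preference penalty into the supervised fine-tuning loss, no frozen reference copy of the model is kept in memory, which roughly halves the memory footprint compared with DPO-style training.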
Use Cases
Given its training on a helpful preference dataset and the ORPO method, this model is particularly well-suited for applications requiring:
- Generating helpful and aligned text responses.
- Tasks where preference optimization is crucial for output quality.
- Scenarios benefiting from a model with a large context window for detailed interactions (an inference sketch follows this list).
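Below is an illustrative inference example for these use cases. The prompt and sampling settings are assumptions, and a chat template is presumed to be present since the base model is a Qwen2.5 SFT checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/ORPO_hh-seed3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Format a single-turn request with the tokenizer's chat template.
messages = [{"role": "user", "content": "Can you suggest a polite way to decline a meeting?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Sample a response; settings here are illustrative, not tuned values.
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```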