Kyleyee/cDPO_hh-seed2
Kyleyee/cDPO_hh-seed2 is a 1.5-billion-parameter language model fine-tuned by Kyleyee from the Qwen2.5-1.5B-sft-hh-3e base model. It was trained with Direct Preference Optimization (DPO) on a preference dataset to generate helpful and harmless responses. With a context length of 32,768 tokens, it is suited to conversational AI applications that require nuanced, aligned outputs.
Model Overview
Kyleyee/cDPO_hh-seed2 is a 1.5-billion-parameter language model developed by Kyleyee. It is a fine-tuned variant of the Qwen2.5-1.5B-sft-hh-3e base model, optimized specifically for generating helpful and harmless text.
Key Capabilities
- Preference-based Alignment: The model has been trained using Direct Preference Optimization (DPO), a method that leverages human preferences to align model outputs with desired behaviors (helpfulness and harmlessness).
- Conversational AI: Its training on a helpfulness and harmlessness preference dataset makes it suitable for dialogue systems and chatbots where aligned and safe responses are critical.
- Extended Context Window: Supports a context length of 32,768 tokens, allowing it to sustain long, coherent conversations and process lengthy documents.
Training Details
The model was fine-tuned on the Kyleyee/train_data_Helpful_drdpo_preference dataset using the TRL (Transformer Reinforcement Learning) library. The DPO method, introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," was central to its training process.
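At its core, the DPO objective from that paper scores each preference pair by how much more the policy prefers the chosen response over the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss is below; the log-probabilities and the `beta` value are illustrative placeholders, not the actual training hyperparameters used for this model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: negative log-sigmoid of the
    scaled margin between the policy's and the reference model's
    chosen-vs-rejected log-probability ratios."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy prefers the chosen response more
# strongly (relative to the reference) over the rejected one.
print(dpo_loss(-10.0, -20.0, -12.0, -18.0, beta=0.1))
```

In TRL this computation is handled by `DPOTrainer`, which takes the policy model, a reference model, and a dataset of chosen/rejected pairs.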
Use Cases
This model is particularly well-suited for applications requiring:
- Safe and Aligned Chatbots: Generating responses that adhere to helpful and harmless guidelines.
- Content Moderation Assistance: Aiding in the creation of appropriate and non-toxic content.
- General Purpose Text Generation: Producing coherent and contextually relevant text with an emphasis on beneficial outputs.
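Assuming the model exposes the standard Hugging Face transformers interface and a chat template (both unverified here), loading it for dialogue would look roughly like this sketch; the prompt and generation parameters are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kyleyee/cDPO_hh-seed2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format a single-turn conversation with the tokenizer's chat template
# (assumes the checkpoint ships one; otherwise format the prompt manually).
messages = [{"role": "user", "content": "How do I stay safe while hiking alone?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```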