Kyleyee/cDPO_hh-seed3
Kyleyee/cDPO_hh-seed3 is a 1.5-billion-parameter language model developed by Kyleyee, fine-tuned from Kyleyee/Qwen2.5-1.5B-sft-hh-3e. It was trained with Direct Preference Optimization (DPO) on the Kyleyee/train_data_Helpful_drdpo_preference dataset to favor helpful, human-preferred responses. With a context length of 32768 tokens, the model is suited to conversational AI and instruction-following tasks where response quality and alignment with human preferences matter.
Overview
Kyleyee/cDPO_hh-seed3 is a 1.5 billion parameter language model, fine-tuned by Kyleyee from the base model Kyleyee/Qwen2.5-1.5B-sft-hh-3e. This model leverages the Direct Preference Optimization (DPO) method, as introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," to align its outputs with human preferences. It was specifically trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, focusing on generating helpful and high-quality responses.
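To make the training objective concrete, here is a minimal per-example sketch of the DPO loss from the cited paper. This is an illustration, not the actual training code; the function and variable names are hypothetical, and a real run would use batched tensor log-probabilities (e.g. via TRL's DPOTrainer) rather than scalars.

```python
import math


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the sequence log-probability of the chosen or rejected
    response under the policy being trained or the frozen reference (SFT) model.
    """
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    return -math.log(_sigmoid(beta * (policy_margin - ref_margin)))
```

The loss shrinks as the policy assigns a larger log-probability margin to the chosen response than the reference model does, which is how DPO aligns outputs with the preference data without training a separate reward model.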
Key Capabilities
- Preference-aligned Generation: Optimized using DPO to produce responses that are preferred by humans, enhancing helpfulness and quality.
- Instruction Following: Designed to effectively follow user instructions and generate relevant outputs.
- Conversational AI: Suitable for dialogue systems and chatbots where nuanced and helpful interactions are desired.
- Efficient Size: At 1.5 billion parameters, it offers a balance between performance and computational efficiency.
- Extended Context: Supports a context length of 32768 tokens, allowing for processing longer inputs and maintaining coherence over extended conversations.
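The model can be loaded for dialogue use with Hugging Face transformers. The sketch below is illustrative, not official usage from this card: the generation settings are assumptions, and `build_prompt` is a hypothetical plain-text fallback used only if the tokenizer ships no chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Kyleyee/cDPO_hh-seed3"


def build_prompt(messages):
    # Hypothetical fallback format; prefer tokenizer.apply_chat_template
    # whenever the tokenizer provides a chat template.
    return "".join(f"{m['role']}: {m['content']}\n" for m in messages) + "assistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    messages = [{"role": "user", "content": user_message}]
    if tokenizer.chat_template:
        prompt = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True)
    else:
        prompt = build_prompt(messages)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=True, temperature=0.7)
    # Drop the prompt tokens and return only the newly generated text.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)


# Example: print(generate_reply("How do I brew good coffee?"))
```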
Good for
- Chatbot Development: Creating more helpful and engaging conversational agents.
- Assistant Models: Building AI assistants that provide preferred and aligned responses.
- Preference-based Fine-tuning: Demonstrating the application of DPO for aligning language models.
- Resource-constrained Environments: Deploying a capable model in scenarios where larger models might be prohibitive.