Kyleyee/cDPO_hh-seed4
Kyleyee/cDPO_hh-seed4 is a 1.5-billion-parameter language model fine-tuned by Kyleyee using Direct Preference Optimization (DPO). Built on Kyleyee/Qwen2.5-1.5B-sft-hh-3e, it is trained on helpfulness preference data to produce responses better aligned with human preferences, and it suits applications that need helpful, aligned text generation from a compact model.
Model Overview
Kyleyee/cDPO_hh-seed4 is a 1.5 billion parameter language model developed by Kyleyee. It is a fine-tuned version of the Kyleyee/Qwen2.5-1.5B-sft-hh-3e base model, specifically optimized using Direct Preference Optimization (DPO).
Key Capabilities
- Preference-aligned generation: The model has been trained on the Kyleyee/train_data_Helpful_drdpo_preference dataset, enhancing its ability to produce helpful and aligned text.
- Efficient size: With 1.5 billion parameters, it offers a balance between performance and computational efficiency.
- DPO training: Utilizes the Direct Preference Optimization method, as introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," for robust alignment without explicit reward modeling.
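The core idea behind DPO, as described in the paper above, is that the preference objective can be written directly in terms of policy and reference log-probabilities, with no separate reward model. A minimal per-example sketch (the log-probability values and the beta of 0.1 below are illustrative, not taken from this model's training run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * log-ratio margin).

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable form of -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))

# When the policy favors the chosen response more strongly than the
# reference does, the margin is positive and the loss falls below
# log(2) (~0.693), its value at zero margin.
loss = dpo_loss(-10.0, -14.0, ref_chosen_logp=-11.0,
                ref_rejected_logp=-13.0, beta=0.1)
```

Minimizing this loss pushes the policy to raise the chosen response's likelihood relative to the rejected one, while the reference-model terms keep it anchored to the base model's behavior.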
Training Details
The model was trained with the TRL library (version 0.16.0.dev0) in the Hugging Face ecosystem, using DPO to align the model's outputs with human preferences for helpfulness. The base model was refined on the Kyleyee/train_data_Helpful_drdpo_preference dataset.
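For readers unfamiliar with the data side of this setup: TRL's DPO training consumes preference pairs, each a prompt with a preferred ("chosen") and a dispreferred ("rejected") completion. A sketch of one such record (the field names follow TRL's standard preference format; the content is made up and is not from the dataset named above):

```python
# Illustrative preference record in the prompt/chosen/rejected shape
# used for DPO training. The text content here is invented.
example = {
    "prompt": "How do I brew a good cup of coffee?",
    "chosen": "Use freshly ground beans and water just off the boil ...",
    "rejected": "Just microwave some instant coffee.",
}

def is_preference_record(record):
    """Return True if a record carries the three fields DPO needs."""
    return {"prompt", "chosen", "rejected"} <= record.keys()
```

During training, the chosen and rejected completions for each prompt are scored by both the policy and the frozen reference model, and their log-probability margins feed the DPO loss.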
Good For
- Applications requiring models that generate helpful and aligned responses.
- Scenarios where a smaller, preference-tuned model is beneficial for deployment efficiency.
- Research and development in preference-based fine-tuning methods like DPO.