Name: HidekiKawai/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: HidekiKawai

Overview

This model, HidekiKawai/dpo-qwen-cot-merged, is a fine-tuned version of HidekiKawai/sft-qwen-merged. It leverages Direct Preference Optimization (DPO) with the Unsloth library to align its responses with preferred outputs.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more structured and logical outputs.
Improved Response Quality: Fine-tuned to produce higher quality, aligned responses based on a preference dataset.
Direct Use: Provided as a full-merged 16-bit model, eliminating the need for adapter loading and simplifying deployment with transformers.

Training Details

Base Model: HidekiKawai/sft-qwen-merged
Optimization Method: DPO (Direct Preference Optimization)
Epochs: 3
Learning Rate: 2e-05
Max Sequence Length: 1024
Training Data: Utilizes the u-10bei/dpo-dataset-qwen-cot dataset for preference alignment.

Usage

This model can be directly loaded and used with the transformers library for inference, as it contains the merged 16-bit weights. Users should ensure compliance with the MIT License and the original base model's license terms.

Overview

Overview

Key Capabilities

Training Details

Usage

Full Model Card (README)