Name: SKOTK/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: SKOTK

Model Overview

SKOTK/dpo-qwen-cot-merged is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone fine-tuning using Direct Preference Optimization (DPO) via the Unsloth library, resulting in a full-merged 16-bit model that requires no adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for complex problem-solving tasks.
Structured Response Quality: Fine-tuned to produce higher quality and more aligned structured outputs based on preferred examples.
DPO Alignment: Benefits from DPO training, aligning its responses more closely with desired human preferences.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 1024. The training utilized a specific preference dataset (u-10bei/dpo-dataset-qwen-cot) to guide the DPO process. The base model's license terms (MIT License) apply.

Good For

Applications requiring improved reasoning and logical coherence in responses.
Generating structured outputs that adhere to specific formats or preferences.
Tasks where alignment with human preferences is crucial for output quality.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)