takeshi200ok/dpo-qwen-cot-merged

Text Generation | Model Size: 4B | Quant: BF16 | Context Length: 32k | Published: Feb 3, 2026 | License: apache-2.0 | Architecture: Transformer

The takeshi200ok/dpo-qwen-cot-merged model is a 4 billion parameter Qwen3-based causal language model developed by takeshi200ok. It has been fine-tuned using Direct Preference Optimization (DPO) to enhance reasoning capabilities, specifically Chain-of-Thought (CoT), and improve structured response quality. This fully merged 16-bit model is optimized for generating aligned and coherent outputs in reasoning-intensive tasks.


Overview

This model, dpo-qwen-cot-merged, is a 4 billion parameter language model based on the Qwen3 architecture, specifically fine-tuned from Qwen/Qwen3-4B-Instruct-2507. Developed by takeshi200ok, it leverages Direct Preference Optimization (DPO) via the Unsloth library to align its responses with preferred outputs.

Key Capabilities

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for tasks requiring multi-step logical deduction.
  • Improved Structured Responses: DPO training focuses on generating higher quality and more structured outputs.
  • Fully Merged Model: Shipped as a single 16-bit (BF16) checkpoint, so no separate adapter loading is required; it can be used directly with standard tooling (see the loading sketch below).
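
Because the weights are fully merged, the model loads with the standard transformers API. The sketch below is illustrative: the repo id and BF16 dtype come from the card, but the prompt, generation settings, and device placement are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "takeshi200ok/dpo-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 weights
    device_map="auto",           # assumption: place layers automatically
)

# Qwen3 instruct models use a chat template; an illustrative CoT-style prompt.
messages = [{"role": "user", "content": "Explain step by step: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```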

Training Details

The model underwent one epoch of DPO training with a learning rate of 1e-5 and a beta of 0.1, using a maximum sequence length of 3072 tokens. Training data was sourced from u-10bei/dpo-dataset-qwen-cot. A hedged reconstruction of this recipe follows.
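
The hyperparameters above roughly correspond to a standard TRL DPO run. The sketch below is an assumption-laden reconstruction, not the author's script: the card says training was done via Unsloth, whereas this version uses plain TRL, and `output_dir` plus any hyperparameter not named on the card are hypothetical.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-4B-Instruct-2507"  # base model named on the card
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Preference dataset named on the card; split is an assumption.
dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

config = DPOConfig(
    output_dir="dpo-qwen-cot",  # hypothetical
    beta=0.1,                   # from the card
    learning_rate=1e-5,         # from the card
    num_train_epochs=1,         # from the card
    max_length=3072,            # from the card
)
trainer = DPOTrainer(
    model=model,                # TRL builds the frozen reference model itself
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

After training, the LoRA-free merged checkpoint distributed here would correspond to saving the resulting full-precision weights directly (e.g. `trainer.save_model(...)`), which is what makes adapter loading unnecessary at inference time.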

Good For

  • Applications requiring robust reasoning and logical inference.
  • Generating structured and coherent text outputs.
  • Use cases where response alignment and quality are critical.