Name: kikansha-Tomasu/sft-dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kikansha-Tomasu

Model Overview

This model, kikansha-Tomasu/sft-dpo-qwen-cot-merged, is a 4 billion parameter language model derived from kikansha-Tomasu/Qwen3-4B-Instruct-2507-SFT. It has been further fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, with its LoRA adapters merged into the base model for direct use.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more logical and structured problem-solving.
Improved Response Quality: Aligned through DPO to generate preferred outputs, focusing on high-quality and structured responses.
Direct Usage: Provided as a full-merged 16-bit model, eliminating the need for adapter loading and simplifying deployment with transformers.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 1e-07 and a beta value of 0.1. It utilized a maximum sequence length of 1024 during training. The preference dataset used for DPO was [u-10bei/dpo-dataset-qwen-cot].

Licensing

This model operates under the MIT License, consistent with its training data. Users are also required to adhere to the licensing terms of the original base model.

Overview

Model Overview

Key Capabilities

Training Details

Licensing

Full Model Card (README)