bam2app/dpo-qwen-cot-merged_v1

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The bam2app/dpo-qwen-cot-merged_v1 model is a 4 billion parameter language model based on the Qwen3-4B-Instruct-2507 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) to enhance reasoning capabilities, specifically Chain-of-Thought (CoT), and improve structured response quality. This model is optimized for generating aligned and coherent outputs, making it suitable for tasks requiring improved logical progression and structured answers.


Model Overview

This model, bam2app/dpo-qwen-cot-merged_v1, is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has undergone Direct Preference Optimization (DPO) using the Unsloth library; the resulting weights are fully merged at 16-bit (BF16) precision, so no separate adapter loading is required.
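Because the DPO weights are fully merged, the checkpoint can be loaded directly with the `transformers` library, with no PEFT/LoRA adapter step. A minimal sketch (assumes `transformers` and `torch` are installed and enough memory is available for the 4B BF16 weights; the prompt and generation settings are illustrative, not from this repository):

```python
# Sketch: load the merged checkpoint directly; no adapter loading needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bam2app/dpo-qwen-cot-merged_v1"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # weights are published in BF16
        device_map="auto",
    )
    # Qwen3 instruct models ship a chat template; apply it so the
    # model sees the same format it was aligned on.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Explain step by step why 17 is prime."))
```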

Key Capabilities

  • Enhanced Reasoning (Chain-of-Thought): Optimized to produce more logical and step-by-step reasoning in its responses.
  • Improved Structured Output: Fine-tuned to generate higher quality and more structured answers based on preference datasets.
  • DPO Alignment: Benefits from DPO training to align its outputs with preferred response styles.
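The DPO alignment mentioned above optimizes the standard DPO pairwise objective. A plain-Python sketch of that loss for a single preference pair (this is the generic DPO formula, not code from this repository; the log-probability values in the example are illustrative):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) preference pair.

    beta (0.1, matching this model's training configuration) scales how
    strongly the policy is penalized for drifting from the reference model.
    """
    # Implicit rewards: log-prob gain of each response over the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(margin)): smaller as the chosen response becomes more
    # strongly preferred over the rejected one, relative to the reference.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss equals log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

Training lowers this loss by widening the margin between chosen and rejected responses, which is what nudges the model toward the preferred (structured, step-by-step) response style.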

Training Details

The model was trained for 1 epoch with a learning rate of 5e-06, a DPO beta value of 0.1, and a maximum sequence length of 1024 tokens. The preference data used for DPO was the u-10bei/dpo-dataset-qwen-cot dataset. The model is released under the MIT License; users must also comply with the original base model's license terms.
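The hyperparameters above map naturally onto a trl-style DPO configuration (Unsloth's DPO training wraps trl's `DPOTrainer`). A hedged configuration sketch only; exact argument names can vary across trl versions, the output path is illustrative, and model/dataset loading is abbreviated:

```python
# Configuration sketch mirroring the reported hyperparameters.
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo-qwen-cot",  # illustrative path
    num_train_epochs=1,         # trained for 1 epoch
    learning_rate=5e-6,
    beta=0.1,                   # DPO KL-penalty strength
    max_length=1024,            # maximum sequence length in tokens
)
# DPOTrainer(model=..., args=config,
#            train_dataset=load_dataset("u-10bei/dpo-dataset-qwen-cot"), ...)
```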

Good For

  • Applications requiring improved logical reasoning and Chain-of-Thought capabilities.
  • Scenarios where structured and aligned responses are critical.
  • Developers seeking a DPO-optimized Qwen3-4B variant for enhanced output quality.