Name: smzyuki/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: smzyuki

Model Overview

This model, smzyuki/dpo-qwen-cot-merged, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, with its 16-bit weights fully merged for direct use without adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized specifically to improve Chain-of-Thought (CoT) reasoning, making it suitable for tasks requiring multi-step logical deduction.
Structured Response Quality: Fine-tuned to produce more aligned and structured outputs based on preference datasets.
Efficient Deployment: As a merged model, it simplifies deployment by eliminating the need for separate adapter loading.

Training Details

The model was trained for 2 epochs with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 2048. The training leveraged datasets such as u-10bei/structured_data_with_cot_dataset_512_v2, u-10bei/structured_data_with_cot_dataset_512_v5, and daichira/structured-5k-mix-sft.

Licensing

The model operates under the MIT License, consistent with its training data. Users must also adhere to the original base model's license terms.

Overview

Model Overview

Key Capabilities

Training Details

Licensing

Full Model Card (README)