Name: ShimadaMasatsugu/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ShimadaMasatsugu

Model Overview

ShimadaMasatsugu/dpo-qwen-cot-merged is a specialized language model derived from the Qwen3-4B-Instruct-2507 base model. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, with its 16-bit weights fully merged into the base model, eliminating the need for adapter loading.

Key Capabilities & Optimization

This model's primary optimization objective was to align its responses with preferred outputs, specifically targeting:

Improved Reasoning: Enhanced Chain-of-Thought (CoT) capabilities.
Structured Response Quality: Better generation of structured outputs based on a preference dataset.

Training Details

The DPO fine-tuning process involved:

Base Model: Qwen/Qwen3-4B-Instruct-2507
Method: Direct Preference Optimization (DPO)
Epochs: 1
Learning Rate: 1e-07
Max Sequence Length: 1024
Training Data: Utilized the u-10bei/dpo-dataset-qwen-cot dataset.

Usage & Licensing

As a merged model, it can be directly used with the transformers library. The model is released under the MIT License, consistent with its training data, and users must also adhere to the original base model's license terms.

Overview

Model Overview

Key Capabilities & Optimization

Training Details

Usage & Licensing

Full Model Card (README)