Name: duong942001/dpo-qwen-cot-merged-pa-ad API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: duong942001

Model Overview

This model, duong942001/dpo-qwen-cot-merged-pa-ad, is a 4 billion parameter language model built upon the Qwen/Qwen3-4B-Instruct-2507 base. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, specifically targeting improved alignment with preferred outputs.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more logical and structured responses.
Preference Alignment: Fine-tuned with DPO to align its outputs with desired response patterns, based on a preference dataset.
Direct Use: Provided as full-merged 16-bit weights, eliminating the need for adapter loading and allowing direct integration with transformers.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.4, using a maximum sequence length of 1536. The training utilized the u-10bei/dpo-dataset-qwen-cot dataset. The LoRA configuration (r=8, alpha=16) was merged into the base model during the fine-tuning process.

Licensing

This model is released under the MIT License, consistent with the terms of its training data. Users must also adhere to the license terms of the original base model, Qwen/Qwen3-4B-Instruct-2507.

Overview

Model Overview

Key Capabilities

Training Details

Licensing

Full Model Card (README)