Model Overview
This model, seibergwitten/dpo-qwen-cot-merged.ver0, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with Direct Preference Optimization (DPO) using the Unsloth library to improve response alignment.
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, enabling more structured and logical outputs.
- Aligned Responses: DPO fine-tuning steers generated text toward preferred outputs, yielding higher-quality and more consistent responses.
- Structured Output: Trained on a preference dataset that rewards well-structured responses.
- Direct Usage: Provided as fully merged 16-bit weights, so no adapter loading is needed; the model can be used directly with the transformers library.
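A minimal inference sketch with transformers. The prompt and generation settings below are illustrative assumptions, not tuned recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "seibergwitten/dpo-qwen-cot-merged.ver0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="bfloat16",  # merged 16-bit weights; no adapter loading required
    device_map="auto",
)

# Example prompt; any chat-formatted input works
messages = [{"role": "user", "content": "Explain step by step why the sky is blue."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```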
Training Details
The model underwent one epoch of DPO training with a learning rate of 1e-7 and a beta of 0.1. It used a maximum sequence length of 1024 tokens and a LoRA configuration (r=8, alpha=16), which has since been merged into the base weights.
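For reference, the DPO objective behind this training can be sketched in plain Python. This is a minimal illustration of the per-pair loss, not the actual training code; beta=0.1 matches the setting above:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin difference)."""
    # Implicit rewards are log-prob margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Negative log-sigmoid; lower when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference (both margins zero), the loss is log 2; it falls below that as the policy learns to rank the chosen response above the rejected one.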
Good For
- Applications requiring models with strong reasoning and logical flow.
- Tasks where response quality and alignment to specific preferences are critical.
- Generating structured and coherent text outputs.