Overview
This model, yuzkawash/dpo-qwen-cot-merged, is a 4 billion parameter language model built on the Qwen3-4B-Instruct-2507 base. It was trained with Direct Preference Optimization (DPO) using the Unsloth library, specifically targeting improvements in reasoning and structured response generation. The model ships as fully merged 16-bit weights, so no adapter loading is required.
Key Capabilities
- Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) reasoning, allowing for more logical and step-by-step problem-solving.
- Improved Structured Responses: Fine-tuned to produce higher quality, more coherent, and well-structured outputs based on preferred examples.
- Direct Use: As a fully merged model, it can be loaded directly with the transformers library without additional configuration.
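A minimal loading sketch using the standard transformers API is shown below. The generation settings and the prompt are illustrative assumptions, not values taken from this model card; the import is deferred into the function so the sketch can be read and its constants checked without transformers installed.

```python
MODEL_ID = "yuzkawash/dpo-qwen-cot-merged"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a single-turn response with the merged model.

    Because the weights are fully merged, no PEFT/adapter loading
    step is needed; plain AutoModelForCausalLM suffices.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For CoT-style tasks, prompts that explicitly ask for step-by-step reasoning (e.g. "Solve step by step: ...") play to the model's fine-tuning objective.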
Training Details
The model was trained for 1.5 epochs with a learning rate of 2e-6 and a DPO beta of 0.2, using a maximum sequence length of 1024. Training used a LoRA configuration (r=8, alpha=16) whose adapters were subsequently merged into the base model. The DPO preference data was sourced from the u-10bei/dpo-dataset-qwen-cot dataset.
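The reported hyperparameters, and the standard DPO objective the beta value plugs into, can be sketched in plain Python. This is the textbook per-example DPO loss on sequence log-probabilities, shown for illustration; it is not the authors' training code.

```python
import math

# Hyperparameters reported in the training details above
DPO_HPARAMS = {
    "epochs": 1.5,
    "learning_rate": 2e-6,
    "beta": 0.2,
    "max_seq_length": 1024,
}
LORA_HPARAMS = {"r": 8, "alpha": 16}

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = DPO_HPARAMS["beta"]) -> float:
    """Per-example DPO loss given sequence log-probabilities.

    beta scales the implicit reward margin between the chosen and
    rejected completions, measured relative to the frozen reference
    model; larger beta keeps the policy closer to the reference.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

At initialization the policy matches the reference, the margin is zero, and the loss is log 2 ≈ 0.693; training drives the margin positive for preferred responses, lowering the loss.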
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Complex problem-solving where step-by-step reasoning is crucial.
- Generating structured data or responses that adhere to specific formats.
- Tasks benefiting from improved coherence and logical flow in generated text.