bam2app/dpo-qwen-cot-merged_v3
The bam2app/dpo-qwen-cot-merged_v3 is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). It is optimized to strengthen Chain-of-Thought (CoT) reasoning and improve structured response quality, and is intended for tasks that demand logical coherence and adherence to preferred output formats.
Model Overview
The bam2app/dpo-qwen-cot-merged_v3 is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was trained with Direct Preference Optimization (DPO) using the Unsloth library, and the result is a merged 16-bit weight checkpoint that requires no adapter loading.
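Because the weights are merged in 16-bit, the checkpoint can be loaded directly with the Hugging Face transformers library, with no PEFT adapter step. A minimal inference sketch (the prompt and generation settings are illustrative, not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bam2app/dpo-qwen-cot-merged_v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the merged 16-bit weights
    device_map="auto",
)

# Chat-style prompt; the instruct base model expects the chat template.
messages = [{"role": "user", "content": "Solve step by step: what is 17 * 23?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that the training context was capped at 1024 tokens, so very long prompts may fall outside the distribution the DPO stage saw.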
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more logical and coherent outputs.
- Structured Response Quality: Fine-tuned to align responses with preferred outputs, enhancing the quality and structure of generated text.
- DPO Alignment: Utilizes DPO to align model behavior with human preferences, focusing on specific response characteristics.
Training Details
The model was trained for 1 epoch with a learning rate of 3e-06, a DPO beta of 0.2, and a maximum sequence length of 1024 tokens, on the u-10bei/dpo-dataset-qwen-cot dataset.
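The beta value above controls how strongly the DPO objective penalizes drift from the reference model: the standard DPO loss is the negative log-sigmoid of beta times the difference in policy-vs-reference log-ratios between the chosen and rejected responses. A small self-contained sketch of that formula (sequence log-probabilities here are made-up placeholders, not values from this model):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.2):
    """Per-pair DPO loss.

    Each argument is a total sequence log-probability: pi_* from the policy
    being trained, ref_* from the frozen reference model. beta=0.2 matches
    the value reported in this card's training details.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin); loss falls as the policy prefers
    # the chosen response more than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy identical to reference: margin is 0, loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # ~0.6931
# Policy widens the chosen/rejected gap: loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

A higher beta sharpens the preference signal at the cost of allowing larger divergence from the reference policy; 0.2 is a moderate setting.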
Good For
- Applications requiring improved logical reasoning and step-by-step thought processes.
- Generating structured and high-quality responses that adhere to specific formats or preferences.
- Tasks where alignment with preferred outputs is critical for performance.