NobutaMN/dpo-qwen-cot-merged is a 4-billion-parameter language model fine-tuned by NobutaMN with Direct Preference Optimization (DPO) on a Qwen3-4B-Instruct base. The model targets stronger reasoning, particularly Chain-of-Thought (CoT), and higher-quality structured responses. It is designed to align more closely with preferred outputs on complex reasoning tasks, making it suitable for applications that require coherent, well-structured logical deductions.
Overview
NobutaMN/dpo-qwen-cot-merged is a 4-billion-parameter language model fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library. It builds on the NobutaMN/qwen3-4b-structevalt-lora-nobuta-v2-3change base, with a primary focus on improving Chain-of-Thought reasoning and structured response generation.
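For orientation, below is a minimal sketch of what a DPO fine-tuning run with Unsloth and TRL's `DPOTrainer` typically looks like. The learning rate, epoch count, and sequence length come from this card; the dataset name, LoRA rank/alpha, and batch settings are illustrative placeholders, not the actual training configuration. A recent TRL version (with `DPOConfig` and `processing_class`) is assumed.

```python
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer
from datasets import load_dataset

# Load the base model (4-bit quantization to fit on a single GPU).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and alpha here are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Preference dataset with "prompt", "chosen", "rejected" columns
# (placeholder name -- substitute the actual CoT preference data).
dataset = load_dataset("your-username/cot-preference-pairs", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT model, the adapter-disabled base serves as reference
    args=DPOConfig(
        learning_rate=1e-7,   # as stated in this card
        num_train_epochs=1,   # as stated in this card
        max_length=1024,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        output_dir="dpo-qwen-cot",
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```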
Key Capabilities
- Enhanced Reasoning: Optimized to produce more coherent and logical Chain-of-Thought reasoning.
- Improved Structured Responses: Fine-tuned to generate higher quality, well-structured outputs based on preference data.
- DPO Alignment: Leverages Direct Preference Optimization to align model behavior with desired response patterns (the objective is shown below).
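For reference, the standard DPO objective (Rafailov et al., 2023) trains the policy $\pi_\theta$ to prefer the chosen response $y_w$ over the rejected response $y_l$ relative to a frozen reference model $\pi_{\text{ref}}$:

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$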
Good For
- Applications requiring robust logical deduction and step-by-step reasoning.
- Tasks where structured and consistent output formats are critical.
- Use cases benefiting from models aligned with human preferences for response quality.
This repository provides LoRA adapter weights only; they must be loaded on top of the base model (Qwen/Qwen3-4B-Instruct-2507), as in the sketch below. Training used a learning rate of 1e-7 for 1 epoch, with a maximum sequence length of 1,024 tokens.
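A minimal loading sketch, assuming the standard `transformers` + `peft` path for LoRA adapters; the model identifiers are taken from this card, and the prompt is only a smoke test.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model the adapter was trained against.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Apply the DPO LoRA adapter from this repository on top of the base.
model = PeftModel.from_pretrained(base, "NobutaMN/dpo-qwen-cot-merged")

# Quick smoke test: ask for step-by-step reasoning.
messages = [{
    "role": "user",
    "content": "If a train travels 60 km in 45 minutes, what is its "
               "average speed in km/h? Think step by step.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```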