Name: OguraHiroyuki/dpo-qwen-cot-mergedv4 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: OguraHiroyuki

Model Overview

OguraHiroyuki/dpo-qwen-cot-mergedv4 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It leverages Direct Preference Optimization (DPO), implemented with the Unsloth library, to align its outputs with preferred responses.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning abilities.
Structured Response Quality: Focuses on generating higher quality, more structured outputs based on preference datasets.
Instruction Following: Designed for better adherence to instructions, making it suitable for conversational and task-oriented AI.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-06 and a beta value of 0.1, using a maximum sequence length of 1024. The training utilized the u-10bei/dpo-dataset-qwen-cot dataset. The LoRA configuration (r=8, alpha=16) was merged into the base model, providing full 16-bit weights without requiring adapter loading.

Usage

This merged model can be directly used with the transformers library, simplifying deployment for inference tasks. It is licensed under the MIT License, with users also required to comply with the original base model's license terms.

Overview

Model Overview

Key Capabilities

Training Details

Usage

Full Model Card (README)