Name: gakhg/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: gakhg

Model Overview

This model, gakhg/dpo-qwen-cot-merged, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone Direct Preference Optimization (DPO) using the Unsloth library to align its responses with preferred outputs.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for tasks requiring logical steps and inference.
Structured Response Quality: Focuses on generating higher quality, more structured outputs based on preference datasets.
Full-Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment with transformers.

Training Details

The model was fine-tuned for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1. It utilized a maximum sequence length of 1024 during training. The LoRA configuration (r=8, alpha=16) was merged into the base model.

Good For

Applications requiring improved logical reasoning and problem-solving.
Generating structured and coherent text responses.
Use cases where direct preference alignment is beneficial for output quality.

Licensing

This model uses the u-10bei/dpo-dataset-qwen-cot for training, which is under the MIT License. Users must also comply with the original base model's license terms.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Licensing

Full Model Card (README)