AlainGuillotin/dpo-qwen-cot-merged

- **Task:** Text generation
- **Model size:** 4B parameters
- **Quantization:** BF16
- **Context length:** 32k tokens
- **Published:** Mar 1, 2026
- **License:** apache-2.0
- **Architecture:** Transformer (open weights)

AlainGuillotin/dpo-qwen-cot-merged is a 4-billion-parameter Qwen3-based causal language model fine-tuned by AlainGuillotin. It uses Direct Preference Optimization (DPO) to improve Chain-of-Thought reasoning and structured response quality. With a 32768-token context length, the model is optimized for generating coherent outputs aligned with preference data, making it suitable for tasks that require clear logical flow and structured text generation.


Model Overview

This model, AlainGuillotin/dpo-qwen-cot-merged, is a fine-tuned version of the Qwen/Qwen3-4B-Instruct-2507 base model. It has been optimized using Direct Preference Optimization (DPO) via the Unsloth library to align its responses with preferred outputs.

Key Capabilities

  • Enhanced Reasoning: Specifically trained to improve Chain-of-Thought (CoT) reasoning abilities.
  • Structured Response Quality: Focuses on generating more structured and coherent outputs.
  • Fully Merged Weights: The repository contains the fully merged 16-bit weights, eliminating the need for adapter loading.

Training Details

  • Method: Direct Preference Optimization (DPO).
  • Base Model: Qwen/Qwen3-4B-Instruct-2507.
  • Dataset: Trained on the u-10bei/dpo-dataset-qwen-cot preference dataset.
  • Configuration: Trained for 1 epoch with a learning rate of 1e-07 and a beta of 0.1. The maximum sequence length used during training was 1024.
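The training setup above can be sketched with TRL's `DPOTrainer` API. This is a hedged approximation: the card states Unsloth was used (which wraps a TRL-compatible trainer), and only the hyperparameters, dataset, and base model names come from the card; everything else (output paths, the exact trainer call) is an assumption.

```python
# Hyperparameters taken from the model card; the rest is illustrative.
HYPERPARAMS = {
    "beta": 0.1,             # DPO preference-strength coefficient
    "learning_rate": 1e-7,
    "num_train_epochs": 1,
    "max_length": 1024,      # max sequence length used during training
}

def run_dpo_training() -> None:
    # Imports kept local so the hyperparameter block above stands alone.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base = "Qwen/Qwen3-4B-Instruct-2507"
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)
    train_data = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

    config = DPOConfig(output_dir="dpo-qwen-cot", **HYPERPARAMS)
    trainer = DPOTrainer(
        model=model,
        args=config,
        train_dataset=train_data,
        processing_class=tokenizer,
    )
    trainer.train()
    trainer.save_model("dpo-qwen-cot")  # assumed output path

# Call run_dpo_training() to launch the run (requires a GPU with enough memory).
```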

Usage

Because the repository ships merged weights, this model can be loaded directly with the transformers library for inference, with no adapter step required. Users should adhere to the repository's Apache-2.0 license and the original base model's license terms.
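A minimal inference sketch follows, assuming a standard transformers chat-template workflow; the model ID comes from this card, while the prompt, dtype, and generation settings are illustrative assumptions.

```python
MODEL_ID = "AlainGuillotin/dpo-qwen-cot-merged"

def generate_response(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the merged weights and generate a reply to a single user message."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",  # matches the BF16 weights in the repo
        device_map="auto",
    )

    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Example (downloads ~8 GB of weights on first use):
# print(generate_response("Explain, step by step, why the sky appears blue."))
```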