Model Overview
ykawasaki/qwen3-4b-dpo-qwen-cot-merged-v7 is a 4-billion-parameter language model built on the Qwen/Qwen3-4B-Instruct-2507 base model. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, specifically targeting improved Chain-of-Thought (CoT) reasoning and higher-quality structured responses.
Key Characteristics
- Base Model: Qwen/Qwen3-4B-Instruct-2507, a 4 billion parameter Qwen3 variant.
- Fine-tuning Method: Direct Preference Optimization (DPO) for aligning responses with preferred outputs.
- Adapter Integration: Merged with ykawasaki/qwen3-4b-structured-output-lora-v12 prior to DPO, which provides the structured-output behavior.
- Training Objective: Optimized on a preference dataset to improve Chain-of-Thought reasoning and the quality of structured responses.
- Configuration: Trained for 3 epochs with a learning rate of 1e-07, beta of 0.1, and a maximum sequence length of 1024. LoRA configuration includes r=16 and alpha=32.
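The reported hyperparameters map onto a TRL-style DPO setup roughly as follows. This is a minimal sketch, assuming the `trl` and `peft` packages; it is not the exact Unsloth training script used for this model, and the output path is a placeholder.

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA adapter settings reported for this model.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
)

# DPO settings reported for this model; all other fields are TRL defaults.
training_args = DPOConfig(
    output_dir="qwen3-4b-dpo",  # placeholder path
    num_train_epochs=3,
    learning_rate=1e-7,
    beta=0.1,         # strength of the DPO KL regularization toward the reference model
    max_length=1024,  # maximum sequence length (prompt + completion)
)
```

These objects would then be passed to a `DPOTrainer` together with the base model and a preference dataset of chosen/rejected response pairs.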
Usage and Licensing
This model is provided as full-merged 16-bit weights, allowing direct use with the transformers library without requiring separate adapter loading. It is licensed under the MIT License, consistent with its training data, and users must also adhere to the original base model's license terms.
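Because the weights are fully merged, the model loads like any standard causal LM. The following is a minimal sketch using `transformers`; the prompt handling and generation settings are illustrative defaults, not values prescribed by this model card (`device_map="auto"` additionally requires the `accelerate` package).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ykawasaki/qwen3-4b-dpo-qwen-cot-merged-v7"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the merged 16-bit weights and generate a single response."""
    # No PeftModel / adapter loading is needed: the LoRA weights are already merged.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # keep the shipped 16-bit precision
        device_map="auto",
    )
    # Apply the Qwen3 chat template carried by the tokenizer.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Solve step by step: what is 17 * 24?")` would download the weights on first use and return the model's CoT-style answer.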
Ideal Use Cases
- Applications requiring improved reasoning and Chain-of-Thought capabilities.
- Tasks where structured and high-quality responses are critical.
- Scenarios benefiting from a DPO-tuned model for better alignment with desired outputs.