Name: shotalab/Qwen3-4B-Instruct-SFT-03-Merged-DPO-01 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: shotalab

Overview

This model, shotalab/Qwen3-4B-Instruct-SFT-03-Merged-DPO-01, is a 4 billion parameter language model derived from shotalab/Qwen3-4B-Instruct-SFT-03. It has undergone further fine-tuning using Direct Preference Optimization (DPO), leveraging the Unsloth library to enhance its performance.

Key Capabilities

Improved Reasoning: Optimized to enhance Chain-of-Thought (CoT) reasoning, allowing for more logical and step-by-step problem-solving.
Structured Response Generation: Specifically aligned to produce higher quality, more structured outputs based on preferred response patterns.
Full-Merged Weights: Distributed as full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.

Training Details

The model was trained for 0.3 epochs with a learning rate of 3e-07 and a beta value of 0.4, using a maximum sequence length of 1024. The DPO training utilized the u-10bei/dpo-dataset-qwen-cot dataset, which focuses on improving reasoning and structured responses. The model is released under the MIT License, with users required to comply with the original base model's license terms.

Good For

Applications requiring enhanced logical reasoning and problem-solving.
Generating well-structured and coherent text outputs.
Scenarios where direct preference alignment for response quality is crucial.

Overview

Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)