Hi-Satoh/adv_sft3J_dpo_merged

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 22, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Hi-Satoh/adv_sft3J_dpo_merged is a 4-billion-parameter instruction-tuned causal language model developed by Hi-Satoh and fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It uses Direct Preference Optimization (DPO) to strengthen reasoning, particularly Chain-of-Thought (CoT) processes, and to improve the quality of structured responses. Because it is trained to align with the preferred outputs in its preference data, it suits tasks that demand clear logical flow and well-structured answers.


Model Overview

This model, Hi-Satoh/adv_sft3J_dpo_merged, is a 4-billion-parameter language model developed by Hi-Satoh. It is a fine-tuned version of the Qwen/Qwen3-4B-Instruct-2507 base model, optimized with Direct Preference Optimization (DPO) via the Unsloth library. The repository provides the fully merged 16-bit (BF16) weights, so no adapter loading is required.
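
Because the merged BF16 weights are shipped directly, the model can be loaded with the standard Hugging Face transformers API. The following is a minimal sketch; the prompt and generation settings are illustrative choices, not values published in this card.

```python
# Minimal loading/generation sketch using the standard transformers API.
# The prompt and max_new_tokens are illustrative, not values from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hi-Satoh/adv_sft3J_dpo_merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the repository ships merged BF16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain, step by step, why the sky is blue."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```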

Key Capabilities

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more logical and step-by-step responses.
  • Improved Structured Output: Produces higher-quality structured responses by aligning generations with the preferred outputs in the training dataset.
  • DPO Fine-tuning: Trained with DPO using a beta of 0.05, a learning rate of 1e-6, 2 epochs, and a maximum sequence length of 4096 (see the configuration sketch after this list).
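
The sketch below shows how the reported hyperparameters map onto a DPO training run using Hugging Face TRL. The author fine-tuned via Unsloth, so the exact code differs; only the hyperparameters (beta, learning rate, epochs, maximum sequence length) come from this card, and the toy dataset and output directory are hypothetical.

```python
# Hedged sketch of the reported DPO setup using Hugging Face TRL (>= 0.12 API).
# Only beta, learning_rate, num_train_epochs, and max_length come from the card;
# the dataset and output directory are hypothetical stand-ins.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hypothetical toy preference pairs; the actual training data is not published here.
dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["Step 1: add the two numbers. 2 + 2 = 4."],
    "rejected": ["5."],
})

config = DPOConfig(
    output_dir="adv_sft3J_dpo",   # hypothetical output directory
    beta=0.05,                    # preference temperature reported in the card
    learning_rate=1e-6,
    num_train_epochs=2,
    max_length=4096,              # maximum sequence length reported in the card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,   # tokenizer argument name in recent TRL versions
)
trainer.train()
```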

Good For

  • Applications requiring models with strong reasoning abilities.
  • Tasks where structured and coherent output is crucial.
  • Developers looking for a Qwen3-4B variant with enhanced alignment and response quality through DPO.

Licensing

The model is released under the MIT License, following the terms of its training dataset. Users must also comply with the license of the base model, Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0).