Hi-Satoh/adv_MoE_sft3_dpo_merged
Text Generation · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Feb 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights
Hi-Satoh/adv_MoE_sft3_dpo_merged is a 4-billion-parameter language model developed by Hi-Satoh, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It applies Direct Preference Optimization (DPO) via the Unsloth library to strengthen Chain-of-Thought reasoning and improve the quality of structured responses. The model ships as fully merged 16-bit (BF16) weights, so no adapter loading is required, and it is optimized for tasks that need coherent, preference-aligned output.
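The DPO objective mentioned above trains the policy to widen the log-probability margin between the chosen and rejected responses relative to a frozen reference model. A minimal sketch of the per-pair loss follows; the function name and arguments are illustrative (this is the standard DPO formula, not Unsloth's API):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the full response
    under the trainable policy (logp_*) or the frozen reference (ref_*).
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, compared with the reference model.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log sigmoid(beta * margin): shrinks as the policy widens the margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Zero margin gives the neutral loss log(2); a positive margin lowers it.
loss_zero = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # margin = 0
loss_good = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # margin = 4
```

Here `beta` controls how strongly the policy is penalized for drifting from the reference; small values (e.g. 0.1) keep the fine-tuned model close to the base model's distribution.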