Hi-Satoh/adv_sft_dpo_final_10_merged
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

Hi-Satoh/adv_sft_dpo_final_10_merged is a 4-billion-parameter causal language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 by Hi-Satoh. Trained with Direct Preference Optimization (DPO) via Unsloth, the model is optimized to improve chain-of-thought reasoning and the quality of structured responses. Its stronger alignment with preferred outputs makes it suitable for tasks that require precise, well-reasoned answers.
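
A minimal loading sketch is shown below, assuming the standard Hugging Face Transformers API; the repository id and BF16 dtype come from this card, while the prompt and generation settings are purely illustrative.

```python
# Minimal usage sketch (assumes Hugging Face Transformers and PyTorch are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hi-Satoh/adv_sft_dpo_final_10_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights, as listed in the card metadata
    device_map="auto",
)

# Illustrative chat-style prompt; adjust to your own task.
messages = [
    {"role": "user", "content": "Explain step by step why the sum of two odd numbers is even."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```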
