Hi-Satoh/adv_sft_dpo_final_14_merged
Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Concurrency cost: 1
Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Hi-Satoh/adv_sft_dpo_final_14_merged is a 4-billion-parameter Qwen3-based causal language model developed by Hi-Satoh. The model was fine-tuned with Direct Preference Optimization (DPO) to improve reasoning quality and the structure of its responses. Because DPO trains the model directly on pairs of preferred and rejected outputs, it is well suited to tasks that demand logical coherence and strict format adherence.
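DPO optimizes a policy directly on preference pairs, without training a separate reward model: it pushes the policy's log-probability margin between the chosen and rejected response above the frozen reference model's margin. A minimal sketch of the per-pair DPO loss in pure Python (the log-probability values below are illustrative, not taken from this model's training run):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Arguments are summed log-probabilities of the chosen/rejected responses
    under the policy being trained (pi_*) and the frozen reference (ref_*).
    beta controls how strongly the policy may deviate from the reference.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy identical to the reference: margin is 0, loss is -log(0.5) = log 2.
neutral = dpo_loss(-14.0, -18.0, -14.0, -18.0)

# Policy has shifted probability toward the chosen response: loss drops below log 2.
aligned = dpo_loss(-12.0, -20.0, -14.0, -18.0)
```

Minimizing this loss over many preference pairs is what nudges the fine-tuned model toward the "preferred" response style described above.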
