Hi-Satoh/adv_sft_dpo_final_8_merged
Text Generation · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Concurrency Cost: 1 · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights
Hi-Satoh/adv_sft_dpo_final_8_merged is a 4 billion parameter causal language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with Direct Preference Optimization (DPO) via Unsloth. The model is optimized for chain-of-thought reasoning and structured response quality: the DPO stage aligns its outputs with preferred responses, making it well suited to tasks that require high-quality, structured text generation.
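Below is a minimal inference sketch, assuming the repository ships standard Hugging Face transformers weights and uses the Qwen3 chat template inherited from the base model; adjust the dtype, device placement, and generation settings for your environment.

```python
# Minimal inference sketch (assumes standard transformers-compatible weights
# and the Qwen3 chat template inherited from Qwen/Qwen3-4B-Instruct-2507).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hi-Satoh/adv_sft_dpo_final_8_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 precision
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain chain-of-thought prompting in two sentences."}
]

# Build the prompt from the chat template, then generate a response.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```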