Hi-Satoh/adv_sft3J_dpo_merged
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 22, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm
Hi-Satoh/adv_sft3J_dpo_merged is a 4-billion-parameter instruction-tuned causal language model developed by Hi-Satoh and fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained with Direct Preference Optimization (DPO) to improve reasoning, particularly Chain-of-Thought (CoT) reasoning, and the quality of structured responses. Because DPO trains the model to favor preferred responses over rejected ones, it is suited to tasks that call for coherent logical flow and well-structured answers.
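For readers unfamiliar with DPO, the objective for a single preference pair can be sketched as below. This is a minimal illustration of the standard DPO loss, not the training code used for this model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for the example; the listing does not state the actual hyperparameters or training data.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta is a hypothetical value chosen for illustration only.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; positive when the policy favors "chosen".
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the preferred response, which is the pressure that pushes outputs toward the preferred style.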