Hi-Satoh/adv_sft3J_dpo_merged
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 22, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

Hi-Satoh/adv_sft3J_dpo_merged is a 4 billion parameter instruction-tuned causal language model developed by Hi-Satoh, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This model utilizes Direct Preference Optimization (DPO) to enhance its reasoning capabilities, particularly for Chain-of-Thought (CoT) processes, and improve structured response quality. It is optimized for generating aligned and coherent outputs based on preferred data, making it suitable for tasks requiring improved logical flow and structured answers.

Loading preview...

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p