takami2022/qwen3-4b-dpo-v2
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

takami2022/qwen3-4b-dpo-v2 is a 4-billion-parameter language model based on the Qwen3 architecture, fine-tuned with Direct Preference Optimization (DPO). It is a refinement of its v1 predecessor, re-trained with a reduced DPO beta value of 0.05 to strengthen preference alignment. It is intended for tasks that benefit from improved alignment on top of the Qwen3 base.
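To make the role of the beta value concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. The function names and the example log-probabilities are illustrative assumptions, not values from this model's training run; only the beta of 0.05 comes from the description above.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.05) -> float:
    """Standard DPO loss for one (chosen, rejected) completion pair.

    Inputs are summed token log-probabilities under the policy being
    trained and under the frozen reference model.
    """
    # Implicit reward margin: how much the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid of the beta-scaled margin. A smaller beta
    # (e.g. 0.05 instead of the common 0.1) flattens the loss, allowing
    # the policy to drift further from the reference per unit of loss.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With a zero margin the loss is exactly log 2; with a positive margin, a smaller beta leaves the loss closer to log 2, which is why lowering beta to 0.05 applies a gentler alignment pressure.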
