follmon10/qwen3-4b-dpo-qwen-cot-merged_v1
Task: Text generation
Model size: 4B
Quantization: BF16
Context length: 32k
Published: Mar 1, 2026
License: apache-2.0
Architecture: Transformer (open weights)

follmon10/qwen3-4b-dpo-qwen-cot-merged_v1 is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It uses Direct Preference Optimization (DPO) to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses. Trained to prefer aligned, coherent outputs over dispreferred alternatives, it is suited to tasks that demand clear logical flow and well-structured answers.
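For intuition about the DPO objective mentioned above, here is a minimal sketch of the per-pair DPO loss in plain Python. This is an illustration of the general technique, not the actual training code used for this model; the function name, example log-probabilities, and the beta value are all illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: how much more the policy
    # likes it than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Loss is -log(sigmoid(beta * (chosen_margin - rejected_margin))).
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probs: the loss is low when the policy shifts
# probability toward the preferred (chosen) response...
good = dpo_loss(-10.0, -30.0, -12.0, -25.0)
# ...and high when it shifts toward the rejected one.
bad = dpo_loss(-30.0, -10.0, -25.0, -12.0)
print(good < bad)  # True
```

During training this loss is minimized over a dataset of (chosen, rejected) response pairs, which is what pushes the merged model toward the preferred CoT-style answers described above.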
