bam2app/dpo-qwen-cot-merged_v1
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm
The bam2app/dpo-qwen-cot-merged_v1 model is a 4-billion-parameter language model based on the Qwen3-4B-Instruct-2507 architecture. It has been fine-tuned with Direct Preference Optimization (DPO) to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses. The model is optimized for generating aligned, coherent outputs, making it suitable for tasks that require clear logical progression and well-structured answers.
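For background on the DPO objective mentioned above, the per-pair loss can be sketched as below. This is a generic illustration of the standard DPO formulation, not this model's actual training code; the function name, `beta` value, and example log-probabilities are all illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (illustrative, not the model's training code).

    Each argument is the summed log-probability of a response under the
    policy being trained or the frozen reference model. ``beta`` scales
    how strongly the policy is penalized for drifting from the reference.
    """
    # Log-ratio of policy to reference for the preferred (chosen) response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    # Log-ratio for the dispreferred (rejected) response
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Preference margin scaled by beta
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), computed stably as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Example with made-up log-probabilities: the policy favors the chosen
# response more than the reference does, so the loss is below log(2).
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(f"{loss:.5f}")  # → 0.59814
```

Minimizing this loss over a dataset of (chosen, rejected) response pairs nudges the policy toward the preferred CoT-style answers without an explicit reward model.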