AlainGuillotin/dpo-qwen-cot-merged
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

AlainGuillotin/dpo-qwen-cot-merged is a 4-billion-parameter Qwen3-based causal language model fine-tuned by AlainGuillotin. It uses Direct Preference Optimization (DPO) to improve chain-of-thought reasoning and the quality of structured responses. With a 32,768-token context length, the model is optimized for generating aligned, coherent outputs learned from preference data, making it suitable for tasks that require clear logical flow and structured text generation.
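To make the DPO fine-tuning mentioned above concrete, the following is a minimal sketch of the per-pair DPO objective (Rafailov et al., 2023): the loss pushes the policy to assign a larger log-probability margin to the preferred (chosen) response than the reference model does. All numeric values here are illustrative, not taken from this model's training run.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    logp_* are sequence log-probabilities under the policy being trained;
    ref_logp_* are the same quantities under the frozen reference model.
    beta scales how strongly the policy is pulled toward the preference.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log sigmoid(beta * margin); lower when the margin is positive.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A positive margin (policy prefers the chosen response more than the
# reference does) drives the loss below log 2 ≈ 0.693.
print(round(dpo_loss(-10.0, -20.0, -12.0, -18.0, beta=0.1), 4))
```

With a zero margin the loss equals log 2, the value at initialization when the policy still matches the reference model; training then decreases it by widening the margin on preference pairs.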
