Nada2022/dpo-qwen-cot-merged-16bit
Text Generation

- Concurrency cost: 1
- Model size: 4B
- Quantization: BF16
- Context length: 32k
- Published: Feb 8, 2026
- Architecture: Transformer
Nada2022/dpo-qwen-cot-merged-16bit is a 4-billion-parameter language model based on the Qwen architecture. It was fine-tuned with Direct Preference Optimization (DPO) on Chain-of-Thought (CoT) data to improve reasoning ability and alignment with human preferences. With a context length of 40,960 tokens, it is suited to complex tasks that require extensive contextual understanding and multi-step logical inference.
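A minimal usage sketch, assuming the model is hosted on the Hugging Face Hub under the id above and loadable with the `transformers` library; the chain-of-thought prompt template here is illustrative, not part of the model card:

```python
MODEL_ID = "Nada2022/dpo-qwen-cot-merged-16bit"


def build_cot_prompt(question: str) -> str:
    """Wrap a question in a simple chain-of-thought template.

    This template is a hypothetical example; the model may also work
    with a chat template or plain prompts.
    """
    return f"Question: {question}\nLet's think step by step."


def generate(question: str, max_new_tokens: int = 256) -> str:
    """Load the model in BF16 (as listed on the card) and generate."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # BF16 weights per the card
        device_map="auto",
    )
    inputs = tokenizer(build_cot_prompt(question), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate("What is 12 * 17?")` would download the weights on first use; loading in BF16 requires roughly 8 GB of accelerator memory for a 4B model.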