Momoka1010/dpo-qwen-cot-merged
Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights
Momoka1010/dpo-qwen-cot-merged is a 4-billion-parameter causal language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). It is designed to improve instruction following and response quality while building on the base capabilities of the Qwen3 architecture. With a 40,960-token context length, it is suitable for tasks that require detailed, coherent text generation.
Overview
Momoka1010/dpo-qwen-cot-merged is a 4-billion-parameter language model derived from the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned with Direct Preference Optimization (DPO), a method that aligns a model with human preference data directly, without training a separate reward model, and is known to improve instruction-following behavior.
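To make the DPO objective concrete, here is a minimal pure-Python sketch of the per-pair loss. The function name and toy log-probability values are illustrative assumptions, not details of this model's training run; each argument is the summed log-probability of a chosen or rejected response under the policy being trained or the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    The margin measures how much more strongly the policy prefers the
    chosen response over the rejected one, relative to the reference
    model. The loss is -log(sigmoid(beta * margin)), so it shrinks as
    the policy's preference for the chosen response grows.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - \
             (policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is ln 2; pushing probability mass toward the chosen response drives the loss toward zero, which is the alignment pressure DPO applies during fine-tuning.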
Key Capabilities
- Enhanced Instruction Following: The DPO fine-tuning process aims to produce responses that are more aligned with user instructions and preferences.
- Qwen3 Architecture: Leverages the robust architecture of the Qwen3 series, providing a strong foundation for various NLP tasks.
- Large Context Window: Supports a context length of 40,960 tokens, enabling the processing and generation of longer, more complex texts.
Good For
- Applications requiring models with improved instruction adherence.
- Tasks benefiting from a large context window, such as summarization of long documents or extended conversational AI.
- Developers looking for a Qwen3-based model with enhanced response quality through DPO.
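For the conversational use cases above, prompts must follow the chat format the base model was trained on. The sketch below assumes the ChatML-style template used by the Qwen series (`<|im_start|>role ... <|im_end|>`); in practice, the tokenizer's built-in `apply_chat_template` is the authoritative way to render prompts, and this helper is only a hypothetical illustration of the format.

```python
def build_chatml_prompt(messages):
    """Render a list of {'role', 'content'} messages into a
    ChatML-style prompt string (illustrative sketch only)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # A trailing assistant header cues the model to generate a reply.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this document."},
])
```

Keeping the rendered prompt within the model's context window (minus room for the generated reply) is the caller's responsibility when working with long documents.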