koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03
Task: Text generation
Concurrency cost: 1
Model size: 4B
Quantization: BF16
Context length: 32k
Published: Feb 8, 2026
License: apache-2.0
Architecture: Transformer (open weights)

koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03 is a 4-billion-parameter Qwen3-based causal language model, fine-tuned by koguma-ai with a two-stage pipeline of Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO). The model is optimized for structured output generation and Chain-of-Thought (CoT) reasoning. It ships with fully merged 16-bit (BF16) weights and supports a context length of 40960 tokens, making it suitable for tasks that require detailed reasoning and structured responses.
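Qwen3-style CoT models commonly emit their reasoning inside `<think>...</think>` tags before the final answer. The model card does not specify this model's exact output format, so the following is only a minimal sketch, assuming that tag convention, of how a caller might separate the chain-of-thought from the structured final answer:

```python
import re


def split_cot(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Assumes a Qwen3-style format where chain-of-thought is wrapped
    in <think>...</think> tags (an assumption; the model card does
    not document the output format). If no tags are present, the
    whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Hypothetical response string for illustration only.
reasoning, answer = split_cot("<think>2 + 2 equals 4.</think>The answer is 4.")
print(answer)  # -> The answer is 4.
```

Keeping the split in a small helper like this makes it easy to log or discard the reasoning trace while passing only the final answer to downstream structured-output parsing.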
