ogwata/exp7-dpo-baseline
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quantization: BF16 · Context Length: 32k · Published: Feb 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

ogwata/exp7-dpo-baseline is a 4-billion-parameter Qwen3-based causal language model fine-tuned with Direct Preference Optimization (DPO) via Unsloth. The fine-tuning targets stronger reasoning, particularly chain-of-thought, and higher-quality structured responses. Because it is trained on preference data, the model is intended for tasks that require aligned, coherent outputs, offering a specialized alternative to general-purpose LLMs.
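The card does not include the training code, but DPO itself optimizes a simple per-pair objective: push the policy's log-probability margin between the chosen and rejected response above that of a frozen reference model. A minimal sketch of that loss follows; the function name, `beta` value, and toy log-probabilities are illustrative, not taken from this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written in the numerically stable
    # softplus form log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Toy pair: the policy already slightly prefers the chosen response
# relative to the reference, so the loss is below log(2) (~0.693).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0, beta=0.1)
```

The larger the policy's preference margin over the reference, the smaller the loss; at zero margin the loss is exactly log 2.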
