toenobu/utokyo-llm-advance-main-dpo
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 3, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm
The toenobu/utokyo-llm-advance-main-dpo model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507, developed by toenobu. This 4-billion-parameter model uses Direct Preference Optimization (DPO) to strengthen reasoning, in particular Chain-of-Thought (CoT), and to improve the quality of structured responses. Trained on preference data, it is suited to applications that need aligned, coherent outputs with clear logical flow and structured generation.
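A minimal usage sketch with Hugging Face transformers, assuming the weights are available under the repo id shown above; the prompt text and generation settings below are illustrative and not taken from this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "toenobu/utokyo-llm-advance-main-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

# Chat-style prompt; the tokenizer applies the underlying Qwen3 chat template.
messages = [
    {"role": "user", "content": "Explain step by step why the sum of two odd numbers is even."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```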