ryowatanabe240215/qwen3-4b-structured-output-lora_ver10-2_merge_dpo
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Mar 1, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

ryowatanabe240215/qwen3-4b-structured-output-lora_ver10-2_merge_dpo is a 4-billion-parameter, instruction-tuned causal language model based on Qwen3 and fine-tuned by ryowatanabe240215. It was trained with Direct Preference Optimization (DPO) to improve Chain-of-Thought reasoning and the quality of structured output. Because DPO steers the model toward preferred responses over rejected ones, the model is suited to applications that need precise, logically organized, well-structured output.
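For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss, not the author's training code. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration; the model card does not state the hyperparameters or the preference data used. The inputs are summed log-probabilities of a chosen and a rejected response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin))).

    beta=0.1 is an illustrative default, not a value taken from the model card.
    """
    # How much more (in log-probability) the policy favors each response
    # compared with the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch on the sign of x
    # so that exp() is only ever called on a non-positive argument.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-3.0, -3.0, -3.0, -3.0))
    # Policy raises the chosen response and lowers the rejected one: lower loss.
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference does, which is the mechanism by which preference pairs can push a model toward well-formed structured output.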
