koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think
Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Jan 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights · Warm

The koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think model is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model developed by koutch. It was fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (for faster training) together with Hugging Face's TRL library. The model is intended for general instruction-following tasks, and its 40,960-token context length allows it to process extensive inputs.
