koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 12, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think model is a 4 billion parameter Qwen3-based instruction-tuned causal language model developed by koutch. It was finetuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth and Huggingface's TRL library, emphasizing faster training. This model is designed for general instruction-following tasks, leveraging its 40960 token context length for processing extensive inputs.

Loading preview...

Model Overview

The koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think model is a 4 billion parameter instruction-tuned language model. Developed by koutch, it is based on the Qwen3 architecture and was finetuned from unsloth/Qwen3-4B-Instruct-2507.

Key Characteristics

  • Architecture: Qwen3-based causal language model.
  • Parameter Count: 4 billion parameters.
  • Context Length: Supports a substantial context window of 40960 tokens.
  • Training Methodology: Finetuned using Unsloth and Huggingface's TRL library, which facilitated a 2x faster training process.
  • License: Distributed under the Apache-2.0 license.

Intended Use Cases

This model is suitable for a variety of instruction-following applications, benefiting from its large context window and efficient training. Its Qwen3 foundation suggests capabilities across general language understanding and generation tasks.