koutch/short_paper_qwen_2.json_train_dpo_v2_train_no_think

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 14, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The koutch/short_paper_qwen_2.json_train_dpo_v2_train_no_think model is a 4 billion parameter Qwen3-based instruction-tuned causal language model developed by koutch. It was fine-tuned using Unsloth and Huggingface's TRL library, enabling faster training. This model is designed for general instruction-following tasks, leveraging its Qwen3 architecture for robust performance.

Loading preview...

Model Overview

The koutch/short_paper_qwen_2.json_train_dpo_v2_train_no_think is a 4 billion parameter instruction-tuned language model based on the Qwen3 architecture. Developed by koutch, this model was fine-tuned from unsloth/Qwen3-4B-Instruct-2507.

Key Characteristics

  • Architecture: Qwen3-based causal language model.
  • Parameter Count: 4 billion parameters.
  • Training Efficiency: Fine-tuned using Unsloth and Huggingface's TRL library, which facilitated a 2x faster training process.
  • License: Released under the Apache-2.0 license.

Intended Use Cases

This model is suitable for a variety of general instruction-following tasks, benefiting from its Qwen3 foundation and efficient fine-tuning. Its 4 billion parameters make it a capable option for applications requiring a balance between performance and computational resources.