koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think
The koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think model is a 4 billion parameter Qwen3-based instruction-tuned causal language model developed by koutch. It was finetuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth and Huggingface's TRL library, emphasizing faster training. This model is designed for general instruction-following tasks, leveraging its 40960 token context length for processing extensive inputs.
Loading preview...
Model Overview
The koutch/short_paper_qwen_1.json_train_dpo_v4_train_no_think model is a 4 billion parameter instruction-tuned language model. Developed by koutch, it is based on the Qwen3 architecture and was finetuned from unsloth/Qwen3-4B-Instruct-2507.
Key Characteristics
- Architecture: Qwen3-based causal language model.
- Parameter Count: 4 billion parameters.
- Context Length: Supports a substantial context window of 40960 tokens.
- Training Methodology: Finetuned using Unsloth and Huggingface's TRL library, which facilitated a 2x faster training process.
- License: Distributed under the Apache-2.0 license.
Intended Use Cases
This model is suitable for a variety of instruction-following applications, benefiting from its large context window and efficient training. Its Qwen3 foundation suggests capabilities across general language understanding and generation tasks.