koutch/short_paper_qwen_2.json_train_dpo_v2_train_no_think
The koutch/short_paper_qwen_2.json_train_dpo_v2_train_no_think model is a 4 billion parameter Qwen3-based instruction-tuned causal language model developed by koutch. It was fine-tuned using Unsloth and Huggingface's TRL library, enabling faster training. This model is designed for general instruction-following tasks, leveraging its Qwen3 architecture for robust performance.
Loading preview...
Model Overview
The koutch/short_paper_qwen_2.json_train_dpo_v2_train_no_think is a 4 billion parameter instruction-tuned language model based on the Qwen3 architecture. Developed by koutch, this model was fine-tuned from unsloth/Qwen3-4B-Instruct-2507.
Key Characteristics
- Architecture: Qwen3-based causal language model.
- Parameter Count: 4 billion parameters.
- Training Efficiency: Fine-tuned using Unsloth and Huggingface's TRL library, which facilitated a 2x faster training process.
- License: Released under the Apache-2.0 license.
Intended Use Cases
This model is suitable for a variety of general instruction-following tasks, benefiting from its Qwen3 foundation and efficient fine-tuning. Its 4 billion parameters make it a capable option for applications requiring a balance between performance and computational resources.