daichira/dpo-qwen-cot-merged
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Jan 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

daichira/dpo-qwen-cot-merged is a 4-billion-parameter, Qwen3-based, instruction-tuned language model finetuned by daichira. Training was accelerated using Unsloth and Hugging Face's TRL library. The model is designed for general language understanding and generation tasks, leveraging the Qwen3 architecture for robust performance.


Overview

daichira/dpo-qwen-cot-merged is a 4-billion-parameter language model developed by daichira. It is a finetuned variant of unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit, built on the Qwen3 architecture. Finetuning was approximately 2x faster through the combination of Unsloth and Hugging Face's TRL library.
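The card does not include a usage snippet, so here is a minimal loading-and-generation sketch, assuming the model exposes the standard Hugging Face causal-LM chat interface; the prompt and sampling settings are illustrative.

```python
# Minimal inference sketch; assumes a standard Hugging Face chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "daichira/dpo-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain gradient descent in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```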

Key Characteristics

  • Base Model: Qwen3-4B-Instruct
  • Parameter Count: 4 billion parameters
  • Context Length: 40960 tokens
  • Training Optimization: finetuning accelerated with Unsloth and Hugging Face TRL (see the sketch after this list).
  • License: Apache-2.0
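
The card names the training stack (Unsloth plus TRL) but not the recipe. Given the "dpo" in the model name, the following is a hedged sketch of what a DPO finetune of the stated base model could look like with that stack; the dataset, LoRA settings, and hyperparameters are illustrative placeholders, not daichira's actual configuration.

```python
# Hypothetical DPO finetuning sketch with Unsloth + TRL; all hyperparameters
# and the preference dataset are placeholders, not the author's recipe.
from unsloth import FastLanguageModel  # import unsloth first so its patches apply

from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Load the 4-bit base model the card names as the finetuning starting point.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; ranks and target modules are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# DPO expects preference pairs (prompt, chosen, rejected); this public
# dataset is a stand-in for whatever data was actually used.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        output_dir="dpo-qwen-cot",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        beta=0.1,       # strength of the KL-style preference regularizer
        max_steps=500,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```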

Intended Use Cases

This model is suited to general-purpose language tasks such as instruction following and text generation, benefiting from its Qwen3 foundation and efficient finetuning. The optimized training pipeline points to strong performance within a 4B-parameter footprint, making the model a candidate for applications where computational efficiency during development is a priority.