rambling1228/dpo-qwen-cot-merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 2, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The rambling1228/dpo-qwen-cot-merged is a 4 billion parameter Qwen3-based causal language model developed by rambling1228. This model was fine-tuned using Unsloth and Huggingface's TRL library, enabling faster training. It is designed for general language generation tasks, leveraging its Qwen3 architecture and efficient fine-tuning process.

Loading preview...

Model Overview

The rambling1228/dpo-qwen-cot-merged is a 4 billion parameter language model based on the Qwen3 architecture. Developed by rambling1228, this model has been fine-tuned using a combination of Unsloth and Huggingface's TRL library. This approach allowed for a significantly faster training process, specifically noted as 2x faster.

Key Characteristics

  • Base Model: Qwen3-4B-Instruct
  • Parameter Count: 4 billion
  • Context Length: 32768 tokens
  • Training Method: Fine-tuned with Unsloth and Huggingface's TRL library for accelerated training.
  • License: Apache-2.0

Intended Use Cases

This model is suitable for a variety of general-purpose language generation and instruction-following tasks, benefiting from its Qwen3 foundation and efficient fine-tuning. Its optimized training process suggests a focus on delivering capable performance within a 4B parameter footprint.