kedumerikugame/dpo-qwen-cot-merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 21, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The kedumerikugame/dpo-qwen-cot-merged model is a 4 billion parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via the Unsloth library. This model integrates the LoRA adapter weights directly into the base model, providing a fully merged 16-bit version. It is optimized for conversational tasks, leveraging a DPO dataset for improved response quality and alignment.

Loading preview...

Model Overview

This model, kedumerikugame/dpo-qwen-cot-merged, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO), a method for aligning language models with human preferences, implemented through the Unsloth library. The repository provides the full, merged 16-bit weights, eliminating the need for separate adapter loading.

Training Details

The model underwent a single epoch of DPO training with a learning rate of 1e-07 and a beta value of 0.1. It was configured with a maximum sequence length of 1024 tokens. The LoRA configuration used for fine-tuning involved parameters r=8 and alpha=16, which have been merged into the base model.

Key Capabilities

  • Preference Alignment: Enhanced response quality and alignment with desired outputs through DPO training.
  • Efficient Deployment: Full-merged weights simplify deployment without requiring adapter management.
  • Conversational AI: Suitable for chat-based applications and instruction following, building on the capabilities of its Qwen3-Instruct base.

Use Cases

This model is particularly well-suited for applications requiring:

  • Generating high-quality, preference-aligned text in conversational contexts.
  • Instruction-following tasks where nuanced responses are beneficial.
  • Scenarios where a compact, fully merged model is preferred for ease of use and inference.