sfutenma/dpo-qwen3_4b-cot-merged_v260302-093614

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 2, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The sfutenma/dpo-qwen3_4b-cot-merged_v260302-093614 is a 4 billion parameter Qwen3-based language model, fine-tuned using Direct Preference Optimization (DPO) for enhanced reasoning (Chain-of-Thought) and structured response quality. It features a 32768 token context length and is optimized to align responses with preferred outputs. This model is suitable for applications requiring improved logical coherence and structured text generation.

Loading preview...

Model Overview

The sfutenma/dpo-qwen3_4b-cot-merged_v260302-093614 is a 4 billion parameter language model based on the Qwen3 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, building upon the sfutenma/lora_structeval_t_qwen3_4b_v260228-172650 model. This release provides the full-merged 16-bit weights, eliminating the need for adapter loading.

Key Capabilities

  • Enhanced Reasoning: Optimized through DPO to improve Chain-of-Thought (CoT) reasoning abilities.
  • Structured Response Quality: Specifically aligned to produce higher quality, structured outputs based on a preference dataset.
  • Efficient Deployment: Provided as a fully merged model, ready for direct use with transformers without additional configuration.

Training Details

The model was trained for 5 epochs with a learning rate of 1e-06 and a beta of 0.1. It utilized a maximum sequence length of 768 tokens during DPO training. The base model for this fine-tuning was unsloth/Qwen3-4B-Instruct-2507. The training data used was u-10bei/dpo-dataset-qwen-cot.

Usage Considerations

This model is ideal for tasks where improved reasoning and structured, aligned responses are critical. Users should be aware that the model's license follows the MIT License, as per the dataset terms, and compliance with the original base model's license terms is also required.