sfutenma/dpo-qwen3_4b-cot-merged_v260227-161515

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 27, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The sfutenma/dpo-qwen3_4b-cot-merged_v260227-161515 model is a 4 billion parameter Qwen3-based causal language model, fine-tuned by sfutenma using Direct Preference Optimization (DPO) with a 32K context length. This model is specifically optimized for improving reasoning capabilities through Chain-of-Thought (CoT) and generating high-quality structured responses. It is designed for applications requiring enhanced logical deduction and precise output formatting.

Loading preview...

Model Overview

This model, sfutenma/dpo-qwen3_4b-cot-merged_v260227-161515, is a 4 billion parameter Qwen3-based language model developed by sfutenma. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, building upon the base model sfutenma/lora_structeval_t_qwen3_4b_v260221-161528. The fine-tuning process focused on aligning the model's responses with preferred outputs, specifically targeting improvements in reasoning (Chain-of-Thought) and the quality of structured responses.

Key Capabilities

  • Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) prompting to improve logical deduction and problem-solving.
  • Structured Response Generation: Designed to produce high-quality, well-formatted structured outputs based on preference data.
  • DPO Fine-tuning: Leverages Direct Preference Optimization for better alignment with desired response characteristics.
  • Merged Weights: Provides full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.

Training Details

The model was trained for 5 epochs with a learning rate of 2e-07 and a beta value of 0.03, using a maximum sequence length of 768 tokens. The training data utilized was u-10bei/dpo-dataset-qwen-cot. The base model's LoRA configuration (r=8, alpha=16) was merged during the process.

Ideal Use Cases

This model is particularly well-suited for applications where precise reasoning, logical coherence, and structured output are critical. Developers can integrate it directly using the transformers library for tasks requiring advanced conversational AI with a focus on structured and reasoned responses.