mohtani777/Qwen3_4B_SFTV5_DPOv3_agent_v0_LR1E6

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Mar 1, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The mohtani777/Qwen3_4B_SFTV5_DPOv3_agent_v0_LR1E6 is a 4 billion parameter instruction-tuned model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). This model is specifically optimized to enhance reasoning capabilities through Chain-of-Thought and improve structured response quality. It is designed for use cases requiring aligned and high-quality outputs, particularly in reasoning tasks.

Loading preview...

Overview

This model, mohtani777/Qwen3_4B_SFTV5_DPOv3_agent_v0_LR1E6, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, resulting in a full-merged 16-bit weight model that requires no adapter loading.

Key Optimizations

The primary objective of its DPO training was to align the model's responses with preferred outputs, focusing on two critical areas:

  • Enhanced Reasoning: Improved Chain-of-Thought capabilities.
  • Structured Response Quality: Better generation of structured outputs based on a preference dataset.

Training Details

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Methodology: Direct Preference Optimization (DPO)
  • Epochs: 5
  • Learning Rate: 1e-06
  • Max Sequence Length: 1024
  • Training Data: Utilized the u-10bei/dpo-dataset-qwen-cot dataset.

Usage Considerations

As a merged model, it can be directly integrated and used with the transformers library. Users should be aware that the model's license follows the MIT License, as per the dataset terms, and compliance with the original base model's license is also required.

Ideal Use Cases

This model is particularly well-suited for applications where:

  • High-quality, aligned responses are crucial.
  • Complex reasoning and Chain-of-Thought capabilities are needed.
  • Structured output generation is a priority.