takeshi200ok/qwen3-4B-dpo-anti-fence-240slow26

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 26, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The takeshi200ok/qwen3-4B-dpo-anti-fence-240slow26 is a 4 billion parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). Developed by takeshi200ok, this model is specifically optimized to improve reasoning capabilities through Chain-of-Thought and enhance structured response quality. It is designed for applications requiring aligned and coherent text generation based on preferred outputs.

Loading preview...

Model Overview

This model, takeshi200ok/qwen3-4B-dpo-anti-fence-240slow26, is a 4 billion parameter language model developed by takeshi200ok. It is a fine-tuned version of the Qwen/Qwen3-4B-Instruct-2507 base model, utilizing Direct Preference Optimization (DPO) via the Unsloth library.

Key Characteristics

  • Optimization Objective: The primary goal of this DPO training was to align the model's responses with preferred outputs, specifically focusing on enhancing:
    • Reasoning (Chain-of-Thought): Improving the model's ability to generate logical, step-by-step thought processes.
    • Structured Response Quality: Producing more coherent and well-organized outputs based on a preference dataset.
  • Training Process: The DPO training was initiated from an existing SFT (Supervised Fine-Tuning) LoRA adapter and resulted in a fully merged 16-bit model, combining the base, SFT, and DPO weights. This means no adapter loading is required for usage.
  • Configuration: Training involved 1 epoch with a learning rate of 3e-06, a beta value of 0.05, and a maximum sequence length of 3072.

Usage and Integration

As a full-merged 16-bit model, it can be directly used with the transformers library for inference. The model's training data includes the [u-10bei/dpo-dataset-qwen-cot] dataset, and it operates under the MIT License, while also adhering to the original base model's license terms.