KawausoHiroKawauso/qwen3-4b-structeval-lora-36

Pipeline: Text Generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 8, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

KawausoHiroKawauso/qwen3-4b-structeval-lora-36 is a 4-billion-parameter Qwen3-Instruct model, fine-tuned with Direct Preference Optimization (DPO) via Unsloth. It is optimized to improve reasoning capabilities, particularly Chain-of-Thought, and to enhance the quality of structured responses. It generates outputs aligned with a preference dataset, making it suitable for tasks that require precise, structured language generation.


Model Overview

This model, qwen3-4b-structeval-lora-36, is a 4 billion parameter variant of the Qwen3-Instruct architecture, specifically Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, with its LoRA configuration (r=8, alpha=16) fully merged into the base model's 16-bit weights.
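For intuition about what "fully merged" means, a LoRA adapter with rank r and scaling alpha is folded into a base weight matrix as W = W0 + (alpha / r) · BA, after which no adapter weights remain at inference time. A toy pure-Python sketch of that merge (the matrices below are tiny hypothetical examples with rank 1, not the model's actual weights; the real configuration uses r=8, alpha=16, giving the same alpha/r scale of 2.0):

```python
# Toy illustration of merging a LoRA adapter into base weights:
# W_merged = W0 + (alpha / r) * (B @ A).

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def merge_lora(W0, A, B, r, alpha):
    """Fold the low-rank update B @ A into the base weight matrix W0."""
    scale = alpha / r  # 16 / 8 = 2.0 for this model's configuration
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, BA)]

# 2x2 base weight and a rank-1 adapter (hypothetical values).
W0 = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]   # shape (2, r=1)
A = [[0.5, 0.5]]     # shape (r=1, 2)
merged = merge_lora(W0, A, B, r=1, alpha=2)
print(merged)  # [[2.0, 1.0], [0.0, 1.0]]
```

After merging, the model is stored and served as ordinary 16-bit dense weights, which is why this checkpoint loads like any other Qwen3-4B model rather than as a base-plus-adapter pair.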

Key Capabilities

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, leading to more logical and coherent outputs.
  • Structured Response Quality: Fine-tuned to produce higher quality structured responses, aligning with preferred output formats.
  • DPO Alignment: Leverages DPO to align model responses with specific desired outputs based on a preference dataset.

Training Details

The model was trained for 1 epoch of DPO with a learning rate of 1e-05, a beta value of 0.4, and a maximum sequence length of 1024 tokens. The training dataset is u-10bei/dpo-dataset-qwen-cot.
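For intuition about the beta parameter, the standard per-example DPO loss is -log σ(β · [(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]), where y_w/y_l are the chosen and rejected responses. A minimal sketch with this model's beta=0.4 (the log-probabilities below are made-up illustrative values, not taken from training):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.4):
    """Per-example DPO loss:
    -log sigmoid(beta * ((log pi_w - log ref_w) - (log pi_l - log ref_l))).
    All arguments are summed log-probabilities of complete responses.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical log-probs: the policy prefers the chosen response more
# strongly than the reference model does, so the loss falls below log(2).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-13.0)
print(round(loss, 4))  # 0.2633
```

A larger beta makes the loss more sensitive to how far the policy drifts from the reference model; at the start of training, when policy and reference agree, the margin is zero and the loss equals log(2) regardless of beta.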

Good For

  • Applications requiring improved reasoning and structured output generation.
  • Tasks where response alignment to specific preferences is crucial.
  • Developers looking for a Qwen3-based model with enhanced instruction following and structured output capabilities.