KawausoHiroKawauso/qwen3-4b-structeval-lora-39

Hosted on: Hugging Face

  • Task: Text generation
  • Model size: 4B
  • Quantization: BF16
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Feb 8, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Availability: Open weights (warm)

KawausoHiroKawauso/qwen3-4b-structeval-lora-39 is a 4-billion-parameter instruction-tuned model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via Unsloth. It is optimized to strengthen Chain-of-Thought reasoning and the quality of structured responses, and is intended for applications that require outputs aligned with a preference dataset.


Overview

This model, qwen3-4b-structeval-lora-39, is a 4 billion parameter language model developed by KawausoHiroKawauso. It is a fine-tuned version of the Qwen/Qwen3-4B-Instruct-2507 base model, utilizing Direct Preference Optimization (DPO) through the Unsloth library. The fine-tuning process aimed to align the model's responses with preferred outputs, specifically targeting improvements in reasoning (Chain-of-Thought) and the quality of structured responses based on a provided preference dataset.
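
For reference, DPO fine-tunes the policy directly on preference pairs rather than training a separate reward model. Using the standard notation from the DPO literature (not taken from this repository), the objective being minimized is:

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where \(y_w\) and \(y_l\) are the preferred and rejected responses, \(\pi_{\text{ref}}\) is the frozen base model, and a larger \(\beta\) (0.4 for this model) penalizes divergence from the reference policy more strongly.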

Key Features and Optimization

  • Base Model: Qwen/Qwen3-4B-Instruct-2507.
  • Optimization Method: Direct Preference Optimization (DPO) for aligning responses.
  • Focus: Enhanced reasoning (Chain-of-Thought) and improved structured response generation.
  • Training Configuration: Trained for 1 epoch with a learning rate of 1e-5, a DPO beta of 0.4, and a maximum sequence length of 2048. A LoRA adapter (r=8, alpha=16) was trained and then merged into the base model (see the sketch after this list).
  • Deployment: This repository contains the fully merged 16-bit weights, so no adapter loading is required; the model can be used directly with the transformers library (see the loading example under Usage Considerations).
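
The training script itself was not published; the following is only a minimal sketch of how the stated configuration (1 epoch, lr 1e-5, beta 0.4, max length 2048, LoRA r=8/alpha=16, merged 16-bit export) could be reproduced with Unsloth and TRL's DPOTrainer. The dataset file and its prompt/chosen/rejected column layout are assumptions, not details from this model card.

```python
# Hypothetical reproduction sketch of the stated DPO setup; the dataset
# name and column layout are assumptions, not the author's actual script.
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer
from datasets import load_dataset

max_seq_length = 2048  # stated maximum sequence length

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",  # base model from the card
    max_seq_length=max_seq_length,
    load_in_4bit=False,
)

# LoRA configuration from the card: r=8, alpha=16
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Preference dataset with prompt/chosen/rejected columns (assumed layout).
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        num_train_epochs=1,       # 1 epoch, per the card
        learning_rate=1e-5,       # stated learning rate
        beta=0.4,                 # stated DPO beta
        max_length=max_seq_length,
        output_dir="outputs",
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapter into the base weights and export in 16-bit,
# matching the "fully merged 16-bit weights" note above.
model.save_pretrained_merged("qwen3-4b-structeval-lora-39", tokenizer,
                             save_method="merged_16bit")
```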

Usage Considerations

This model is suitable for tasks where generating well-reasoned, structured outputs is critical. Per the training data terms, the model follows the MIT License, and compliance with the original base model's license terms is also required.
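
Because the repository ships fully merged BF16 weights, it loads like any standard checkpoint. A minimal sketch with transformers (the prompt is purely illustrative):

```python
# Minimal inference sketch: loading the merged BF16 weights directly
# with transformers (no PEFT/adapter step needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KawausoHiroKawauso/qwen3-4b-structeval-lora-39"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are shipped in 16-bit
    device_map="auto",
)

# Illustrative prompt exercising the model's stated strengths:
# step-by-step reasoning followed by a structured answer.
messages = [
    {"role": "user",
     "content": "Explain step by step why 17 is prime, then give the answer as JSON."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```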