KawausoHiroKawauso/qwen3-4b-structeval-lora-39

Hosted on: Hugging Face

  • Task: Text generation
  • Model size: 4B
  • Quantization: BF16
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Feb 8, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Availability: Open weights (warm)

KawausoHiroKawauso/qwen3-4b-structeval-lora-39 is a 4-billion-parameter instruction-tuned model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via Unsloth. It is optimized to strengthen Chain-of-Thought reasoning and the quality of structured responses, and is intended for applications that require outputs aligned with a preference dataset.


Overview

This model, qwen3-4b-structeval-lora-39, is a 4 billion parameter language model developed by KawausoHiroKawauso. It is a fine-tuned version of the Qwen/Qwen3-4B-Instruct-2507 base model, utilizing Direct Preference Optimization (DPO) through the Unsloth library. The fine-tuning process aimed to align the model's responses with preferred outputs, specifically targeting improvements in reasoning (Chain-of-Thought) and the quality of structured responses based on a provided preference dataset.
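
For reference, DPO fine-tunes the policy directly on preference pairs rather than training a separate reward model. Using the standard notation from the DPO literature (not taken from this repository), the objective being minimized is:

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where \(y_w\) and \(y_l\) are the preferred and rejected responses, \(\pi_{\text{ref}}\) is the frozen base model, and a larger \(\beta\) (0.4 for this model) penalizes divergence from the reference policy more strongly.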

Key Features and Optimization

  • Base Model: Qwen/Qwen3-4B-Instruct-2507.
  • Optimization Method: Direct Preference Optimization (DPO) for aligning responses.
  • Focus: Enhanced reasoning (Chain-of-Thought) and improved structured response generation.
  • Training Configuration: Trained for 1 epoch with a learning rate of 1e-5, a DPO beta of 0.4, and a maximum sequence length of 2048. A LoRA adapter (r=8, alpha=16) was trained and then merged into the base model (see the sketch after this list).
  • Deployment: This repository contains the fully merged 16-bit weights, so no adapter loading is required; the model can be used directly with the transformers library (see the loading example under Usage Considerations).
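
The training script itself was not published; the following is only a minimal sketch of how the stated configuration (1 epoch, lr 1e-5, beta 0.4, max length 2048, LoRA r=8/alpha=16, merged 16-bit export) could be reproduced with Unsloth and TRL's DPOTrainer. The dataset file and its prompt/chosen/rejected column layout are assumptions, not details from this model card.

```python
# Hypothetical reproduction sketch of the stated DPO setup; the dataset
# name and column layout are assumptions, not the author's actual script.
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer
from datasets import load_dataset

max_seq_length = 2048  # stated maximum sequence length

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",  # base model from the card
    max_seq_length=max_seq_length,
    load_in_4bit=False,
)

# LoRA configuration from the card: r=8, alpha=16
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Preference dataset with prompt/chosen/rejected columns (assumed layout).
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        num_train_epochs=1,       # 1 epoch, per the card
        learning_rate=1e-5,       # stated learning rate
        beta=0.4,                 # stated DPO beta
        max_length=max_seq_length,
        output_dir="outputs",
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapter into the base weights and export in 16-bit,
# matching the "fully merged 16-bit weights" note above.
model.save_pretrained_merged("qwen3-4b-structeval-lora-39", tokenizer,
                             save_method="merged_16bit")
```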

Usage Considerations

This model is suitable for tasks where generating well-reasoned, structured outputs is critical. Per the training data terms, the model follows the MIT License, and compliance with the original base model's license terms is also required.
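
Because the repository ships fully merged BF16 weights, it loads like any standard checkpoint. A minimal sketch with transformers (the prompt is purely illustrative):

```python
# Minimal inference sketch: loading the merged BF16 weights directly
# with transformers (no PEFT/adapter step needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KawausoHiroKawauso/qwen3-4b-structeval-lora-39"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are shipped in 16-bit
    device_map="auto",
)

# Illustrative prompt exercising the model's stated strengths:
# step-by-step reasoning followed by a structured answer.
messages = [
    {"role": "user",
     "content": "Explain step by step why 17 is prime, then give the answer as JSON."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```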