Name: mohtani777/Qwen3_4B_SFT_DPOv3_agent_v0_LR5E7 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mohtani777

Model Overview

This model, mohtani777/Qwen3_4B_SFT_DPOv3_agent_v0_LR5E7, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, specifically targeting improved response alignment and quality.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning processes.
Structured Responses: Focuses on generating higher quality and more structured outputs.
DPO Fine-tuning: Utilizes DPO with a beta of 0.05 over 5 epochs and a learning rate of 5e-07 to align with preferred outputs.
Merged Weights: Provides full-merged 16-bit weights, eliminating the need for adapter loading.

Training Details

The model was trained with a maximum sequence length of 1024 and incorporated LoRA configuration (r=8, alpha=16) which has been merged into the base model. The training data used for DPO was sourced from [u-10bei/dpo-dataset-qwen-cot].

Usage Considerations

This model is ready for direct use with the transformers library. Users should be aware that the model's license is MIT (as per the dataset terms), and compliance with the original base model's license terms is also required.

Overview

Model Overview

Key Capabilities

Training Details

Usage Considerations

Full Model Card (README)