mohtani777/Qwen3_4B_SFT_DPOv3_agent_v0_LR1E7

Text Generation · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Feb 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

mohtani777/Qwen3_4B_SFT_DPOv3_agent_v0_LR1E7 is a 4-billion-parameter Qwen3-based instruction-tuned language model, fine-tuned by mohtani777 using Direct Preference Optimization (DPO). The model targets stronger Chain-of-Thought reasoning and higher-quality structured responses, and is intended for tasks that require refined conversational outputs and logical reasoning.


Overview

This model, mohtani777/Qwen3_4B_SFT_DPOv3_agent_v0_LR1E7, is a 4-billion-parameter language model built on the Qwen/Qwen3-4B-Instruct-2507 base. It was fine-tuned by mohtani777 using Direct Preference Optimization (DPO) via the Unsloth library. Fine-tuning ran for 5 epochs with a learning rate of 1e-07 and a beta of 0.05, targeting improved alignment of responses with preferred outputs.
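The training setup above can be sketched as a hyperparameter block. The field names below mirror the style of TRL's `DPOConfig` but are illustrative only; just the values (5 epochs, LR 1e-07, beta 0.05, and the max sequence length of 1024 from the Technical Details section) come from this card.

```python
# Illustrative hyperparameter sketch, not the author's actual training script.
# Field names loosely follow trl.DPOConfig; values are taken from this card.
dpo_hyperparams = {
    "num_train_epochs": 5,
    "learning_rate": 1e-7,   # the "LR1E7" suffix in the model name
    "beta": 0.05,            # DPO KL-regularization strength
    "max_length": 1024,      # max sequence length used during fine-tuning
}

# A learning rate this small means DPO only gently nudges the SFT policy
# toward the preferred responses, limiting drift from the base model.
assert dpo_hyperparams["learning_rate"] < 1e-5
```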

Key Capabilities

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning abilities.
  • Structured Responses: Focuses on generating higher quality, structured outputs.
  • DPO Fine-tuning: Leverages Direct Preference Optimization for better alignment with human preferences.
  • Full-Merged Weights: Provides full-merged 16-bit weights, eliminating the need for adapter loading.
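Because the weights are full-merged 16-bit, the model should load directly with Hugging Face `transformers` and no PEFT adapter step. A minimal sketch, assuming standard `transformers`/`torch` usage (`device_map="auto"` is an assumption, not stated on this card; imports are deferred inside the function so the snippet can be defined without a GPU):

```python
# Hedged loading sketch for the full-merged 16-bit checkpoint.
repo_id = "mohtani777/Qwen3_4B_SFT_DPOv3_agent_v0_LR1E7"

def load_model(repo: str = repo_id):
    """Load tokenizer and merged-weight model; no adapter loading needed."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        torch_dtype=torch.bfloat16,  # BF16, matching the published weights
        device_map="auto",           # assumption: let accelerate place layers
    )
    return tokenizer, model
```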

Good For

  • Applications requiring models with refined reasoning skills.
  • Use cases where structured and high-quality conversational responses are critical.
  • Developers looking for a Qwen3-based model with DPO-enhanced performance in agentic or instructional contexts.

Technical Details

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Optimization Method: DPO
  • Max Sequence Length (fine-tuning): 1024 (the base model itself supports a 32k context)
  • License: MIT (derived from the dataset terms), subject to the license terms of the original base model.
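For reference, DPO optimizes a logistic loss on the margin between policy-vs-reference log-probability ratios for chosen and rejected responses; the beta of 0.05 used here scales that margin. A toy sketch with made-up log-probabilities (not from this model):

```python
import math

def dpo_loss(beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)), where margin is the
    chosen-vs-rejected difference in policy-vs-reference log-ratios."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy numbers: the policy prefers the chosen response more than the
# reference model does, so the margin is positive and the loss is below
# the zero-margin value of ln(2) ~= 0.693.
loss = dpo_loss(beta=0.05, logp_w=-10.0, logp_l=-14.0,
                ref_logp_w=-11.0, ref_logp_l=-13.0)
```

With a small beta such as 0.05, even a sizable margin produces only a mild gradient, which matches the gentle preference alignment this card describes.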