mohtani777/Qwen3_4B_SFT_DPOv1_DPOv3_agent_v0

Text Generation · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

mohtani777/Qwen3_4B_SFT_DPOv1_DPOv3_agent_v0 is a 4 billion parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). The model is optimized to strengthen reasoning capabilities, particularly Chain-of-Thought, and to improve the quality of structured responses. It is designed for tasks requiring outputs aligned with preference data, offering improved performance in complex reasoning scenarios.


Overview

This model, mohtani777/Qwen3_4B_SFT_DPOv1_DPOv3_agent_v0, is a 4 billion parameter language model derived from the Qwen3-4B-Instruct-2507 base model. It has undergone fine-tuning using Direct Preference Optimization (DPO) via the Unsloth library, resulting in a merged 16-bit weight model that requires no adapter loading.

Key Capabilities & Optimization

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, enabling more structured and logical problem-solving.
  • Improved Response Quality: Fine-tuned to align responses with preferred outputs, leading to higher quality and more relevant generated text.
  • DPO Training: Trained with DPO using a specific configuration (5 epochs, learning rate 5e-7, beta 0.05, max sequence length 1024) to achieve its specialized performance; a training sketch follows this list.
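
The snippet below is a minimal, hypothetical sketch of how a DPO run with this configuration might look using Unsloth and TRL. Only the hyperparameters in parentheses above come from the model card; the dataset path, LoRA rank, and 4-bit loading are illustrative assumptions, not the author's actual setup.

```python
# Hedged sketch of the described DPO setup (Unsloth + TRL).
# Only epochs, learning rate, beta, and max length are from the card;
# everything else is an assumption for illustration.
from unsloth import FastLanguageModel
from trl import DPOTrainer, DPOConfig
from datasets import load_dataset

# Load the base model named in the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=1024,
    load_in_4bit=True,  # assumption: common Unsloth memory-saving choice
)
model = FastLanguageModel.get_peft_model(model, r=16)  # assumed LoRA rank

# Assumed placeholder: any preference dataset with
# "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("path/to/preference_dataset", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        num_train_epochs=5,   # from the card
        learning_rate=5e-7,   # from the card
        beta=0.05,            # DPO temperature, from the card
        max_length=1024,      # from the card
        output_dir="outputs",
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapters into 16-bit weights, matching the card's
# statement that the published model needs no adapter loading.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```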

Usage & Licensing

This model can be used directly with the transformers library for inference; because the weights are merged to 16-bit, no adapter loading step is required (see the sketch below). It is licensed under the MIT License, consistent with its training data source, and users must also adhere to the original base model's license terms.
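
A minimal inference sketch with transformers is shown below; the prompt and generation settings are illustrative choices, not values from the model card.

```python
# Hedged inference sketch: load the merged 16-bit model directly,
# no PEFT/adapter loading needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mohtani777/Qwen3_4B_SFT_DPOv1_DPOv3_agent_v0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the listed quantization
    device_map="auto",
)

# Illustrative prompt exercising step-by-step reasoning.
messages = [{"role": "user", "content": "Explain step by step: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```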