Overview
This model, mohtani777/Qwen3_4B_SFTV5_DPOv3_agent_v0_LR1E6, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with Direct Preference Optimization (DPO) using the Unsloth library, and the resulting weights were fully merged into a single 16-bit model, so no adapter loading is required.
Key Optimizations
The primary objective of the DPO training was to align the model's responses with preferred outputs, focusing on two areas:
- Enhanced Reasoning: Improved Chain-of-Thought capabilities.
- Structured Response Quality: Better generation of structured outputs based on a preference dataset.
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Methodology: Direct Preference Optimization (DPO)
- Epochs: 5
- Learning Rate: 1e-06
- Max Sequence Length: 1024
- Training Data: u-10bei/dpo-dataset-qwen-cot
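The DPO objective used in this training can be sketched in plain Python for a single preference pair (illustrative only; the actual training was done with Unsloth using the hyperparameters above, and the beta value below is an assumed default, not taken from this run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are the summed log-probabilities of each response under the
    trained policy and under the frozen reference (base) model.
    beta controls how far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy already favors the chosen response -> low loss
low = dpo_loss(-10.0, -30.0, ref_chosen_logp=-20.0, ref_rejected_logp=-20.0)
# Policy favors the rejected response -> high loss
high = dpo_loss(-30.0, -10.0, ref_chosen_logp=-20.0, ref_rejected_logp=-20.0)
print(low < high)  # True
```

Minimizing this loss pushes the policy to assign relatively more probability to the chosen (preferred) responses than the reference model does, which is how the preference dataset steers reasoning and output structure.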
Usage Considerations
Because the weights are fully merged, the model can be loaded directly with the transformers library; no PEFT or adapter setup is needed. Note that the model is distributed under the MIT License, per the dataset terms, and compliance with the original base model's license is also required.
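A minimal loading sketch with transformers (the model id is from above; the prompt and generation parameters are illustrative and not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mohtani777/Qwen3_4B_SFTV5_DPOv3_agent_v0_LR1E6"

# Merged 16-bit weights: loads like any standalone causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain step by step why 17 is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Keep prompts within the model's training context: the DPO run used a maximum sequence length of 1024 tokens.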
Ideal Use Cases
This model is particularly well-suited for applications where:
- High-quality, aligned responses are crucial.
- Complex reasoning and Chain-of-Thought capabilities are needed.
- Structured output generation is a priority.