Name: takami2022/qwen3-4b-dpo-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: takami2022

Model Overview

The takami2022/qwen3-4b-dpo-v2 is a 4 billion parameter language model, representing a refined version of the takami2022/qwen3-4b-dpo-v1 base model. This iteration has undergone additional training using Direct Preference Optimization (DPO) to further enhance its performance and alignment.

Key Training Details

Base Model: takami2022/qwen3-4b-dpo-v1
Optimization Method: Direct Preference Optimization (DPO)
Epochs: 1
Learning Rate: 1e-07
DPO Beta Value: 0.05 (adjusted from 0.1 in the previous version)
Maximum Sequence Length: 1024
LoRA Configuration: r=16, alpha=32 (merged into the base model)

What's New in v2?

The primary difference in this version is the adjustment of the DPO beta parameter from 0.1 to 0.05. This change in the beta value typically influences the strength of the preference optimization, aiming for a potentially more nuanced or robust alignment based on the preference data.

Good For

Applications requiring a 4B parameter model with enhanced alignment through DPO.
Tasks where fine-grained control over preference optimization is beneficial.
Further experimentation with DPO-tuned Qwen3-based models.

Overview

Model Overview

Key Training Details

What's New in v2?

Good For

Full Model Card (README)