Name: taketakedaiki/qwen3-4b-v2-exp26-dpo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: taketakedaiki

Model Overview

taketakedaiki/qwen3-4b-v2-exp26-dpo is a 4-billion-parameter language model developed by taketakedaiki. It is a Direct Preference Optimization (DPO) fine-tune of taketakedaiki/qwen3-4b-v2-exp25, an earlier supervised fine-tuned (SFT) checkpoint. The DPO stage is intended to align the model's outputs more closely with human preferences.
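To give a rough picture of what a DPO stage optimizes, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities, using a beta of 0.1 as reported for this model. It assumes the frozen reference policy is the starting SFT checkpoint, which is the usual DPO setup but is not stated explicitly here. The function names are illustrative and are not the author's training code.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # beta value reported for this model's DPO run
) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # How much more (or less) likely the policy makes each response than the reference does.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: the margin is 0, so the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy shifts probability toward the chosen response: the loss drops below log(2).
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

The loss falls only when the policy raises the chosen response's likelihood relative to the reference more than it raises the rejected one's. Beta controls how sharply that relative margin is rewarded.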

Key Characteristics

Base Model: Fine-tuned from taketakedaiki/qwen3-4b-v2-exp25 (Exp25 SFT).
Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment.
Training Parameters: DPO training used a learning rate of 1e-7 and a beta of 0.1, and ran for 1 epoch.
LoRA Configuration: Uses Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, with r=8 and alpha=16.
Context Length: Supports a context window of 32,768 tokens.
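The LoRA values listed above can be illustrated with a small sketch. In standard LoRA, a frozen weight matrix W receives a low-rank update B·A scaled by alpha / r, so r=8 and alpha=16 give a scaling factor of 2. (Hugging Face PEFT exposes the same two values as `r` and `lora_alpha`.) The class and function names and the 2560×2560 layer size below are hypothetical, chosen only for illustration; which layers actually received adapters is not specified.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container mirroring the listed LoRA values (not a library class)."""
    r: int = 8
    alpha: int = 16

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r.
        return self.alpha / self.r

    def trainable_params(self, d_out: int, d_in: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r); W itself stays frozen.
        return self.r * (d_in + d_out)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: Vector, w: Matrix, a: Matrix, b: Matrix, scaling: float) -> Vector:
    # y = W x + scaling * B (A x): frozen base output plus the scaled low-rank correction.
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y0 + scaling * u for y0, u in zip(base, update)]


if __name__ == "__main__":
    cfg = LoraSettings()
    print(cfg.scaling)                       # 2.0
    print(cfg.trainable_params(2560, 2560))  # 40960 (vs 2560 * 2560 = 6553600 frozen weights)
```

For a hypothetical 2560×2560 projection, r=8 adds 8 × (2560 + 2560) = 40,960 trainable parameters while the 6,553,600 original weights stay frozen. This is what makes the method parameter-efficient.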

Potential Use Cases

This model suits applications where preference-aligned responses are important. Its DPO stage is intended to produce outputs that are preferred over those of the purely supervised exp25 model. It can be considered for tasks where generation should reflect implicit or explicit preference data.