W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.35-20260430-140517
W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.35-20260430-140517 is an 8-billion-parameter Qwen3-based language model fine-tuned by W-61. It is a DPO-tuned version of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, optimized on the HuggingFaceH4/ultrafeedback_binarized dataset. The model is designed for improved response quality and alignment through Direct Preference Optimization.
Model Overview
This model, developed by W-61, is an 8-billion-parameter Qwen3-based language model. It is a fine-tuned iteration of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, enhanced through Direct Preference Optimization (DPO).
Key Capabilities
- Preference Alignment: Optimized using the HuggingFaceH4/ultrafeedback_binarized dataset, indicating a focus on aligning model outputs with human preferences.
- DPO Fine-tuning: Leverages Direct Preference Optimization for improved response quality and reduced undesirable outputs.
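To make the preference-alignment setup concrete, the sketch below shows the shape of a single binarized preference example of the kind DPO trains on. The field names (`prompt`, `chosen`, `rejected`) follow the common HuggingFaceH4 convention; consult the dataset card for the exact schema, and note the message contents here are invented for illustration.

```python
# Illustrative structure of one binarized preference pair: the same prompt
# with a preferred ("chosen") and a dispreferred ("rejected") conversation.
example = {
    "prompt": "Explain gradient descent in one sentence.",
    "chosen": [
        {"role": "user", "content": "Explain gradient descent in one sentence."},
        {"role": "assistant", "content": "Gradient descent iteratively updates "
                                         "parameters in the direction that most "
                                         "reduces the loss."},
    ],
    "rejected": [
        {"role": "user", "content": "Explain gradient descent in one sentence."},
        {"role": "assistant", "content": "It is a thing computers do."},
    ],
}

# DPO consumes such pairs directly: it needs no scalar reward labels, only
# which of the two responses was preferred.
assert set(example) == {"prompt", "chosen", "rejected"}
```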
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 128 across 4 GPUs, using the AdamW optimizer and a cosine learning rate scheduler with a 0.1 warmup ratio. The final evaluation loss was 0.6076, and DPO-specific metrics, including a mean margin of 54.4214 and a mean chosen log-probability of -331.5330, suggest effective preference learning.
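The margin and log-probability metrics above come from the standard DPO objective, which rewards the policy for widening the gap between chosen and rejected responses relative to the SFT reference model. A minimal per-example sketch (the `beta` temperature is an assumption here, not stated on this card; 0.1 is a common default):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The "rewards" are the policy-vs-reference log-prob gaps, and the
    margin (chosen reward minus rejected reward) is the quantity
    reported as the mean margin in the training metrics above.
    beta is assumed; it is not specified on this model card.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, margin

# Example with made-up sequence log-probs: the policy has moved toward
# the chosen response and away from the rejected one, so margin > 0
# and the loss is small.
loss, margin = dpo_loss(-300.0, -400.0, -320.0, -390.0)
```

A large positive mean margin, as reported for this model, indicates the trained policy assigns substantially higher relative likelihood to preferred responses than the reference model does.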
Good For
- Applications requiring models with improved alignment to human feedback.
- Tasks where response quality and preference adherence are critical.