W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.43-s_star-0.4-20260429-230725
W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.43-s_star-0.4-20260429-230725 is an 8-billion-parameter language model fine-tuned from jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 on the HuggingFaceH4/ultrafeedback_binarized dataset. It was optimized with Direct Preference Optimization (DPO) to align with human preferences and shows improved response quality over its SFT base. It suits applications requiring nuanced, preference-aligned text generation within a 32768-token context window.
Model Overview
This model, W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.43-s_star-0.4-20260429-230725, is an 8-billion-parameter language model. It is a fine-tuned variant of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, optimized with the Direct Preference Optimization (DPO) method.
Training Details
The model was trained on the HuggingFaceH4/ultrafeedback_binarized dataset. Key hyperparameters included a learning rate of 5e-07, a per-device train batch size of 4, and 8 gradient accumulation steps; across the 4 H200 GPUs indicated by the model name, this yields an effective total train batch size of 128 (4 GPUs × 4 × 8). Training ran for 1 epoch with a cosine learning rate scheduler and a 0.1 warmup ratio. Evaluation shows a validation loss of 0.5897 and a mean DPO margin of 51.0513, indicating that the model learned to clearly separate chosen from rejected responses.
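For reference, the sketch below shows how these reported hyperparameters could map onto a DPO run using the TRL library (a recent version with DPOConfig/DPOTrainer). This is a hypothetical reconstruction, not the actual training script: the DPO beta, dataset preprocessing, bf16 setting, and distributed-launch details are assumptions.

```python
# Minimal DPO training sketch (assumed: TRL's DPOTrainer; the actual
# training script, DPO beta, and preprocessing are not published here).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs: each row carries a prompt plus chosen/rejected responses.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                       split="train_prefs")

config = DPOConfig(
    output_dir="qwen3-8b-dpo-ultrafeedback",
    learning_rate=5e-7,             # reported learning rate
    per_device_train_batch_size=4,  # reported per-device batch size
    gradient_accumulation_steps=8,  # 4 GPUs x 4 x 8 = 128 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                      # assumption: bf16 training on H200s
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Launched with a multi-GPU runner such as accelerate, the per-device batch size and accumulation steps above reproduce the effective batch size of 128 described in the training details.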
Key Characteristics
- Parameter Count: 8 billion parameters.
- Context Length: Supports a context window of 32768 tokens.
- Optimization Method: Fine-tuned using Direct Preference Optimization (DPO) for enhanced alignment with human feedback.
Potential Use Cases
This model is well-suited for applications where generating responses that align with human preferences is crucial. Its DPO fine-tuning suggests improved conversational quality and adherence to desired output styles compared to models trained solely with Supervised Fine-Tuning (SFT).
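A minimal inference sketch with the Hugging Face transformers library is shown below. It assumes the repository ships a Qwen3-style tokenizer with a chat template inherited from the base model; the prompt and sampling parameters are illustrative choices, not recommendations from the model authors.

```python
# Minimal inference sketch (assumed: the repo provides a Qwen3-style
# chat template; sampling settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = ("W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128"
        "-q_t-0.43-s_star-0.4-20260429-230725")
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True,
                        temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:],
                       skip_special_tokens=True))
```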