W-61/qwen3-8b-base-beta-dpo-ultrafeedback-4xh200-batch-128-20260423-040315
W-61/qwen3-8b-base-beta-dpo-ultrafeedback-4xh200-batch-128-20260423-040315 is an 8-billion-parameter language model developed by W-61, fine-tuned from a Qwen3-8B base model. It has been optimized with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset to improve alignment with human preferences. With a 32K-token context window, it is designed for applications requiring nuanced, preference-aligned response generation.
Model Overview
This model, developed by W-61, is an 8-billion-parameter language model fine-tuned from a Qwen3-8B base model. It applies Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset to improve alignment with human preferences and generate more desirable responses. The model supports a context length of 32,768 tokens.
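As a quick orientation, the checkpoint should be loadable with the standard transformers API like other Qwen3-based models; the snippet below is an illustrative sketch (the chat-template usage and generation settings are assumptions, not a published recipe).

```python
# Illustrative inference sketch; assumes the checkpoint is on the Hugging Face
# Hub and ships a standard Qwen3-style chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/qwen3-8b-base-beta-dpo-ultrafeedback-4xh200-batch-128-20260423-040315"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling parameters here are placeholders, not tuned values.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```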
Key Characteristics
- Base Model: Fine-tuned from W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128.
- Optimization Method: Direct Preference Optimization (DPO) for improved response quality and alignment; a sketch of the loss follows this list.
- Training Data: The HuggingFaceH4/ultrafeedback_binarized dataset.
- Context Length: A 32,768-token context window, suitable for processing longer inputs and generating coherent extended outputs.
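For reference, the DPO objective compares policy and reference-model log-probabilities on chosen versus rejected responses. The snippet below is a minimal, self-contained PyTorch sketch of that loss; the tensor names and the beta value are illustrative and not taken from this training run.

```python
# Minimal DPO loss sketch. logp_* are summed log-probabilities of each full
# response under the policy and the frozen reference model; beta is the
# KL-tradeoff coefficient (value here is illustrative).
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_logp_chosen: torch.Tensor,
    policy_logp_rejected: torch.Tensor,
    ref_logp_chosen: torch.Tensor,
    ref_logp_rejected: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Implicit reward of each response: beta * (log pi - log pi_ref).
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    # Logistic loss on the reward margin pushes chosen above rejected.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```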
Training Details
The model was trained for a single epoch with a learning rate of 5e-7, an effective batch size of 128, and a cosine learning-rate schedule. Evaluation metrics show a final loss of 0.5897 and a mean beta_dpo/gap (the logged preference-gap metric) of 23.0136, indicating effective preference learning.
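The card does not state which training framework was used. As one possible reproduction path, the sketch below maps the reported hyperparameters onto TRL's DPOTrainer; the 4-GPU batch split and the beta value are assumptions, not reported settings.

```python
# Hypothetical reproduction sketch using TRL's DPOTrainer. Only the learning
# rate, epoch count, scheduler, and effective batch size of 128 come from the
# card; the per-device split (32 x 4 H200s) and beta are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="qwen3-8b-dpo-ultrafeedback",
    num_train_epochs=1,              # single epoch, per the card
    learning_rate=5e-7,              # per the card
    lr_scheduler_type="cosine",      # per the card
    per_device_train_batch_size=32,  # 32 x 4 GPUs = effective batch of 128 (assumed split)
    beta=0.1,                        # DPO beta; not reported on the card
)

# With no explicit ref_model, DPOTrainer keeps a frozen copy of the policy
# as the reference.
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```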
Intended Uses
This model is suitable for applications where generating responses that align with human preferences is critical, such as advanced chatbots, content generation, and interactive AI systems that benefit from preference-tuned outputs.