W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 30, 2026 · Architecture: Transformer

W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125 is an 8-billion-parameter language model, fine-tuned from jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128. It was further optimized with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset to improve alignment with human preferences. With a context length of 32768 tokens, it is designed for conversational AI and instruction-following tasks where human-like responses are critical.


Model Overview

W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125 is an 8-billion-parameter language model, a fine-tuned iteration of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 optimized with Direct Preference Optimization (DPO).

Key Capabilities

  • Preference Alignment: Enhanced through DPO training on the HuggingFaceH4/ultrafeedback_binarized dataset, suggesting improved alignment with human preferences and instruction following.
  • Base Architecture: Built upon a Qwen3-8B base, providing a robust foundation for various natural language processing tasks.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and generating more coherent, extended responses.
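To make the "Preference Alignment" point concrete, here is a minimal sketch of the pairwise DPO objective the training stage optimizes: the policy is pushed to prefer the chosen response over the rejected one, relative to the frozen SFT reference model. The `beta` value is an illustrative assumption (a common DPO default), not a documented hyperparameter of this run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Pairwise DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Each argument is the summed log-probability of a full response under the
    policy or the frozen reference model. beta=0.1 is an assumed default.
    """
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy and reference agree, the loss sits at log(2);
# it drops below log(2) once the policy favors the chosen response
# more strongly than the reference does.
```

In practice this loss is computed over batches of (chosen, rejected) pairs from ultrafeedback_binarized, with both models scoring each pair.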

Training Details

The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 128 across 4 GPUs. The optimizer was AdamW (ADAMW_TORCH) with a cosine learning-rate scheduler and a warmup ratio of 0.1. This regimen aims to refine the model's conversational abilities and response quality.
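The stated schedule (cosine decay with a 0.1 warmup ratio, peaking at 5e-07) can be sketched as a small pure-Python function; this mirrors the shape of the common warmup-then-cosine schedule rather than reproducing the exact trainer internals.

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from ~0 up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr toward 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a single epoch and a batch size of 128, `total_steps` is simply the number of preference pairs divided by 128.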

Good for

  • Conversational AI: Its DPO fine-tuning makes it suitable for chatbots and interactive agents that require nuanced, human-aligned responses.
  • Instruction Following: Expected to perform well in tasks where precise adherence to user instructions is crucial.
  • Applications requiring longer context: The 32K context window is beneficial for summarizing long documents, extended dialogue, or complex reasoning over large texts.
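For the long-context use case above, inputs still need to fit within the 32768-token window. A simple sketch of chunking an over-length document, using whitespace-split words as a rough stand-in for tokens (a real pipeline would count with the model's tokenizer):

```python
def chunk_for_context(text, max_tokens=32768, overlap=256):
    """Split a long document into overlapping word-based chunks that fit the
    32k context window. Words approximate tokens here; the overlap preserves
    some continuity between consecutive chunks."""
    words = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks
```

Each chunk can then be summarized independently, with the per-chunk summaries concatenated and summarized once more if needed.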