W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855 is an 8 billion parameter language model developed by W-61, fine-tuned from a Qwen3-8B base model. This iteration is optimized on the UltraFeedback dataset with an Identity Preference Optimization (IPO) training procedure, with the goal of improving alignment and response quality. It is designed for applications that require refined conversational ability and adherence to user preferences, building on the capabilities of its base model.


Model Overview

This model, qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855, is an 8 billion parameter language model developed by W-61. It is a fine-tuned version of the W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 base model, specifically optimized using the HuggingFaceH4/ultrafeedback_binarized dataset.
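The checkpoint can be loaded like any other causal language model in the Transformers library. The sketch below is a minimal usage example; it assumes the repository ships a standard tokenizer with a chat template inherited from the SFT stage, and the prompt and sampling settings are illustrative rather than recommended defaults.

```python
# Minimal inference sketch; assumes standard Transformers loading for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```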

Key Capabilities

  • Preference Alignment: Fine-tuned with the Ultrafeedback dataset, indicating an emphasis on aligning model responses with human preferences.
  • Improved Response Quality: The Identity Preference Optimization (IPO) training procedure aims to enhance the overall quality and helpfulness of generated text (see the loss sketch after this list).
  • Base Model Foundation: Builds upon the capabilities of the Qwen3-8B architecture, suggesting strong general language understanding and generation.
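
To make the IPO objective concrete, the sketch below computes the pairwise IPO loss from per-sequence log-probabilities under the policy and a frozen reference model, following Azar et al. (2023). The function signature, tensor names, and beta value are illustrative assumptions, not details taken from this run's configuration.

```python
# Illustrative IPO loss over a batch of preference pairs; names and beta are assumptions.
import torch

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Squared-error IPO objective from per-sequence log-probabilities."""
    # Log-ratio of policy vs. reference for chosen and rejected responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # IPO regresses the preference margin toward 1/(2*beta),
    # instead of passing it through a sigmoid as DPO does.
    margin = chosen_logratio - rejected_logratio
    return ((margin - 1 / (2 * beta)) ** 2).mean()

# Toy batch of per-sequence log-probabilities.
loss = ipo_loss(torch.tensor([-1.8, -2.1]), torch.tensor([-2.5, -2.9]),
                torch.tensor([-2.0, -2.2]), torch.tensor([-2.6, -3.0]))
print(loss)
```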

Training Details

The model was trained for a single epoch with a learning rate of 5e-07 in a distributed setup across 4 H200 GPUs, with an effective batch size of 128, the AdamW optimizer, and a cosine learning rate scheduler. Key metrics on the evaluation set include a reward accuracy of 0.6940 and a chosen-response log-probability of -1.7890, reflecting how well the model separates preferred from rejected responses.
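
A run along these lines could be expressed with TRL's `DPOTrainer`, which implements the IPO objective via `loss_type="ipo"`. The sketch below mirrors the stated hyperparameters (1 epoch, learning rate 5e-07, cosine schedule, effective batch size 128), but the exact training script, per-device batch split, IPO beta, and dataset split are assumptions.

```python
# Hypothetical reproduction sketch using TRL's DPOTrainer with the IPO loss;
# hyperparameters mirror the card, everything else is assumed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="qwen3-8b-base-ipo-ultrafeedback",
    loss_type="ipo",                 # IPO objective instead of the default DPO sigmoid loss
    beta=0.1,                        # assumed; not stated on the card
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    per_device_train_batch_size=8,   # 4 GPUs x 8 x grad accumulation 4 = effective batch 128 (assumed split)
    gradient_accumulation_steps=4,
)

trainer = DPOTrainer(model=model, args=config, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```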